Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,469 datasets
Lucy R. Williams authored a summary table comparing the strengths and limitations of four statistical approaches, likely related to patient-reported outcomes. The dataset is a 9.5 KB Excel file, last updated on March 18, 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
The Human-in-the-Loop Interpretability Prior dataset is associated with research by Isaac Lage of Harvard University. It was used to develop an algorithm that minimizes user studies to find models that are both predictive and interpretable. The research demonstrates the approach on several datasets, showing trends towards different proxy notions of interpretability across tasks.
Aki Vehtari developed an efficient approximate leave-one-out cross-validation method for Bayesian models. The method uses Pareto smoothed importance sampling (PSIS) to regularize importance weights and is described in a 2017 paper. The accompanying software package also provides methods for model averaging via stacking and other weighting techniques.
Gary King of Harvard University presents a methodological approach for improving the interpretation and presentation of statistical results. The work suggests using statistical simulation to extract overlooked information and present it in a reader-friendly manner. The goal is to convey precise estimates of substantively interesting quantities with clear measures of uncertainty.
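As a rough illustration of the simulation idea, the sketch below draws coefficients from their estimated sampling distribution and converts each draw into a quantity of interest with an uncertainty interval. Everything specific here is an assumption made for illustration: the two-coefficient logistic model, the function name `simulate_probability`, and the numeric estimates are hypothetical and do not come from the work itself.

```python
import math
import random
import statistics


def simulate_probability(beta_hat, cov, x, n_draws=5000, seed=0):
    """Simulate a predicted probability from a two-coefficient logistic model.

    beta_hat: (intercept, slope) point estimates (hypothetical values).
    cov: 2x2 covariance matrix of the estimates, as nested tuples.
    x: covariate value at which to evaluate the prediction.
    Returns (mean, lower, upper) where lower/upper bound a 95% interval.
    """
    rng = random.Random(seed)
    (a, b), (_, c) = cov
    # Cholesky factor of the 2x2 covariance matrix, computed by hand.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)

    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Correlated draw of the coefficients around their estimates.
        b0 = beta_hat[0] + l11 * z1
        b1 = beta_hat[1] + l21 * z1 + l22 * z2
        # Quantity of interest: predicted probability at x.
        draws.append(1 / (1 + math.exp(-(b0 + b1 * x))))

    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.fmean(draws), lower, upper


mean, lower, upper = simulate_probability(
    beta_hat=(-1.0, 0.8),
    cov=((0.04, -0.01), (-0.01, 0.02)),
    x=2.0,
)
print(f"predicted probability: {mean:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Reporting the simulated mean with its interval, rather than raw coefficients, is what makes the result readable in substantive terms.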
Water temperature and salinity measurements collected every ten minutes at a fixed location 300 meters offshore from an institute in Onjuku, Chiba Prefecture. The data is processed with statistical analysis and graphical presentation, and is available as raw ten-minute readings as well as hourly and daily mean values. The dataset is provided by the organization SCIOPS via the NASA Earthdata platform.
Historical shoreline positions for the Northern Ireland coastline, derived from Ordnance Survey maps and aerial imagery. Ulster University produced this digital asset by analyzing shoreline geometry over annual to decadal periods since the early 1800s. The data includes rate-of-change statistics calculated at 25-meter intervals using the Digital Shoreline Analysis System (DSAS).
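For readers unfamiliar with shoreline rate-of-change statistics, the sketch below computes two measures that DSAS can report for a single transect: the end point rate and the linear regression rate. The listing does not say which statistics this dataset includes, and the survey years and positions used here are hypothetical.

```python
def end_point_rate(years, positions):
    """Net shoreline movement divided by elapsed time (m/yr)."""
    return (positions[-1] - positions[0]) / (years[-1] - years[0])


def linear_regression_rate(years, positions):
    """Ordinary least-squares slope of position against time (m/yr)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_d = sum(positions) / n
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(years, positions))
    sxx = sum((t - mean_t) ** 2 for t in years)
    return sxy / sxx


# Hypothetical transect: distance from a fixed baseline (m) at each survey year.
years = [1830, 1900, 1960, 2010]
positions = [50.0, 43.0, 37.0, 32.0]

print(end_point_rate(years, positions))
print(linear_regression_rate(years, positions))
```

Negative values indicate that the shoreline moved toward the baseline over the period, under the sign convention assumed here.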
Evaluation metrics for six statistical models predicting rice grain width. The 5.5 KB XLS file, authored by Rupam Basu and last updated in March 2026, compares Spike-and-Slab, Bayesian LASSO, BSLMM, Ridge, LASSO, and OLS models. Performance is assessed using RMSE, MAE, and Predictive Coverage from a five-fold cross-validation.
Rupam Basu published this dataset on 2026-03-17. It contains evaluation metrics for six statistical and machine learning models predicting seedling height (SDHT) in rice. The 5.5 KB XLS file compares RMSE, MAE, and predictive coverage from a five-fold cross-validation.
Rupam Basu's dataset compares the performance of six statistical models for predicting grain length (GRLT) in rice. The evaluation uses five-fold cross-validation and reports metrics including RMSE, MAE, and Predictive Coverage. The dataset was last updated on March 17, 2026, and is shared under a CC-BY-4.0 license.
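The three metrics reported in the rice datasets above can be sketched as follows. This is a minimal illustration, not the author's code: the definition of predictive coverage used here (the share of held-out observations that fall inside their predictive interval) is an assumption, and the fold assignment and sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def predictive_coverage(y_true, lower, upper):
    """Fraction of observations inside [lower, upper] (assumed definition)."""
    hits = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    return hits / len(y_true)


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; sample i goes to fold i % k."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical held-out values, predictions, and interval bounds for one fold.
y_true = [3.0, 2.5, 4.0, 3.5]
y_pred = [2.5, 2.5, 4.5, 3.0]
lower = [p - 0.4 for p in y_pred]
upper = [p + 0.4 for p in y_pred]

print(rmse(y_true, y_pred), mae(y_true, y_pred), predictive_coverage(y_true, lower, upper))
```

In a five-fold evaluation, each metric would be computed on the held-out fold of every split produced by `k_fold_indices` and then averaged across the five folds.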
Statistical data on sales of fluid milk in Ontario, measured in kilolitres and categorized by milk type. The dataset is provided by the Government of Ontario and was last updated in March 2026.
Key techno-economic metrics for modeling distributed solar adoption, derived from the NREL dGen model. The dataset represents each county and sector in the continental United States as a single weighted-average agent based on 2018 simulation assumptions.
A statistical framework for scoring and monitoring scientific datasets, likely applied to air quality data. The dataset's origin is associated with the UCI Machine Learning Repository, a common source for benchmark datasets. Specific details on volume, collection period, and authorship are not provided in the available metadata.
Geoscience Australia provides theoretical representations of isostatic processes as mathematical filters, or admittance functions. The data details the application of Green's equivalent layer theorem to model compensation via elastic and visco-elastic lithospheric rheologies. This approach offers computational efficiency for calculating free-air gravity anomalies from topography compared to conventional methods.
15,000 samples span 104 reinforcement learning environments for tasks in algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and common games. Generated procedurally using the Reasoning Gym library by NVIDIA, this dataset is designed for commercial use to improve reasoning capabilities.
87 children aged 8 to 10 years participated in a study comparing motor performance between those enrolled in rhythmic gymnastics, handball, or indoor soccer and those attending only physical education classes. The data were analyzed using inferential statistics, including the Kruskal-Wallis and Mann-Whitney U tests. The study, authored by Patrik Felipe Nazário, concluded that the sport context influences the level of motor performance and specific motor skills.
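For context, the Kruskal-Wallis test compares several groups through the ranks of their pooled values. The sketch below computes the H statistic by hand, without the tie correction and without a p-value. The group scores are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical motor-performance scores for three activity groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis_h(groups))
```

A large H relative to a chi-squared distribution with (number of groups minus 1) degrees of freedom suggests that at least one group differs. Pairwise Mann-Whitney U tests are a common follow-up to locate the difference.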
WhatIf implements methods from Gary King and Langche Zeng's 2006 and 2007 papers on the dangers of extreme counterfactuals. The software offers easy-to-apply tests to evaluate counterfactuals without requiring sensitivity testing over specified model classes. It was authored by Heather Stoll, Gary King, Langche Zeng, Christopher Gandrud, and Ben Sabath.
Statistical comparisons of Taxus baccata (yew) populations, including a Tukey HSD pairwise multiple comparison test. The dataset is a small, 5.5 KB Excel file authored by Eleftheria Dalmaris and last updated in March 2026.
A collection of multiple reaction monitoring (MRM) and quantification parameters for five taxane compounds, used for mass spectrometry optimization. It was created by Eleftheria Dalmaris and is available in an XLS file format under a CC BY 4.0 license.
Experimental measurement and statistical analysis data in XLS format, authored by Qiliang Liu. The dataset is 44.0 KB in size and was last updated in March 2026.
This synthetic text dataset contains between 1,000 and 10,000 reasoning traces generated by Roman1111111 in March 2026. It captures the internal Chain of Thought and logical deduction patterns of the Claude Opus 4.6 model, specifically targeting mathematical accuracy.