Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,485 datasets
Kyle Peyton released this social science dataset in 2026 to examine the impact of land acknowledgments on non-Indigenous populations. It contains replication data and statistical code derived from experimental studies conducted in Australia and the United States.
A subset of the DeepScaleR dataset, specifically the 'easy' part where problems have a pass rate greater than 4 out of 8. The dataset was created by Taiqiang Wu and colleagues, with associated research published in 2026. It is hosted on Hugging Face and was last updated on 2026-02-25.
MedProofX-Wheelhouse is a dataset published on Kaggle. The title suggests it likely contains information related to medical evidence or clinical outcomes. Its specific content, size, and origin require verification after download due to minimal provided metadata.
A dataset focused on the detection of synthetic media. The data likely contains features derived from the CLIP model and statistical analysis for identifying deepfakes. It is hosted on Kaggle, but details on its creator, size, and specific composition are not provided in the metadata.
Stata .do files used for analysis of hypertension prevalence. The code was authored by Stephen Alajajian for the Centro de Investigacion de la Salud Indigena Dataverse. The dataset was last updated on March 18, 2026.
This replication package provides the code and tutorial materials for visualizing ion conduction pathways using graph-theoretic methods, authored by Maria Gomez and colleagues. Updated in March 2026, the collection supports the methodologies presented in the MRS Bulletin publication 'Graph-Theoretic Visualization of Ion Conduction Pathways: Concepts and a Tutorial.'
A dataset of chair designs intended for multi-objective optimization tasks. It was published on Kaggle, but specific details on its size, creation date, and authorship are not provided in the metadata. The actual data content and structure require verification after download.
MedProofSat contains satellite images collected between 2010 and 2017, annotated with land-use classes. The dataset supports remote sensing tasks such as land cover classification, urban change detection, and environmental monitoring.
50 annotated sentences form a public subset of the ParsProof benchmark for evaluating Persian proofreading systems. The dataset features fine-grained annotations across 52 error types tailored to the linguistic properties of Persian. It was created by sbunlp and last updated on Hugging Face in February 2026.
bkmr is an implementation of a Bayesian kernel machine regression method for estimating the joint health effects of multiple concurrent exposures, as described by Jennifer F. Bobb et al. in 2015. The dataset likely contains statistical model outputs or simulation data used to validate the methodology. It is sourced from the paperswithcode platform, which aggregates research code and related resources.
NHS Digital's statistical report presents a range of information on smoking in England, drawn from multiple government sources. The report covers topics such as smoking prevalence, habits, behaviors, attitudes, related ill health, mortality, and associated costs. It combines data from the Health and Social Care Information Centre, Department of Health, Office for National Statistics, and Her Majesty's Revenue and Customs.
An informal history of operations research, structured into eight chronological parts. The timeline covers precursors from 1564 to 1935, the field's birth and expansion from 1936 to 1950, and subsequent developments in methods, algorithms, and applications up to 2004. It includes sections for acronyms, a name index, and a subject index.
Multi-armed bandit solver for the AIMO 3 competition math problems. The dataset likely contains algorithmic solutions or performance data for a specific mathematical competition. Its origin and scale are unspecified in the provided metadata.
MedProofX Offline Models is a dataset published on Kaggle. The title suggests it contains artifacts related to machine learning models for medical applications. The dataset's specific content, size, and structure require verification after download.
Version 2.1 provides a mathematical interpretation of superficial deposit thickness across England, Scotland, and Wales. The model, created by the British Geological Survey, interpolates borehole and Digmap data, assigning a minimum 1.5-meter thickness in known deposit areas lacking bore data.
1,500 instruction-tuned mathematical question-and-answer pairs for training AI. The dataset appears to be focused on mathematical problems in the Hindi language. The author, organization, and specific creation details are not provided.
John M. Barry authored a tutorial on Bayesian data analysis. The dataset likely contains example data and code for implementing Bayesian methods. It is published on paperswithcode.
Lettuce Data as Scope of Work using SMART Method is a dataset published on Kaggle. The raw description indicates it provides a statistical profile of lettuce growth, likely containing measurements related to plant development. The specific variables, collection methodology, and time period are not detailed in the available metadata.
MedProofX Offline Data is a dataset hosted on Kaggle. The dataset's title suggests it contains information related to medical verification or proofing processes. The specific content, size, and origin require verification after download.
Statistical insights into the global fragrance marketplace, sourced from eBay e-commerce data. The dataset likely contains information on perfume sales for men and women. The author, organization, and temporal coverage are unknown.