Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,487 datasets
Digital vector boundaries for Middle layer Super Output Areas in England and Wales as at 21 March 2021. The dataset is joined to the Rural Urban Classification data 2021, a product developed by the Office for National Statistics, Department for Environment, Food and Rural Affairs, and Welsh Assembly Government. Source data is licensed under the Open Government Licence v.3.0.
Digital vector boundaries for Lower layer Super Output Areas in England and Wales, as at 21 March 2021. The data is joined to the Rural-Urban Classification 2021, a product developed by the Office for National Statistics, Defra, and the Welsh Assembly Government. Source data is licensed under the Open Government Licence v.3.0 and contains OS data © Crown copyright 2025.
England and Wales digital vector boundaries for Local Authority Districts as of December 2021. The dataset is joined to the 2021 Rural-Urban Classification, a product developed by the Office for National Statistics, the Department for Environment, Food and Rural Affairs, and the Welsh Assembly Government. Source data is from the Office for National Statistics and contains OS data © Crown copyright 2025.
England and Wales digital vector boundaries for Output Areas as of December 2021. The dataset is joined to the 2021 Rural-Urban Classification, a product developed by the Office for National Statistics, the Department for Environment, Food and Rural Affairs, and the Welsh Assembly Government. Source data is from the Office for National Statistics licensed under the Open Government Licence v.3.0.
Digital vector boundaries for Middle layer Super Output Areas (MSOAs) in England and Wales as of March 2021. This dataset is joined to the 2021 Rural-Urban Classification, a Government Statistical Service product developed by the Office for National Statistics, Defra, and the Welsh Assembly Government. Source data is from the Office for National Statistics and contains OS data © Crown copyright 2025.
Lower layer Super Output Areas (December 2021) Boundaries EW BFE (V10) and Rural Urban Classification data 2021 contains digital vector boundaries for Lower layer Super Output Areas in England and Wales, as of 21 March 2021. The data is joined to the 2021 Rural-Urban Classification, a Government Statistical Service product developed by the Office for National Statistics, Defra, and the Welsh Assembly Government. The source is the Office for National Statistics, licensed under the Open Government Licence v.3.0.
The Office for National Statistics provides digital vector boundaries for Output Areas in England and Wales as of December 2021. The boundaries are joined to the 2021 Rural-Urban Classification data, a product developed by the Office for National Statistics, Department for Environment, Food and Rural Affairs, and the Welsh Assembly Government. The data is licensed under the Open Government Licence v.3.0 and contains OS data © Crown copyright 2025.
mlx-community provides a test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. It contains 1,000 total examples, split into 800 for training, 100 for validation, and 100 for testing. The dataset was last updated on May 27, 2025.
Codeforces, a popular competitive programming platform, provides a collection of over 10,000 unique algorithmic problems. The dataset, created by open-r1, includes problems from the earliest contests up to 2025, designed to test code reasoning capabilities.
Millions of real user code submissions from the Codeforces competitive programming platform. The dataset contains human solutions to challenging algorithmic optimization problems, curated by open-r1 and last updated in May 2025.
A benchmark dataset from the research paper 'Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving'. It was created by author 'purewhite42' and last updated on May 8, 2025. The research focuses on formulating problem-solving as a deterministic Markov decision process within formal theorem proving environments.
PutnamBench-Solving is a benchmark for evaluating formal problem-solving within theorem proving environments. The dataset is part of the official implementation for research on process-verified problem-solving beyond proving known targets. It was created by author purewhite42 and last updated on May 8, 2025.
Herald Proofs is a dataset of 45,000 natural language to formal logic (NL-FL) proofs, constituting the proof part of the larger Herald dataset. The dataset was created by authors including Guoxiong Gao and Yutong Wang and presented at the International Conference on Learning Representations in 2025. It is associated with the Lean 4 theorem prover, specifically version v4.11.0.
DeepSeek-ProverBench is a dataset for training and evaluating large language models on formal theorem proving in the Lean 4 environment. It was created by deepseek-ai using a recursive theorem proving pipeline powered by the DeepSeek-V3 model to decompose complex problems into subgoals. The dataset was last updated on April 30, 2025.
A statistical classification of 2021 Output Areas (OA) in England and Wales as rural or urban, produced by the Office for National Statistics (ONS). The classification is based on address density, physical settlement form, population size, and relative access to major towns and cities. It provides a consistent view of geography for higher-level areas like LSOA, MSOA, and LAD.
Vermont's submissions to the Public Library Survey provide annual statistics on library operations and services from 2018 to 2023. The survey is administered by the Institute of Museum and Library Services (IMLS) and is intended to be completed by every public library in the country each year.
239 challenging problems across four scientific domains designed to evaluate LLM-based scientific equation discovery methods. The benchmark was created by author nnheui and last updated on 2025-04-20. It includes a category, LSR-Transform, which transforms common physical models into less common mathematical representations to test reasoning beyond memorization.
Phase 2 of a curriculum learning pipeline, containing moderately difficult Turkish-language math problems that require multiple reasoning steps. The dataset was created by author 'erayalp' and was last updated in April 2025. It is designed to bridge the gap between basic arithmetic and complex problem-solving for language models.
2011 to present. Age-adjusted prevalence data from the Behavioral Risk Factor Surveillance System (BRFSS) for selected U.S. metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. The data is collected by the CDC through a continuous, state-based surveillance system tracking modifiable risk factors for chronic diseases. It is updated annually and hosted on data.cdc.gov.
2011 to present. The dataset contains prevalence data from the Behavioral Risk Factor Surveillance System (BRFSS) for selected metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. It is produced by the Centers for Disease Control and Prevention (CDC) and is updated annually to track modifiable risk factors for chronic diseases and leading causes of death.