Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,487 datasets
The dataset defines Neighborhood Statistical Areas, which are Census Tract-based units created by the New Orleans City Planning Commission and maintained by the Greater New Orleans Community Data Center for local data analysis. Boundaries were established from 1980 through 2010, with modifications following the 1990, 2000, and 2010 Censuses. The data was last updated on March 17, 2025.
Big-Math contains over 250,000 rigorously filtered and verified mathematical problems released by SynthLabsAI in early 2025. Designed for reinforcement learning (RL) in language models, the collection focuses on high-quality reasoning tasks and open-ended questions. It is distributed in Parquet format under the Apache 2.0 license.
A composite index tracks the statistical reliability of U.S. Gross Domestic Product (GDP) estimates. The Bureau of Economic Analysis calculates this measure from six reliability indicators using three-year rolling averages. The data is maintained by performance.commerce.gov and was last updated in March 2025.
Working Data includes raw measurements from a Radio Access Network using commercial hardware, collected at multiple layers of the LTE protocol stack. Derived Data provides statistical analyses of the raw measurements, processed via Python 3 scripts. The dataset is divided into three distinct tranches corresponding to the project's experimental design.
Statistical reports from the Main Directorate of the State Tax Service in the Lviv region of Ukraine. The data covers the receipt and handling of citizens' appeals and requests for public information, as well as reporting on the implementation of the directorate's work plan. The dataset was last updated on April 3, 2025.
UniMER is a dataset for advancing Mathematical Expression Recognition (MER), containing over one million training instances of diverse mathematical expressions. It was created by author 'wanderkid' and includes a separate test set for evaluation. The dataset was last updated on the Hugging Face platform in March 2025.
ProofNet# is a port of the ProofNet benchmark for autoformalization and formal proving of undergraduate-level mathematics. The dataset contains 371 examples, each consisting of a formal theorem. It was created by PAug and last updated on March 24, 2025.
OpenR1-Math-220k contains 220,000 mathematical problems paired with multiple reasoning traces generated by DeepSeek R1, released by the open-r1 project in February 2025. Each problem includes two to four distinct reasoning paths derived from NuminaMath 1.5, with correctness verified through automated tools or LLM-based judging.
This Chinese mathematical reasoning dataset contains 7,473 training samples and 1,319 testing samples. The question-answer pairs were translated from the original English GSM8K dataset using GPT-3.5-Turbo with few-shot prompting techniques. The dataset was processed using MetaMath and is hosted by PaddlePaddle.
AIME_1983_2024 contains mathematical competition problems and solutions from the American Invitational Mathematics Examination spanning 1983 to 2024. Compiled by Di Zhang from the Art of Problem Solving Wiki, the collection includes fewer than 1,000 records. It is designed specifically as a benchmark for evaluating mathematical reasoning in large language models.
Quarterly self-reported data from 2013 to the present on demographics and bed availability in New York State's Adult Care Facilities. The dataset, provided by health.data.ny.gov, includes columns for admissions, discharges, census by age and gender, and certified capacity for various facility types. Information is submitted by facility operators under state regulations and is not audited by the Department.
Over 1,300 unique mathematical puzzles for the 'Game of 24', sourced from 4nums.com by nlile. The collection features difficulty metrics derived from more than 6.4 million human solution attempts recorded between 2012 and 2025.
Eurus-2-RL-Data is a high-quality reinforcement learning training dataset for mathematics and coding problems. It includes outcome verifiers, such as LaTeX answers for math and test cases for coding. The dataset was created by PRIME-RL and was last updated on February 19, 2025.
The Ultralytics medical-pills detection dataset is a proof-of-concept collection of labeled images for training computer vision models to identify medical pills. The dataset was created by Ultralytics and was last updated on February 10, 2025. It is designed to demonstrate the potential of AI in pharmaceutical applications.
The Arithmetic Word Problem Compendium Dataset (AWPCD) is a collection of mathematical word problems spanning multiple domains with natural language variations. The dataset, created by HelloCephalopod, contains a sample of 1,000 problems, each requiring 1 to 5 steps of mathematical operations. It was last updated on the Hugging Face platform on February 15, 2025.
The MiniF2F dataset contains mathematical problems from sources like AMC competitions paired with their formal statements in the Lean theorem prover format. It was created by Tonic and last updated on February 5, 2025. Each example includes both informal mathematical statements and their corresponding formal representations.
GSM8K_zh_tw is a dataset for mathematical reasoning in Traditional Chinese, derived from the GSM8K_zh dataset. It contains 7,473 training samples and 1,319 testing samples, translated and regionally adapted for Traditional Chinese users. The dataset was created by DoggiAI and last updated on January 30, 2025.
mathLab developed Smithers, a mathematical interdisciplinary toolbox for engineers and scientists, with the most recent update in March 2025. It provides Python-based utilities for signal processing, VTK file handling, and machine learning integration.
Ukrainian data from the eu_open_data platform contains statistical reporting information for the institution 'DNZ No 72 BMR'. The dataset includes the institution's number from the ISUO system and its full and abbreviated name. It was last updated on January 9, 2025.
ARPA-E Grid Optimization Competition Challenge 2 materials include a 97-page problem formulation document with 299 equations. The challenge, running from 2020 to 2021, expanded on prior problems by incorporating adjustable transformer tap ratios, phase shifting transformers, and price-responsive demand. It involved multiple university and laboratory teams funded through prior prize awards.