DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,487 datasets

New Orleans Neighborhood Statistical Area Boundaries

The boundaries were established from 1980 through 2010, with modifications following the 1990, 2000, and 2010 Censuses. The dataset defines Neighborhood Statistical Areas, which are Census Tract-based units created by the New Orleans City Planning Commission and maintained by the Greater New Orleans Community Data Center for local data analysis. The data was last updated on March 17, 2025.

Tags: Tabular, Geospatial, CSV, XML, JSON, Geographic Boundaries, Census Data, Urban Planning, New Orleans, +1

Big-Math: 250,000 Verified Problems for Reinforcement Learning

Big-Math contains over 250,000 rigorously filtered and verified mathematical problems released by SynthLabsAI in early 2025. Designed for reinforcement learning (RL) in language models, the collection focuses on high-quality reasoning tasks and open-ended questions. It is distributed in Parquet format under the Apache 2.0 license.

Tags: Parquet, Task Categories: text-generation, Library: polars, Task Categories: question-answering, Language: en, Modality: text, Size Categories: 100K<n<1M, Mathematics, arXiv:2502.17387, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, Reasoning, Open Ended Questions, Reinforcement Learning, Verifiable, Math, License: apache-2.0, +1

U.S. GDP Reliability Index from Bureau of Economic Analysis

A composite index tracks the statistical reliability of U.S. Gross Domestic Product (GDP) estimates. The Bureau of Economic Analysis calculates this measure from six reliability indicators using three-year rolling averages. The data is maintained by performance.commerce.gov and was last updated in March 2025.

Tags: Tabular, Time Series, CSV, XML, JSON, GDP Reliability, Composite Index, General Government Management, Economic Development, Equity, Economic Statistics, +1

Radio Access Network Anomalous and Baseline State Measurements

Working Data includes raw measurements from a Radio Access Network using commercial hardware, collected at multiple layers of the LTE protocol stack. Derived Data provides statistical analyses of the raw measurements, processed via Python 3 scripts. The dataset is divided into three distinct tranches corresponding to the project's experimental design.

Tags: Metrology, Multi Feature Detection, Anomalous State Detection, Measurement Stability, Radio Access Network, Multi Layer Architecture, Cellular Communication Security, +1

Lviv Region Tax Service Reports on Citizen Appeals and Information Requests

This dataset contains statistical reports from the Main Directorate of the State Tax Service in the Lviv region of Ukraine. The data covers the receipt and handling of citizens' appeals and requests for public information, as well as reporting on the implementation of the directorate's work plan. The dataset was last updated on April 3, 2025.

Tags: Text, Citizen Appeals, Ukraine Lviv, Tax Service, Public Information, Government Reports, +1

Mathematical Expression Recognition Dataset with Over One Million Training Instances

UniMER is a dataset for advancing Mathematical Expression Recognition (MER), containing over one million training instances of diverse mathematical expressions. It was created by author 'wanderkid' and includes a separate test set for evaluation. The dataset was last updated on the Hugging Face platform in March 2025.

Tags: Image, Multimodal, Image To Text, Language: zh, Task Categories: image-to-text, Size Categories: 1M<n<10M, Language: en, arXiv:2409.03643, Modality: image, Optical Character Recognition, Math AI, Region: us, Large Scale, arXiv:2404.15254, Math, License: apache-2.0, MER, +1

ProofNet#: A Lean 4 Benchmark for Undergraduate Mathematics Formalization

ProofNet# is a port of the ProofNet benchmark for autoformalization and formal proving of undergraduate-level mathematics. The dataset contains 371 examples, each consisting of a formal theorem. It was created by PAug and last updated on March 24, 2025.

Tags: Text, Mathematics, Autoformalization, Benchmark, Lean Theorem Prover, Formal Proofs, +1

OpenR1-Math-220k: 220,000 Math Problems with DeepSeek R1 Reasoning Traces

OpenR1-Math-220k contains 220,000 mathematical problems paired with multiple reasoning traces generated by DeepSeek R1, released by the open-r1 project in February 2025. Each problem includes two to four distinct reasoning paths derived from NuminaMath 1.5, with correctness verified through automated tools or LLM-based judging.

Tags: Parquet, Library: polars, Library: dask, Language: en, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: us, License: apache-2.0, +1

GSM8K Distilled Zh: Chinese Math Word Problems for Supervised Learning

7,473 training and 1,319 testing samples form this Chinese mathematical reasoning dataset. The question-answer pairs were translated from the original English GSM8K dataset using GPT-3.5-Turbo with few-shot prompting techniques. The dataset was processed using MetaMath and is hosted by PaddlePaddle.

Tags: Text, Mathematical Reasoning, Question Answering, Distillation, Education, Chinese Language, +1

AIME_1983_2024: 41 Years of American Invitational Mathematics Examination Problems

AIME_1983_2024 contains mathematical competition problems and solutions from the American Invitational Mathematics Examination spanning 1983 to 2024. Compiled by Di Zhang from the Art of Problem Solving Wiki, the collection includes fewer than 1,000 records. It is designed specifically as a benchmark for evaluating mathematical reasoning in large language models.

Tags: CSV, Library: polars, Size Categories: n<1K, Modality: text, Modality: tabular, Library: mlcroissant, DOI: 10.57967/hf/4687, arXiv:2406.07394, Library: datasets, Library: pandas, Region: us, License: mit, arXiv:2410.02884, +1

Adult Care Facility Quarterly Census and Capacity Reports for New York State, 2013-Present

Quarterly self-reported data from 2013 to the present on demographics and bed availability in New York State's Adult Care Facilities. The dataset, provided by health.data.ny.gov, includes columns for admissions, discharges, census by age and gender, and certified capacity for various facility types. Information is submitted by facility operators under state regulations and is not audited by the Department.

Tags: Tabular, Time Series, CSV, XML, JSON, QSIR, Facilities And Services, Adult Care Facilities, New York State, Enhanced Assisted Living Residence, Quarterly Statistics, Beds, Assisted Living Residence, Adult Home, Enriched Housing Program, Bed Capacity, Capacity, Adult Care Facility, Special Needs Assisted Living Residence, Healthcare Census, Assisted Living Program, Quarterly Statistical Information Report, Census, +1

24 Game: 1,300 Math Puzzles with 6.4 Million Human Attempt Metrics

This dataset contains over 1,300 unique mathematical puzzles for the 'Game of 24', sourced from 4nums.com by nlile. It features difficulty metrics derived from more than 6.4 million human solution attempts recorded between 2012 and 2025.

Tags: Parquet, Size Categories: 1K<n<10K, Task Categories: text-generation, Library: polars, Task Categories: multiple-choice, Language: en, Task IDs: multiple-choice-qa, Modality: text, Modality: tabular, Task IDs: open-domain-qa, Library: mlcroissant, Task IDs: explanation-generation, Library: datasets, Library: pandas, Task IDs: language-modeling, Task Categories: other, Region: us, Reasoning, Math, License: apache-2.0, +1
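
For readers unfamiliar with the puzzle: in the Game of 24, four numbers must be combined with addition, subtraction, multiplication, and division to reach exactly 24. The following is a minimal sketch of a solvability check, not code from the dataset or from 4nums.com. The function names `can_make_24` and `_search` are hypothetical, and the brute-force search is only one possible approach. Exact fractions are used so that division does not introduce rounding errors.

```python
from fractions import Fraction
from itertools import combinations


def _search(values, target):
    """Recursively combine pairs of values until one value remains."""
    if len(values) == 1:
        return values[0] == target
    for i, j in combinations(range(len(values)), 2):
        a, b = values[i], values[j]
        # Every value except the chosen pair carries over unchanged.
        rest = [v for k, v in enumerate(values) if k not in (i, j)]
        # All results obtainable from the pair with one operation.
        candidates = {a + b, a - b, b - a, a * b}
        if b != 0:
            candidates.add(a / b)
        if a != 0:
            candidates.add(b / a)
        if any(_search(rest + [c], target) for c in candidates):
            return True
    return False


def can_make_24(numbers, target=24):
    """Return True if the numbers can reach the target using + - * /."""
    return _search([Fraction(n) for n in numbers], Fraction(target))


if __name__ == "__main__":
    print(can_make_24([3, 3, 8, 8]))  # solvable: 8 / (3 - 8 / 3)
    print(can_make_24([1, 1, 1, 1]))  # not solvable
```

A check like this could be used to confirm that each puzzle in a collection has at least one solution. It says nothing about human difficulty, which the dataset measures from recorded solution attempts.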

Eurus-2-RL-Data: High-Quality Math and Coding Problems with Verifiers

Eurus-2-RL-Data is a high-quality reinforcement learning training dataset for mathematics and coding problems. It includes outcome verifiers, such as LaTeX answers for math and test cases for coding. The dataset was created by PRIME-RL and was last updated on 2025-02-19.

Tags: Text, Verification, Mathematics, Problem Solving, Reinforcement Learning, Coding, +1

Medical Pills Detection Dataset for Computer Vision Models

The Ultralytics medical-pills detection dataset is a proof-of-concept collection of labeled images for training computer vision models to identify medical pills. The dataset was created by Ultralytics and was last updated on February 10, 2025. It is designed to demonstrate the potential of AI in pharmaceutical applications.

Tags: Image, Medical Pills, Healthcare, Computer Vision, Object Detection, Pharmaceutical, +1

AWPCD: Arithmetic Word Problem Compendium Dataset

The Arithmetic Word Problem Compendium Dataset (AWPCD) is a collection of mathematical word problems spanning multiple domains with natural language variations. The dataset, created by HelloCephalopod, contains a sample of 1,000 problems, each requiring 1 to 5 steps of mathematical operations. It was last updated on the Hugging Face platform on February 15, 2025.

Tags: Text, Arithmetic, Word Problems, Mathematics, Education, Natural Language Processing, +1

MiniF2F: Mathematical Problems with Formal Lean Statements

The MiniF2F dataset contains mathematical problems from sources like AMC competitions paired with their formal statements in the Lean theorem prover format. It was created by Tonic and last updated on 2025-02-05. Each example includes both informal mathematical statements and their corresponding formal representations.

Tags: Text, Mathematics, Formal Methods, Theorem Proving, Lean, Competition Problems, +1

GSM8K: Grade School Math Word Problems in Traditional Chinese

GSM8K_zh_tw is a dataset for mathematical reasoning in Traditional Chinese, derived from the GSM8K_zh dataset. It contains 7,473 training samples and 1,319 testing samples, translated and regionally adapted for Traditional Chinese users. The dataset was created by DoggiAI and last updated on January 30, 2025.

Tags: Text, Mathematical Reasoning, Traditional Chinese, Question Answering, Education, +1

Smithers: Mathematical Toolbox for VTK File Handling and Signal Processing

mathLab developed Smithers, a mathematical interdisciplinary toolbox for engineers and scientists, with the most recent update in March 2025. It provides Python-based utilities for signal processing, VTK file handling, and machine learning integration.

Tags: Machine Learning, Signal, VTK, Toolbox, Hacktoberfest, Python, Filehandling, +1

DNZ No 72 BMR: Statistical Reports for Ukrainian Institutions

Ukrainian data from the eu_open_data platform contains statistical reporting information for the institution 'DNZ No 72 BMR'. The dataset includes the institution's number from the ISUO system and its full and abbreviated name. It was last updated on 2025-01-09.

Tags: Tabular, CSV, Ukraine, Statistical Reporting, Institution Registry, Public Administration, +1

ARPA-E Grid Optimization Challenge 2 Competition Materials

ARPA-E Grid Optimization Competition Challenge 2 materials include a 97-page problem formulation document with 299 equations. The challenge, running from 2020 to 2021, expanded on prior problems by incorporating adjustable transformer tap ratios, phase shifting transformers, and price-responsive demand. It involved multiple university and laboratory teams funded through prior prize awards.

Tags: Model, Security Constrained Optimal Power Flow, ARPA-E, Grid Optimization, Competition, Multiperiod Dynamic Markets, Grid, Computational Science, Unit Commitment, Optimization, Power, Energy, ACOPF, Challenge 2, Multiperiod, Energy Model, GO Competition, +1
