DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,487 datasets

Middle layer Super Output Areas (December 2021) Boundaries EW BFE (V8) and Rural Urban Cla

Digital vector boundaries for Middle layer Super Output Areas in England and Wales as at 21 March 2021. The dataset is joined to the Rural Urban Classification data 2021, a product developed by the Office for National Statistics, Department for Environment, Food and Rural Affairs, and Welsh Assembly Government. Source data is licensed under the Open Government Licence v.3.0.

Geospatial, ZIP, CSV, England Wales, Administrative Units, Geospatial Boundaries, Rural Urban Classification, +1

Lower layer Super Output Areas (December 2021) Boundaries EW BFE (V10) and Rural Urban Cla

Digital vector boundaries for Lower layer Super Output Areas in England and Wales, as at 21 March 2021. The data is joined to the Rural-Urban Classification 2021, a product developed by the Office for National Statistics, Defra, and the Welsh Assembly Government. Source data is licensed under the Open Government Licence v.3.0 and contains OS data © Crown copyright 2025.

Geospatial, ZIP, CSV, England Wales, Geospatial Boundaries, Rural Urban Classification, Computer Vision, Administrative Areas, +1

Local Authority Districts (December 2021) Boundaries EW BFE and Rural Urban Classification

England and Wales digital vector boundaries for Local Authority Districts as of December 2021. The dataset is joined to the 2021 Rural-Urban Classification, a product developed by the Office for National Statistics, the Department for Environment, Food and Rural Affairs, and the Welsh Assembly Government. Source data is from the Office for National Statistics and contains OS data © Crown copyright 2025.

Geospatial, England Wales, Administrative Boundaries, Rural Urban Classification, Computer Vision, Government Statistics, +1

England and Wales Output Area Boundaries with Rural Urban Classification, 2021

England and Wales digital vector boundaries for Output Areas as of December 2021. The dataset is joined to the 2021 Rural-Urban Classification, a product developed by the Office for National Statistics, the Department for Environment, Food and Rural Affairs, and the Welsh Assembly Government. Source data is from the Office for National Statistics licensed under the Open Government Licence v.3.0.

Geospatial, England Wales, Geospatial Boundaries, Rural Urban Classification, Computer Vision, Census Geography, +1

Middle layer Super Output Areas (December 2021) Boundaries EW BFE (V8) and Rural Urban Cla

Digital vector boundaries for Middle layer Super Output Areas (MSOAs) in England and Wales as of March 2021. This dataset is joined to the 2021 Rural-Urban Classification, a Government Statistical Service product developed by the Office for National Statistics, Defra, and the Welsh Assembly Government. Source data is from the Office for National Statistics and contains OS data © Crown copyright 2025.

Geospatial, England Wales, Administrative Units, Geospatial Boundaries, Rural Urban Classification, +1

Lower layer Super Output Areas (December 2021) Boundaries EW BFE (V10) and Rural Urban Cla

Lower layer Super Output Areas (December 2021) Boundaries EW BFE (V10) and Rural Urban Classification data 2021 contains digital vector boundaries for Lower layer Super Output Areas in England and Wales, as of 21 March 2021. The data is joined to the 2021 Rural-Urban Classification, a Government Statistical Service product developed by the Office for National Statistics, Defra, and the Welsh Assembly Government. The source is the Office for National Statistics, licensed under the Open Government Licence v.3.0.

Geospatial, England Wales, Geospatial Boundaries, Rural Urban Classification, Computer Vision, Administrative Areas, +1

England and Wales Output Area Boundaries with Rural Urban Classification 2021

Office for National Statistics provides digital vector boundaries for Output Areas in England and Wales as of December 2021. The boundaries are joined to the 2021 Rural-Urban Classification data, a product developed by the Office for National Statistics, Department for Environment, Food and Rural Affairs, and the Welsh Assembly Government. The data is licensed under the Open Government Licence v.3.0 and contains OS data © Crown copyright 2025.

Geospatial, England Wales, Geospatial Boundaries, Rural Urban Classification, Computer Vision, Census Geography, +1

Human-Like DPO: 1,000 Preference Examples for Language Model Training

mlx-community provides a test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. It contains 1,000 total examples, split into 800 for training, 100 for validation, and 100 for testing. The dataset was last updated on May 27, 2025.

Text, AI Safety, Training Data, Benchmark, Language Model, Preference Optimization, +1

Codeforces Competitive Programming Problems and Solutions

Codeforces, a popular competitive programming platform, provides a collection of over 10,000 unique algorithmic problems. The dataset, created by open-r1, includes problems from the earliest contests up to 2025, designed to test code reasoning capabilities.

Text, Tabular, Parquet, Size Categories 10K<n<100K, Competitive Programming, Library polars, Library dask, Modality text, Modality tabular, Library mlcroissant, Library datasets, AI For Code, License CC BY 4.0, Algorithmic Problems, Code Generation, Region us, +1

Codeforces Competitive Programming Submissions

Millions of real user code submissions from the Codeforces competitive programming platform. The dataset contains human solutions to challenging algorithmic optimization problems, curated by open-r1 and last updated in May 2025.

Text, Competitive Programming, Software Engineering, Source Code, Algorithms, Programming, +1

Formal Problem Solving Main

A benchmark dataset from the research paper 'Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving'. It was created by author 'purewhite42' and last updated on May 8, 2025. The research focuses on formulating problem-solving as a deterministic Markov decision process within formal theorem proving environments.

Text, Mathematics, Benchmark, Problem Solving, Artificial Intelligence, Formal Methods, Theorem Proving, +1

PutnamBench-Solving: A Formal Problem-Solving Benchmark for Theorem Proving

PutnamBench-Solving is a benchmark for evaluating formal problem-solving within theorem proving environments. The dataset is part of the official implementation for research on process-verified problem-solving beyond proving known targets. It was created by author purewhite42 and last updated on May 8, 2025.

Text, Mathematics, Benchmark, Problem Solving, Formal Methods, Theorem Proving, +1

Herald Proofs: 45,000 Natural Language to Formal Logic Proofs

Herald Proofs is a dataset of 45,000 natural language to formal logic (NL-FL) proofs, constituting the proof part of the larger Herald dataset. The dataset was created by authors including Guoxiong Gao and Yutong Wang and presented at the International Conference on Learning Representations in 2025. It is associated with the Lean 4 theorem prover, specifically version v4.11.0.

Text, Lean4, Natural Language Processing, Theorem Proving, Formal Proofs, +1

DeepSeek-ProverBench: Formal Theorem Proving Data for Lean 4

DeepSeek-ProverBench is a dataset for training and evaluating large language models on formal theorem proving in the Lean 4 environment. It was created by deepseek-ai using a recursive theorem proving pipeline powered by the DeepSeek-V3 model to decompose complex problems into subgoals. The dataset was last updated on April 30, 2025.

Text, JSON, Library polars, Size Categories n<1K, Modality text, Mathematics, Library mlcroissant, Library datasets, Library pandas, Lean 4, Formal Methods, Region us, LLM Training, Theorem Proving, +1

Rural Urban Classification (2021) for Output Areas in England and Wales

A statistical classification of 2021 Output Areas (OA) in England and Wales as rural or urban, produced by the Office for National Statistics (ONS). The classification is based on address density, physical settlement form, population size, and relative access to major towns and cities. It provides a consistent view of geography for higher-level areas like LSOA, MSOA, and LAD.

Geospatial, ZIP, CSV, OA, Prd Ruc, Rural Urban Classification, Demographics, Census Geography, Ruc21, England And Wales, Prd Ruc Oa, Output Area, Ruc, Output areas, +1

Vermont Public Library Survey Metrics 2018-2023

Vermont's submissions to the Public Library Survey provide annual statistics on library operations and services from 2018 to 2023. The survey is administered by the Institute of Museum and Library Services (IMLS) and is intended to be completed by every public library in the country each year.

Tabular, CSV, XML, JSON, Government Data, Cultural Institutions, Public Library Services, Library Statistics, +1

LLM-SRBench: 239 Scientific Equation Discovery Problems

239 challenging problems across four scientific domains designed to evaluate LLM-based scientific equation discovery methods. The benchmark was created by author nnheui and last updated on 2025-04-20. It includes a category, LSR-Transform, which transforms common physical models into less common mathematical representations to test reasoning beyond memorization.

Text, LLM Benchmark, Scientific Discovery, Mathematical Representation, Benchmark, Equation Discovery, +1

Turkish Math Problems for Multi-Step Reasoning Training

This dataset, Phase 2 of a curriculum learning pipeline, contains moderately difficult Turkish-language math problems that require multiple reasoning steps. It was created by author 'erayalp' and last updated in April 2025. It is designed to bridge the gap between basic arithmetic and complex problem-solving for language models.

Text, Math Reasoning, Turkish Language, Curriculum Learning, +1

Behavioral Risk Factors: Selected Metropolitan Area Risk Trends (SMART) MMSA Age-adjusted

Age-adjusted prevalence data, from 2011 to present, from the Behavioral Risk Factor Surveillance System (BRFSS) for selected U.S. metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. The data is collected by the CDC through a continuous, state-based surveillance system that tracks modifiable risk factors for chronic diseases. It is updated annually and hosted on data.cdc.gov.

Tabular, Time Series, CSV, XML, JSON, Behavioral Risk Factors, Survey Data, Healthcare, Age Adjusted Prevalence, Metropolitan Statistics, Public Health, +1

SMART MMSA: Behavioral Risk Factor Survey Data for Metropolitan Areas (2011-Present)

The dataset contains prevalence data, from 2011 to present, from the Behavioral Risk Factor Surveillance System (BRFSS) for selected metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. It is produced by the Centers for Disease Control and Prevention (CDC) and is updated annually to track modifiable risk factors for chronic diseases and leading causes of death.

Tabular, Time Series, Behavioral Risk Factors, Mmsa, Chronic Disease, Survey Data, Survey, Behavioral, Healthcare, Smart, Brfss, Metropolitan Statistics, Risk, Public Health, +1
