DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,487 datasets

Verifiable Math Problems Subset from SYNTHETIC-1

A subset of the task data used to construct the SYNTHETIC-1 collection, created by PrimeIntellect and last updated in February 2025. It contains mathematical problems for text-based problem-solving tasks. The dataset is tagged for Mathematics, Text, and Synthetic Data.

Tags: Parquet, Library: polars, Library: dask, Modality: text, Size Categories: 100K-1M, Library: mlcroissant, Library: datasets, Region: us

Hendrycks MATH: Mathematical Problem Solving Benchmark

The MATH dataset is a collection of mathematical problems for evaluating problem-solving capabilities. It was created by researchers including Dan Hendrycks and Collin Burns and published at NeurIPS in 2021. The dataset is hosted on Hugging Face by EleutherAI and was last updated in January 2025.

Tags: Text, Parquet, Size Categories: 10K-100K, Library: polars, AI Benchmarking, Modality: text, Mathematics, Library: mlcroissant, Library: datasets, Library: pandas, Education, Problem Solving, Region: us, License: mit

ProcessBench: A Benchmark for Identifying Process Errors in Mathematical Reasoning

ProcessBench is a benchmark dataset proposed by the Qwen Team for evaluating the identification of process errors in mathematical reasoning. The dataset is hosted on Hugging Face and was last updated on December 27, 2024. The associated GitHub repository contains evaluation code and prompt templates used in the work.

Tags: Text, Mathematical Reasoning, Evaluation, Benchmark, Process Errors

Official Register of Municipalities and Localities in Rhineland-Palatinate

Rhineland-Palatinate's official location register combines municipal lists from the state's Statistical Office and its Office for Surveying and Geo-Based Information. The dataset is a presentation service provided via WMS and was last updated on November 6, 2024. It is published by the Bundesamt für Kartographie und Geodäsie.

Tags: Geospatial, Administrative Boundaries, German Municipalities, Geospatial Data, Geographic Register

List of Municipalities in Rhineland-Palatinate with Under 5,000 Residents

Rhineland-Palatinate's official register of municipalities and cities with fewer than 5,000 residents. The list is maintained by the Statistical Office of Rhineland-Palatinate and the State Office for Surveying and Geo-Based Information. It was last updated on November 6, 2024.

Tags: Geospatial, German Municipalities, Small Towns, Administrative Register, Geospatial Data

OpenLongCoT-Pretrain: Pairwise Optimization Data for Olympiad-Level Mathematical Reasoning

OpenLongCoT-Pretrain is a dataset referenced in the LLaMA-Berry research paper for pairwise optimization in mathematical reasoning. The dataset likely contains training examples aimed at achieving high-level mathematical problem-solving performance, as described in the associated arXiv preprint. It was uploaded to Hugging Face by the author di-zhang-fdu on October 28, 2024.

Tags: Text, Mathematical Reasoning, Llama Berry, Pairwise Optimization, Olympiad Level

ARPA-E Grid Optimization Challenge 1 Synthetic Power System Models

ARPA-E Grid Optimization Challenge 1 data from 2018-2019 provides synthetic power system network models for the Security Constrained AC Optimal Power Flow problem. The collection includes Real-Time and Online datasets with operating scenarios defining instantaneous power demand, renewable generation, and component availability. It was used for a competition requiring solvers to compute a base case operating point and verify feasibility across contingencies.

Tags: Optimal Powerflow, Model, Security Constrained, ARPA-E, Grid Optimization, Synthetic Grid Data, Competition, Grid, Computational Science, Unit Commitment, Optimization, Power, Energy, ACOPF, Energy Model, GO Competition

Synthetic Power Grid Optimization Scenarios from ARPA-E Challenge 3

August 2023 Event 4 data includes 591 synthetic scenarios derived from 9 network models, totaling 3.6 GB. The dataset supports the ARPA-E Grid Optimization Competition Challenge 3, focusing on security-constrained optimal power flow problems for multiperiod dynamic markets. It contains results from 14 teams who solved 669 scenarios, with funding and prizes awarded across multiple competition events.

Tags: Model, Security Constrained Optimal Power Flow, ARPA-E, Challenge 3, Grid Optimization, Competition, Multiperiod Dynamic Markets, Grid, Computational Science, Unit Commitment, Optimization, Power, Energy, ACOPF, Multiperiod, Energy Model, GO Competition

Lean Workbook: 140,124 Formalized Math Contest Problems

140,124 contest-level math problems formalized in the Lean 4 theorem prover, created by internlm and released in October 2024. The dataset includes natural language statements, answers, formal statements, and formal proofs where available. It is intended to support the training of autoformalization models and automated proof search.

Tags: Text, Mathematics, Autoformalization, Math Contest, Natural Language Processing, Lean Theorem Prover, Formal Proofs

Prooffol: A Collection of Mathematical Proofs

Prooffol is a dataset uploaded to Hugging Face by author ramyakeerthyt on 2024-11-06. The title suggests it likely contains formal proofs or logical statements. The dataset's specific content, size, and structure require verification after download.

Tags: Text, Proof Checking, Mathematical Proofs, Fact Verification

DeepSeek-Prover V1: 10K-100K Synthetic Lean Mathematical Proofs

DeepSeek-Prover V1 contains between 10,000 and 100,000 synthetic mathematical proof records designed for the Lean proof assistant. Developed by deepseek-ai and released in 2024, this dataset facilitates the training and evaluation of large language models in formal mathematical reasoning.

Tags: JSON, Size Categories: 10K-100K, License: other, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, arXiv: 2405.14333, Region: us

TAT-QA Arithmetic CoT: Chain-of-Thought Reasoning for Financial QA

A synthetically generated Chain of Thought (CoT) version of the TAT-QA arithmetic dataset, created by prompting Llama3 70B Instruct. The dataset was produced by Cerebras as part of their work on Cerebras DocChat, a document-based conversational Q&A model, to address arithmetic reasoning errors. It was last updated on August 19, 2024.

Tags: Text, Arithmetic Reasoning, Chain Of Thought, Question Answering, Synthetic Data, Synthetic

Lean 4 Theorem Repository for Formal Verification

A collection of 29,000 theorems compiled from over 100 Lean 4 repositories. It was created by InternLM to support the development of theorem provers, including the fine-tuned 7B model InternLM2-Step-Prover.

Tags: Parquet, Library: polars, arXiv: 2407.17227, Modality: text, Size Categories: 100K-1M, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, License: apache-2.0

Python Lottery Analysis: Historical Results and Astronomical Data

Historical lottery draw results integrated with astronomical data, developed by szczyglis-dev and last updated in August 2024. The repository provides a Jupyter notebook demonstrating statistical analysis, linear regression, and visualization of number distributions.

Tags: CSV, Skyfield, Jupyter, Data Science, Predictive Modeling, Notebook Jupyter, Astronomy, Python, Lottery Draw, Probability Distribution, Random, Plot, Relationship, Linear Regression, Analyze Data

Sujet Financial RAG FR: French Financial Question-Context Pairs for Embedding Models

A proof-of-concept collection of French question-context pairs designed for training and evaluating embedding models in the financial domain. The dataset was created by sujet-ai and last updated on July 28, 2024. It contains hand-selected examples from publicly available French financial documents.

Tags: Text, Parquet, Size Categories: 10K-100K, Library: polars, French Text, Financial QA, Financial Question Answer, RAG, Modality: text, Library: mlcroissant, Financial RAG, Embedding Finetuning, Library: datasets, Library: pandas, Question Answering, Financial Embedding, Region: us, Finance, License: mit, Embedding Model Finetuning

PutnamBench: Competition Mathematics Problems Formalized in Lean, Isabelle, and Coq

PutnamBench comprises over 1,300 manual formalizations of problems from the William Lowell Putnam Mathematical Competition between 1965 and 2023. The benchmark supports three formal languages: Lean 4, Isabelle, and Coq. It was created by amitayusht and last updated on Hugging Face in June 2024.

Tags: Text, Mathematics, Benchmark, Formal Language, Theorem Proving

TheoremQA: 800 STEM Question-Answer Pairs Based on 350+ Theorems

TheoremQA is a dataset of 800 question-answer pairs created by human experts at TIGER-Lab. It covers over 350 theorems across mathematics, electrical engineering & computer science, physics, and finance. The dataset was uploaded to Hugging Face on May 15, 2024, and is intended as a benchmark for testing large language models on university-level problem-solving.

Tags: Text, Mathematics, Theorem QA, Benchmark, STEM Education, Physics, Finance

Partial Discharge Measurements Under Steep-Fronted Voltage Pulses

Six sets of Matlab files, each containing 50 samples of partial discharge (PD) signals and the corresponding voltage impulses, sampled at 20 GS/s. The data was collected using a Vivaldi antenna in response to sudden voltage changes and is sorted by applied voltage amplitude. The dataset was authored by Juan Manuel Martínez-Tarifa and last updated in May 2024.

Tags: Time Series, High Voltage, Signal Processing, Matlab, Partial Discharge, Electrical Engineering

Grade School Mathematics Instruction Dataset

Nearly one million instructions in JSON format cover topics like calculus, probability, algebra, and trigonometry. The dataset was created by ajibawa-2023 and released on the Hugging Face platform, with a last recorded update in May 2024. It is structured for instruction tuning to support model development and research.

Tags: Text, JSON, Task Categories: text-generation, Task Categories: question-answering, Library: dask, Student, Language: en, Modality: text, Algebra, Size Categories: 100K-1M, Mathematics, Library: mlcroissant, DOI: 10.57967/hf/3167, Library: datasets, Maths, Question Answering, Education, Grade School, Region: us, Calculus, Large Scale, License: apache-2.0, Probability, Liner Algebra

Minimum Modulus Visualizations of Algebraic Fractal Prisoner Sets

A 2024 dataset by Severino Fernández Galán presents a novel method for visualizing algebraic fractals. The method colors points in the complex plane based on the minimum modulus within their generated sequences, offering aesthetic views of prisoner sets. It was harvested from the e-cienciaDatos Dataverse platform.

Tags: Image, Mathematical Visualization, Algebraic Fractals, Complex Analysis, Fractals, Synthetic
