DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,487 datasets

Nemotron-Cascade-RL-Math: 14,476 Math Problems for Reinforcement Learning

Nemotron-Cascade-RL-Math is a dataset of 14,476 math problems and short answers, created by NVIDIA and last updated on December 16, 2025. It is designed for reinforcement learning in mathematical reasoning, aggregating and decontaminating data from sources like OpenMathReasoning, NuminaMath-CoT, DeepScaleR, and AceReason-Math.

Text, Mathematics, Reasoning, Reinforcement Learning, Educational Data, +1

Nemotron Math Reasoning Dataset for Reinforcement Learning

Nemotron-Cascade-RL-Math is a dataset of 14,476 math problems and short answers for reinforcement learning. It was created by NVIDIA, aggregating and filtering content from sources like OpenMathReasoning and NuminaMath-CoT. The dataset was last updated in December 2025.

Text, Mathematics, Reasoning, Reinforcement Learning, Educational Data, +1

Sequence of Happy Chen Primes in Mathematical Sciences

A sequence of happy Chen primes: primes that are both happy numbers and Chen primes. It was created by Emanuele Pace to investigate whether there are infinitely many such primes. The dataset's row count, column structure, and size are unknown.

Mathematical Sciences, +1
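
To make the definition concrete, here is a minimal sketch, not taken from the dataset itself. It assumes the standard definitions: a Chen prime is a prime p such that p + 2 is either prime or a product of two primes, and a happy number is one whose repeated sum of squared decimal digits reaches 1. All function names are hypothetical and chosen for illustration.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_factor_count(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            count += 1
            n //= d
        d += 1
    return count + (1 if n > 1 else 0)


def is_happy(n):
    """True if iterating the sum of squared decimal digits reaches 1."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(c) ** 2 for c in str(n))
    return n == 1


def is_chen_prime(p):
    """p is prime and p + 2 is prime (1 factor) or semiprime (2 factors)."""
    return is_prime(p) and prime_factor_count(p + 2) in (1, 2)


def happy_chen_primes(limit):
    """All happy Chen primes up to and including limit."""
    return [p for p in range(2, limit + 1) if is_chen_prime(p) and is_happy(p)]


if __name__ == "__main__":
    print(happy_chen_primes(110))  # [7, 13, 19, 23, 31, 109]
```

Trial division is used only to keep the sketch short; a sieve would be the usual choice for larger limits.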

Skills to Careers Mapping Dataset

A dataset from Kaggle exploring the relationship between skills and future career paths. The specific number of records, features, and temporal coverage are unknown. The data likely contains mappings or correlations between skill sets and occupational outcomes.

Tabular, Career Prediction, Labor Market, Skills Mapping, +1

Adaptive Real-Time Promotion Mix Optimization Using Contextual Multi-Armed Bandits

A dataset from Kaggle's Research category concerning adaptive promotion optimization. The description indicates it involves contextual multi-armed bandit algorithms for real-time decision-making. The dataset's specific size, author, and update date are not provided.

Tabular, Contextual Bandits, Marketing Analytics, Multi Armed Bandit, Adaptive Learning, Research, Promotion Optimization, +1

Genotype-Driven Bayesian Optimization of Pediatric Acetaminophen Dosing

A research dataset on optimizing oral acetaminophen dosing for pediatric patients using a genotype-driven Bayesian approach. The dataset was sourced from Kaggle and is categorized under Research. Specific details on volume, authorship, and creation date are not provided.

Tabular, Bayesian Optimization, Research, Pharmacogenomics, Clinical Research, Pediatric Dosing, +1

Adaptive Policy Optimization for Conditional Cash Transfer Programs

A Kaggle dataset focused on reinforcement learning methods for optimizing conditional cash transfer programs. The dataset likely contains simulation or policy evaluation data for social policy research. The author and organization are unknown.

Tabular, Policy Optimization, Social Policy, Research, Conditional Cash Transfer, Reinforcement Learning, +1

BurstGPT: ChatGPT and GPT-4 Workload Traces for LLM Serving Optimization

BurstGPT provides workload traces for ChatGPT (GPT-3.5) and GPT-4, released by HPMLL to facilitate the optimization of Large Language Model (LLM) serving systems. The data captures request patterns and arrival characteristics from production-scale models as of early 2024. It is designed to help researchers model the 'bursty' nature of inference traffic in high-performance computing environments.

LLM Serving, Large Language Model, MLSys, +1

SQL Practice Dataset 3 (Hard) + Queries

Multiple relational tables containing security event logs paired with a collection of advanced SQL queries. The data focuses on complex schema interactions for high-level database analysis and security auditing practice.

Exploratory Data Analysis, Cyber Security, Advanced, SQL, Data Analytics, +1

OR Bench Classify: Operations Research Questions with Category Labels

A text classification dataset for operations research (OR) questions, created by yilingwang and last updated on Hugging Face in December 2025. It contains questions classified into categories such as Linear Programming and Integer Programming, with detailed reasoning provided. The dataset's specific size and number of rows are not detailed in the provided metadata.

Text, JSON, Size Categories: 1K < n < 10K, Library: polars, Library: dask, Language: en, Mathematical Optimization, Modality: text, Task Categories: text-retrieval, Linear Programming, Library: mlcroissant, Operations Research, Library: datasets, Text Classification, Region: us, Reasoning, Optimization, Task Categories: text-classification, License: apache-2.0, Integer Programming, +1

IneqMath: Inequality Proofs for Large Language Model Evaluation

Inequality proving tests advanced reasoning skills like discovering tight bounds and applying theorems, making it a distinct frontier for large language models. The dataset, created by AI4Math, was last updated on December 15, 2025. Its specific size and structure are not detailed in the provided metadata.

Text, Mathematics, LLM Evaluation, Reasoning Benchmark, Inequality Proofs, +1

Topic Information for Search Optimization

Topic information likely related to search optimization processes. The dataset is hosted on Kaggle, but its specific content, size, and creation details are unknown. Metadata is minimal; actual content requires verification after download.

Tabular, Topic Modeling, Information Retrieval, +1

Financial Budget Optimization Dataset with Allocation and Revenue Data

Financial Budget Optimization Dataset is a tabular dataset published on Kaggle. The raw description indicates it contains organizational budget allocation and revenue data. The specific number of rows, columns, and other metadata are currently unknown.

Tabular, Budget Allocation, Revenue, Finance, Financial Data, Optimization, +1

Optimized Voltage Pulse Schemes for Bilayer Oxide Memristors

Experimental data on optimizing voltage pulse schemes for linear learning and forgetting in bilayer oxide resistive switching devices. The dataset was authored by THAMANKAR, RAMESH MOHAN and last updated on January 21, 2026.

Engineering, Linearity in Learning, Artificial Synapse, Physics, Neuromorphic computation, Resistive switching, +1

GRAD: 1,933 Graduate-Level Synthetic Math Problems with Proofs

GRAD is a high-quality synthetic mathematics dataset containing 1,933 original problems at graduate and research level, each accompanied by a complete, detailed, step-by-step proof in clear mathematical English and LaTeX. All problems and proofs were generated from scratch by Xerv-AI and released in December 2025. No entry has ever appeared in textbooks, competition archives, or research papers.

Text, JSON, Size Categories: 1K < n < 10K, Task Categories: text-generation, Library: polars, Task Categories: question-answering, Language: en, Modality: text, Mathematics, Annotations Creators: no-annotation, Library: mlcroissant, Library: datasets, Library: pandas, Long Proof, Proof Generation, Fine Tuning, Region: us, Reasoning, Graduate Level, Research Level, Synthetic Data, License: mit, Synthetic, +1

Prime Numbers with Sophie Germain, Happy, and Balanced Properties

A sequence of prime numbers that simultaneously satisfy the mathematical definitions of Sophie Germain primes, happy primes, and balanced primes. The dataset, authored by Emanuele Pace, raises the open question of whether there are infinitely many such primes. The specific count of rows and columns is not provided.

Mathematical Sciences, +1

Replication Data for Measuring Fairness Article

Replication material supports the academic article 'Measuring Fairness' from the Social Sciences and Law domains. The data was provided by author Barry Edwards and was last updated in January 2026. Specific details on rows, columns, and file formats are unavailable.

Social Sciences, Law, +1

Carmichael Lambda Function Integer Sequence

A sequence of integers generated by the formula a(n) = λ(2^n - 1), where λ is the Carmichael lambda function. The dataset was created by author Emanuele Pace and is hosted on the Dataverse platform. The specific number of rows and columns is not provided.

Mathematical Sciences, +1
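
The formula a(n) = λ(2^n - 1) can be illustrated with a short sketch. It is not the dataset author's code. It assumes the sequence is indexed from n = 1 (the listing does not state the starting index) and uses the standard characterization of the Carmichael function as the least common multiple of its values on the prime-power factors. The function names are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def factorize(n):
    """Prime factorization of n by trial division, as {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def carmichael_lambda(n):
    """Carmichael function: lcm of lambda(p^k) over the prime powers dividing n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    for p, k in factorize(n).items():
        if p == 2 and k >= 3:
            term = 2 ** (k - 2)             # lambda(2^k) = 2^(k-2) for k >= 3
        else:
            term = (p - 1) * p ** (k - 1)   # Euler's phi(p^k) otherwise
        result = lcm(result, term)
    return result


def lambda_mersenne_sequence(count):
    """First `count` terms of a(n) = lambda(2^n - 1), assuming n starts at 1."""
    return [carmichael_lambda(2 ** n - 1) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(lambda_mersenne_sequence(8))  # [1, 2, 6, 4, 30, 6, 126, 16]
```

Trial division becomes slow as n grows because 2^n - 1 can have large prime factors, so this is suitable only for checking the first few terms.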

MathProofread: Mathematical Text Proofreading Dataset

MathProofread is a dataset hosted on Kaggle, likely containing text related to mathematical problems or proofs. The dataset's specific content, size, and creation details are not provided in the available metadata. Its title suggests a focus on mathematical language and verification tasks.

Text, Mathematics, Education, Proofreading, +1

StackMathQA: 2 Million Mathematical Questions and Answers from Stack Exchange

StackMathQA contains 2 million mathematical questions and answers curated from various Stack Exchange sites. The dataset was created by 'math-ai' and was last updated on November 20, 2025. It is intended as a resource for mathematics and AI research.

Text, Mathematics, Question Answering, AI Training, Large Scale, Stack Exchange, +1
