Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,487 datasets
Nemotron-Cascade-RL-Math is a dataset of 14,476 math problems with short answers, created by NVIDIA and last updated on December 16, 2025. It is designed for reinforcement learning in mathematical reasoning and aggregates and decontaminates data from sources such as OpenMathReasoning, NuminaMath-CoT, DeepScaleR, and AceReason-Math.
A sequence of happy Chen primes, a specific class of prime numbers. The dataset was created by Emanuele Pace to investigate whether there are infinitely many such primes. Its row count, column structure, and size are unknown.
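To make the membership test concrete, here is a minimal sketch that enumerates happy Chen primes up to a bound. It assumes the usual definitions: a happy number reaches 1 when you repeatedly replace it with the sum of the squares of its digits, and a Chen prime is a prime p for which p + 2 is either prime or a semiprime. The function names are hypothetical, and the sketch is not how the dataset itself was generated.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_semiprime(n: int) -> bool:
    """True if n is a product of exactly two primes (not necessarily distinct)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            # d is the smallest prime factor; n is semiprime iff the cofactor is prime.
            return is_prime(n // d)
        d += 1
    return False  # n < 4 or n is prime


def is_happy(n: int) -> bool:
    """Iterate the sum of squared digits; happy numbers reach 1, others cycle."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(c) ** 2 for c in str(n))
    return n == 1


def is_chen_prime(p: int) -> bool:
    """Chen prime: p is prime and p + 2 is prime or semiprime."""
    return is_prime(p) and (is_prime(p + 2) or is_semiprime(p + 2))


def happy_chen_primes(limit: int) -> list:
    """All primes p <= limit that are both happy and Chen primes."""
    return [p for p in range(2, limit + 1) if is_chen_prime(p) and is_happy(p)]


if __name__ == "__main__":
    print(happy_chen_primes(100))
```

For example, 79 is a happy prime but not a Chen prime, because 81 = 3^4 has four prime factors.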
A dataset from Kaggle exploring the relationship between skills and future career paths. The specific number of records, features, and temporal coverage are unknown. The data likely contains mappings or correlations between skill sets and occupational outcomes.
A dataset from Kaggle's Research category concerning adaptive promotion optimization. The description indicates it involves contextual multi-armed bandit algorithms for real-time decision-making. The dataset's specific size, author, and update date are not provided.
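A contextual bandit picks an action (here, a promotion) based on the current context and updates its estimates from the observed reward. The sketch below is one of the simplest members of that family: a tabular epsilon-greedy bandit with a small, discrete set of contexts. It is an illustrative assumption, not the algorithm used in the dataset; the class name, context labels, and conversion rates are all hypothetical. Feature-based methods such as LinUCB are the usual choice when contexts are continuous vectors.

```python
import random
from collections import defaultdict
from collections.abc import Hashable
from typing import Optional


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy contextual bandit (hypothetical illustration).

    Keeps one running-mean reward estimate per (context, arm) pair, so it
    only suits a small, discrete set of contexts.
    """

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: Optional[int] = None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.values = defaultdict(lambda: [0.0] * n_arms)

    def select(self, context: Hashable) -> int:
        # Explore uniformly at random with probability epsilon ...
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        # ... otherwise exploit the best current estimate for this context
        # (ties go to the lowest arm index).
        estimates = self.values[context]
        return max(range(self.n_arms), key=estimates.__getitem__)

    def update(self, context: Hashable, arm: int, reward: float) -> None:
        # Incremental mean: v_new = v + (r - v) / n.
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        v = self.values[context][arm]
        self.values[context][arm] = v + (reward - v) / n


if __name__ == "__main__":
    # Made-up contexts and conversion rates, not taken from the dataset.
    true_rates = {"weekday": [0.1, 0.5, 0.2], "weekend": [0.6, 0.1, 0.2]}
    bandit = EpsilonGreedyContextualBandit(n_arms=3, epsilon=0.1, seed=0)
    env = random.Random(1)
    for t in range(5000):
        ctx = "weekday" if t % 2 == 0 else "weekend"
        arm = bandit.select(ctx)
        reward = 1.0 if env.random() < true_rates[ctx][arm] else 0.0
        bandit.update(ctx, arm, reward)
    for ctx in true_rates:
        print(ctx, [round(v, 2) for v in bandit.values[ctx]])
```

Keeping separate estimates per context is what makes this "contextual": the best promotion on weekdays can differ from the best on weekends.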
A research dataset on optimizing oral acetaminophen dosing for pediatric patients using a genotype-driven Bayesian approach. The dataset was sourced from Kaggle and is categorized under Research. Details on volume, authorship, and creation date are not provided in the available metadata.
A Kaggle dataset focused on reinforcement learning methods for optimizing conditional cash transfer programs. The dataset likely contains simulation or policy evaluation data for social policy research. The author and organization are unknown.
BurstGPT provides workload traces for ChatGPT (GPT-3.5) and GPT-4, released by HPMLL to support the optimization of large language model (LLM) serving systems. The data captures request patterns and arrival characteristics from production-scale models as of early 2024. It is designed to help researchers model the 'bursty' nature of inference traffic in high-performance computing environments.
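One common way to quantify burstiness in an arrival trace is the coefficient of variation (CV) of inter-arrival times: a Poisson arrival process has exponentially distributed gaps with CV = 1, and CV > 1 indicates traffic that is burstier than Poisson. The sketch below assumes the request arrival times have already been extracted into a plain list of seconds; it does not read the BurstGPT files, whose column names are not given here, and the function name is hypothetical.

```python
import statistics


def interarrival_cv(timestamps: list) -> float:
    """Coefficient of variation (std / mean) of inter-arrival gaps.

    CV is about 1 for Poisson arrivals, 0 for perfectly regular arrivals,
    and above 1 for bursty traffic.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean = statistics.fmean(gaps)
    if mean == 0:
        raise ValueError("all requests share a single timestamp")
    return statistics.pstdev(gaps) / mean


if __name__ == "__main__":
    regular = [0, 1, 2, 3, 4]                          # evenly spaced requests
    bursty = [0, 0.01, 0.02, 10, 10.01, 10.02, 20]     # clusters separated by idle periods
    print(interarrival_cv(regular), interarrival_cv(bursty))
```

CV is only a single summary number; serving-system studies typically also look at how arrival rates change over time.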
Multiple relational tables containing security event logs paired with a collection of advanced SQL queries. The data focuses on complex schema interactions for high-level database analysis and security auditing practice.
A text classification dataset for operations research (OR) questions, created by yilingwang and last updated on Hugging Face in December 2025. It contains questions classified into categories such as Linear Programming and Integer Programming, with detailed reasoning provided. The dataset's specific size and number of rows are not detailed in the provided metadata.
A dataset on inequality proving, a task that tests advanced reasoning skills such as discovering tight bounds and applying theorems, which makes it a distinct frontier for large language models. The dataset was created by AI4Math and last updated on December 15, 2025. Its size and structure are not detailed in the provided metadata.
A dataset likely related to search optimization processes. It is hosted on Kaggle, but its specific content, size, and creation details are unknown. Metadata is minimal, so the actual content needs to be verified after download.
Financial Budget Optimization Dataset is a tabular dataset published on Kaggle. The raw description indicates it contains organizational budget allocation and revenue data. The specific number of rows, columns, and other metadata are currently unknown.
Experimental data related to optimizing pulse schemes for linear learning and forgetting in bilayer oxide resistive switching devices. The dataset was authored by THAMANKAR, RAMESH MOHAN and last updated on January 21, 2026.
GRAD is a high-quality synthetic mathematics dataset containing 1,933 original problems at graduate and research level, each accompanied by a complete, detailed, step-by-step proof in clear mathematical English and LaTeX. All problems and proofs were generated from scratch by Xerv-AI and released in December 2025. According to its description, no entry has previously appeared in textbooks, competition archives, or research papers.
A sequence of prime numbers that simultaneously satisfy the mathematical definitions of Sophie Germain primes, happy primes, and balanced primes. The dataset, authored by Emanuele Pace, raises the open question of whether there are infinitely many such primes. The specific count of rows and columns is not provided.
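The three conditions can be checked independently, as in the minimal sketch below. It assumes the standard definitions: p is a Sophie Germain prime if 2p + 1 is also prime; p is a happy prime if iterating the sum of squared digits reaches 1; and p is a balanced prime if it equals the mean of the primes immediately before and after it. The function names are hypothetical and not taken from the dataset.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_happy(n: int) -> bool:
    """Iterate the sum of squared digits; happy numbers reach 1, others cycle."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(c) ** 2 for c in str(n))
    return n == 1


def is_sophie_germain(p: int) -> bool:
    """p and 2p + 1 are both prime."""
    return is_prime(p) and is_prime(2 * p + 1)


def is_balanced_prime(p: int) -> bool:
    """p is prime and equals the mean of its neighbouring primes."""
    if not is_prime(p):
        return False
    lo = p - 1
    while lo >= 2 and not is_prime(lo):
        lo -= 1
    if lo < 2:  # p = 2 has no previous prime
        return False
    hi = p + 1
    while not is_prime(hi):
        hi += 1
    return lo + hi == 2 * p


def sg_happy_balanced_primes(limit: int) -> list:
    """Primes p <= limit satisfying all three definitions."""
    return [
        p for p in range(2, limit + 1)
        if is_sophie_germain(p) and is_happy(p) and is_balanced_prime(p)
    ]


if __name__ == "__main__":
    print(sg_happy_balanced_primes(1000))
```

For example, 653 satisfies all three: 2 · 653 + 1 = 1307 is prime, 653 → 70 → 49 → 97 → 130 → 10 → 1, and its neighbouring primes are 647 and 659, whose mean is 653.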
Replication material for the academic article 'Measuring Fairness' in the Social Sciences and Law domains. The data was provided by Barry Edwards and last updated in January 2026. Details on rows, columns, and file formats are unavailable.
A sequence of integers generated by the formula a(n) = λ(2^n - 1), where λ is the Carmichael lambda function. The dataset was created by author Emanuele Pace and is hosted on the Dataverse platform. The specific number of rows and columns is not provided.
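The formula can be evaluated directly. The Carmichael function λ(m) is the smallest exponent e such that x^e ≡ 1 (mod m) for every x coprime to m. It is computed as the lcm of λ over the prime-power factors of m, with λ(p^k) = (p − 1)p^(k−1) for odd p, λ(2) = 1, λ(4) = 2, and λ(2^k) = 2^(k−2) for k ≥ 3. This is a minimal trial-division sketch suitable for small n; the function names are hypothetical, and it is not the code used to produce the dataset.

```python
from math import gcd


def factorize(n: int) -> dict:
    """Prime factorization {prime: exponent} by trial division (small n only)."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def carmichael_lambda(n: int) -> int:
    """Carmichael function: lcm of lambda(p^k) over the prime powers of n."""
    result = 1
    for p, k in factorize(n).items():
        if p == 2 and k >= 3:
            term = 2 ** (k - 2)
        else:
            term = (p - 1) * p ** (k - 1)  # equals phi(p^k); covers 2 and 4 as well
        result = result * term // gcd(result, term)  # running lcm
    return result


def a(n: int) -> int:
    """a(n) = lambda(2^n - 1)."""
    return carmichael_lambda(2 ** n - 1)


if __name__ == "__main__":
    print([a(n) for n in range(1, 13)])
```

As a sanity check, n always divides a(n): 2 has multiplicative order exactly n modulo 2^n − 1, and λ(m) is a multiple of the order of every unit modulo m.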
MathProofread is a dataset hosted on Kaggle, likely containing text related to mathematical problems or proofs. The dataset's specific content, size, and creation details are not provided in the available metadata. Its title suggests a focus on mathematical language and verification tasks.
A collection of 2 million mathematical questions and answers curated from various Stack Exchange sites. The dataset was created by the author 'math-ai' and was last updated on November 20, 2025. It is intended as a resource for mathematics and AI research.