Description
193,938 long-form reasoning traces and solutions for research-level mathematical problems, released alongside ResearchMath-14k. The dataset contains model-generated solution attempts, each with a problem statement, a chain-of-thought reasoning trace, and a final response. It was authored by 'amphora' and last updated on Hugging Face in June 2026.
Use Cases
Training language models for mathematical reasoning based on long chain-of-thought traces.
Benchmarking model performance on research-level problem-solving based on curated problem statements.
Analyzing the structure and patterns of model-generated reasoning for advanced mathematics.
Developing techniques to improve the accuracy and reliability of AI-generated solutions.
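For the training use case, each record would typically be flattened into a prompt and a target text. The sketch below shows one possible way to do that. The field names `problem`, `reasoning`, and `response` are hypothetical placeholders (the actual column names are not documented here), and the "Reasoning:" / "Final answer:" layout is an illustrative choice, not a format prescribed by the dataset.

```python
def to_training_example(record,
                        problem_key="problem",      # hypothetical column name
                        trace_key="reasoning",      # hypothetical column name
                        answer_key="response"):     # hypothetical column name
    """Turn one record into a prompt/completion pair for supervised fine-tuning.

    The key names are assumptions; replace them with the real column
    names once the dataset has been downloaded and inspected.
    """
    prompt = record[problem_key].strip()
    completion = (
        "Reasoning:\n" + record[trace_key].strip()
        + "\n\nFinal answer:\n" + record[answer_key].strip()
    )
    return {"prompt": prompt, "completion": completion}


# Made-up record used only to illustrate the shape of the output.
example = to_training_example({
    "problem": "  Show that 2 + 2 = 4. ",
    "reasoning": "Apply the definition of addition on the naturals.",
    "response": "4",
})
```

Keeping the key names as parameters means the same function can be reused unchanged once the real schema is known.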
Strengths
Contains 193,938 records, providing substantial scale for training and analysis.
Each record includes a self-contained problem statement, a long reasoning trace, and a final response, offering a complete solution context.
Released as part of a published research paper, which suggests academic rigor but does not guarantee that individual model-generated solutions are correct.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The stated row count (193,938) comes from the dataset description and should be confirmed after download.
Data may reflect bias inherent to the model(s) used to generate the solution attempts.
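Because column-level documentation is absent, a first pass after download might simply profile the fields. The following is a minimal sketch that assumes the records are available as an iterable of Python dicts (for example, rows loaded with the Hugging Face `datasets` library; the repository ID is not given here, so none is shown). The function name `summarize_fields` and the sample records are hypothetical.

```python
from collections import Counter


def summarize_fields(records):
    """Return (row_count, per-field report) for an iterable of dict records.

    For each field the report holds: the fraction of rows containing it,
    the Python types observed, and the mean length of its string values.
    """
    stats = {}
    total = 0
    for record in records:
        total += 1
        for name, value in record.items():
            info = stats.setdefault(
                name, {"present": 0, "types": Counter(), "chars": 0}
            )
            info["present"] += 1
            info["types"][type(value).__name__] += 1
            if isinstance(value, str):
                info["chars"] += len(value)

    report = {}
    for name, info in stats.items():
        n_str = info["types"]["str"]  # Counter returns 0 for unseen keys
        report[name] = {
            "coverage": info["present"] / total,
            "types": dict(info["types"]),
            "mean_chars": info["chars"] / n_str if n_str else None,
        }
    return total, report


# Made-up sample rows; the real column names must be read from the data.
sample = [
    {"problem": "Prove X.", "reasoning": "Step 1 ...", "response": "QED"},
    {"problem": "Compute Y.", "reasoning": "Consider ...", "response": "42"},
]
row_count, field_report = summarize_fields(sample)
```

The returned row count can be compared against the stated 193,938 records, and the mean string lengths give a rough hint as to which field holds the long reasoning trace.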
Provenance
Source
Hugging Face dataset authored by 'amphora', released alongside ResearchMath-14k.
Collection Method
Model-generated solution attempts for curated research-level mathematical problems, per the dataset description.
Freshness
Last updated 2026-06-12 18:25:17; freshness should be verified.
License
Unknown; terms of use must be verified before use.