Description
LSReasoning-10000000 is a large-scale synthetic dataset for mathematical reasoning, built via a Python script. The dataset includes problems covering addition, subtraction, multiplication, division, linear equations, fractional equations, two-step equations, and algebra word problems. It was created by DataMuncher-Labs and last updated on December 29, 2025.
Use Cases
Fine-tuning language models to solve algebra word problems and the other covered problem types.
Benchmarking model performance on multi-step mathematical reasoning, using the two-step equation problems.
Training models to generate step-by-step solutions from the 'how_to_solve' field.
Developing educational tools for practicing arithmetic and algebraic operations, such as addition and linear equations.
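As a rough illustration of the fine-tuning and step-by-step use cases, a record could be turned into a prompt/completion pair along these lines. This is a minimal sketch, not the dataset creator's method: the field names question, how_to_solve, and answer come from the dataset description, while the prompt template, the to_training_pair helper, and the example record are hypothetical and made up for illustration.

```python
def to_training_pair(record: dict) -> dict:
    """Turn one record into a prompt/completion pair for supervised fine-tuning."""
    # Field names follow the dataset description; their exact semantics are assumed.
    prompt = f"Problem: {record['question']}\nSolve step by step."
    completion = f"{record['how_to_solve']}\nAnswer: {record['answer']}"
    return {"prompt": prompt, "completion": completion}


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "question": "Solve for x: 2x + 3 = 11",
    "how_to_solve": "Subtract 3 from both sides, then divide by 2.",
    "answer": "4",
}

pair = to_training_pair(example)
print(pair["prompt"])
print(pair["completion"])
```

Placing the worked steps before the final answer in the completion encourages a model to produce its reasoning first; whether that ordering suits a given training setup is a design choice to verify.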
Strengths
Covers a range of mathematical topics, from basic arithmetic to algebra word problems, with an emphasis on algebra-heavy synthetic reasoning.
Explicitly structured with fields for question, problem, how_to_solve, and answer.
Licensed under MIT, which permits free use, modification, and redistribution as long as the license notice is retained.
Limitations
The exact number of rows is not stated; the dataset name suggests roughly 10 million, but this is unconfirmed, which makes scale assessment difficult.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
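Because row count and field semantics have to be checked by hand, a small inspection pass after download can help. The sketch below assumes the records are available as an iterable of dictionaries (how they are loaded depends on the file format, which is not documented here); the summarize helper and the sample records are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Optional


def summarize(records: Iterable[dict], limit: Optional[int] = None) -> dict:
    """Count rows, and per field count how often it appears and how often it is empty."""
    rows = 0
    field_counts = Counter()
    empty_counts = Counter()
    # islice with limit=None walks the whole iterable; pass a limit to sample a prefix.
    for record in islice(records, limit):
        rows += 1
        for key, value in record.items():
            field_counts[key] += 1
            if value is None or str(value).strip() == "":
                empty_counts[key] += 1
    return {"rows": rows, "fields": dict(field_counts), "empty": dict(empty_counts)}


# Hypothetical sample records, invented for illustration.
sample = [
    {"question": "3 + 4", "how_to_solve": "Add the numbers.", "answer": "7"},
    {"question": "10 - 6", "how_to_solve": "", "answer": "4"},
]

report = summarize(sample)
print(report)
```

Since the function only iterates once and keeps counters rather than rows, it should also work on a streamed source without loading the full dataset into memory.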
Provenance
Source
DataMuncher-Labs on Hugging Face
Collection Method
Built via a Python script, suggesting synthetic generation.
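To make the idea of script-based synthetic generation concrete, here is a minimal sketch of how a generator for one problem type (two-step equations) might look. The creator's actual script is not shown in this listing, so everything below is an assumption: the make_two_step_equation function, the number ranges, and the wording of the generated text are all hypothetical; only the field names question, how_to_solve, and answer are taken from the description.

```python
import random


def make_two_step_equation(rng: random.Random) -> dict:
    """Generate one problem of the form a*x + b = c with an integer solution."""
    a = rng.randint(2, 9)      # coefficient of x (assumed range)
    x = rng.randint(-10, 10)   # the solution is chosen first
    b = rng.randint(1, 20)     # constant term (assumed range)
    c = a * x + b              # right-hand side derived so the answer is exact
    return {
        "question": f"Solve for x: {a}x + {b} = {c}",
        "how_to_solve": (
            f"Subtract {b} from both sides to get {a}x = {c - b}, "
            f"then divide both sides by {a}."
        ),
        "answer": str(x),
    }


rng = random.Random(0)  # fixed seed so repeated runs produce the same record
record = make_two_step_equation(rng)
print(record)
```

Choosing the solution first and deriving the right-hand side from it guarantees that every generated problem has an exact integer answer, which is one common way such generators avoid invalid items.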
Freshness
Last updated 2025-12-29 21:59:20; freshness should be verified.
The dataset creator notes that smaller variants are better suited to fine-tuning, which suggests the full 10-million-scale dataset may be impractically large for typical fine-tuning runs.