GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. It was developed by OpenAI to support the task of question answering on basic mathematical problems that require multi-step reasoning. The problems are designed to be solved in 2 to 8 steps using basic arithmetic operations.
Use Cases
- Benchmarking multi-step mathematical reasoning in large language models
- Fine-tuning models for chain-of-thought reasoning
- Evaluating question-answering performance on arithmetic word problems
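For the evaluation use case, scoring usually reduces to comparing final numeric answers. The sketch below assumes that reference solutions end with a line of the form `#### <number>` and that the model has been prompted to finish its output with the same marker. The helper names `extract_final_answer` and `exact_match_accuracy`, and the sample problem text, are hypothetical illustrations and are not taken from the dataset or any official tooling.

```python
import re

# Assumption: the final answer appears after a "####" marker, e.g. "#### 1,250".
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text):
    """Return the number after the '####' marker as a float, or None if absent."""
    match = FINAL_ANSWER.search(text)
    if match is None:
        return None
    # Drop thousands separators so "1,250" and "1250" compare equal.
    return float(match.group(1).replace(",", ""))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final answer equals the reference answer."""
    if not references:
        return 0.0
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = extract_final_answer(prediction)
        expected = extract_final_answer(reference)
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / len(references)


# Hypothetical two-step problem written for illustration only.
reference = (
    "Each box holds 12 pencils, so 3 boxes hold 3*12=<<3*12=36>>36 pencils.\n"
    "After giving away 6, 36-6=<<36-6=30>>30 pencils remain.\n"
    "#### 30"
)
good_prediction = "3 boxes of 12 is 36. 36 minus 6 is 30.\n#### 30"
bad_prediction = "3 boxes of 12 is 36. 36 minus 6 is 31.\n#### 31"

score = exact_match_accuracy([good_prediction, bad_prediction], [reference, reference])
```

Comparing parsed floats rather than raw strings treats `30` and `30.0` as the same answer. A stricter or more lenient matching rule may suit a given evaluation setup better.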
Strengths
- High-quality, linguistically diverse problem sets
- Requires multi-step reasoning (2-8 steps) rather than single-step retrieval
- Widely used benchmark for mathematical reasoning in LLMs
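One rough way to inspect the multi-step property is to count the intermediate calculations in a reference solution. This sketch assumes each calculation is annotated inline as `<<expression=result>>`. The function name `count_calculation_steps` and the sample solution are hypothetical, and the count is only a proxy for the number of reasoning steps.

```python
import re

# Assumption: intermediate calculations are annotated as <<expression=result>>.
CALC_ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>]+)>>")


def count_calculation_steps(solution):
    """Count the <<expression=result>> annotations in a solution string."""
    return len(CALC_ANNOTATION.findall(solution))


# Hypothetical solution text written for illustration only.
sample_solution = (
    "Each box holds 12 pencils, so 3 boxes hold 3*12=<<3*12=36>>36 pencils.\n"
    "After giving away 6, 36-6=<<36-6=30>>30 pencils remain.\n"
    "#### 30"
)

steps = count_calculation_steps(sample_solution)
```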
Limitations
- Limited to basic arithmetic operations
- Monolingual (English only)
Provenance
- Source: OpenAI
- Collection Method: Crowdsourced
- Freshness: Last updated December 20, 2025