GSM8K-Hi is a Hindi-translated version of the English GSM8K test set for mathematical reasoning. The dataset, created by NVIDIA and last updated in January 2026, contains problems requiring 2 to 8 steps to solve using basic arithmetic operations. Samples were translated via Google Cloud Platform and subsequently reviewed and corrected by human annotators for quality.
Use Cases
- Benchmarking multilingual models on multi-step mathematical word problems in Hindi.
- Training language models to solve elementary arithmetic problems in Hindi.
- Studying how human review affects the quality of machine-translated educational content.
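For the benchmarking use case, a minimal scoring sketch is shown below. It assumes that each model output and each reference answer ends with the final numeric result, as in the English GSM8K, whose reference solutions end with a `#### <number>` line; whether GSM8K-Hi keeps that convention is not documented here and should be checked against the data. The function names `extract_final_number` and `exact_match_accuracy` are hypothetical helpers, not part of any published evaluation harness.

```python
import re
from decimal import Decimal

# Map Devanagari digits to ASCII so numerals written in Hindi script compare equal.
DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")

# A number with optional sign, thousands separators, and decimal part.
# The lookbehind keeps the hyphen in a range such as "2-8" from being read as a minus sign.
NUMBER_RE = re.compile(r"(?<!\d)-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number found in text as a Decimal, or None if there is none."""
    normalized = text.translate(DEVANAGARI_DIGITS)
    matches = NUMBER_RE.findall(normalized)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", ""))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference's final number."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = extract_final_number(prediction)
        expected = extract_final_number(reference)
        # A prediction with no number is always counted as wrong.
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / len(predictions)
```

Comparing `Decimal` values rather than strings means `3.5` and `3.50` count as the same answer. Taking the last number in the output is a common heuristic for free-form generations, but it can misfire when a model appends extra text after its answer.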
Strengths
- Problems are described as requiring 2 to 8 steps to solve, which gives a bounded, documented complexity range.
- Human annotators reviewed and corrected the machine-translated samples.
- The dataset is explicitly stated as ready for commercial and non-commercial use.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and dataset size are not documented, which makes it harder to assess suitability before download.
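Both limitations can be worked around once the data is in hand. The sketch below assumes the rows have already been loaded into a list of Python dictionaries (how they are loaded is left out, since the file layout is not documented), and reports the row count along with the value types and one example per column. The helper name `summarize_records` and the column names in the sample rows are hypothetical.

```python
from collections import Counter


def summarize_records(records):
    """Return the row count and, per column, value-type counts and one example value."""
    columns = {}
    for row in records:
        for name, value in row.items():
            info = columns.setdefault(name, {"types": Counter(), "example": None})
            info["types"][type(value).__name__] += 1
            # Keep the first non-empty value as a representative example.
            if info["example"] is None and value not in (None, ""):
                info["example"] = value
    return {"rows": len(records), "columns": columns}


# Made-up rows for illustration only; the real column names may differ.
sample = [
    {"question": "रीना के पास 3 सेब हैं।", "answer": "#### 3"},
    {"question": "", "answer": None},
]
summary = summarize_records(sample)
print(summary["rows"])
for name, info in summary["columns"].items():
    print(name, dict(info["types"]), info["example"])
```

Mixed type counts within a column (for example `str` alongside `NoneType`) are a quick signal that some rows are missing values and need handling before evaluation.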
Provenance
- Source: NVIDIA, derived from the English GSM8K test set.
- Collection Method: Translated via Google Cloud Platform, then reviewed and corrected by human annotators.
- Time Range: not specified
- Freshness: Last updated 2026-01-16 11:14:03; freshness should be verified.
- Geography: not specified