Description
This AlphaXiv dataset provides prepared splits of the GSM8K grade-school math problems, intended for comparing Evolution Strategies (ES) and Group Relative Policy Optimization (GRPO) as methods for fine-tuning LLMs. It includes 6,725 training samples, 1,867 validation samples, and 200 test samples. The dataset was last updated on March 5, 2026.
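As a quick arithmetic check, the three splits sum to 6,725 + 1,867 + 200 = 8,792 samples. That total equals the combined size of the original GSM8K train (7,473) and test (1,319) sets, which suggests the problems were re-partitioned rather than subsampled, although the card does not say how the splits were drawn. The short sketch below uses only the counts stated on this card to compute each split's share of the total.

```python
# Split sizes as stated on this card.
SPLITS = {"train": 6725, "validation": 1867, "test": 200}


def split_fractions(splits):
    """Return the total sample count and each split's share of that total."""
    total = sum(splits.values())
    return total, {name: n / total for name, n in splits.items()}


total, fractions = split_fractions(SPLITS)
print(total)  # 8792
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
# train: 76.5%
# validation: 21.2%
# test: 2.3%
```

The test split is small (about 2.3% of the data), so accuracy measured on it will have noticeably wider confidence intervals than a result on the full original GSM8K test set would.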
Use Cases
Fine-tuning LLMs for mathematical reasoning on the math problems stored in the 'data' input field.
Benchmarking Evolution Strategies against Group Relative Policy Optimization using the provided training, validation, and test splits.
Evaluating model performance on grade school math problems via the reserved test set of 200 samples.
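Both the ES-versus-GRPO comparison and the test-set evaluation need some way to score a model's answer against the reference. The card does not document the record layout beyond the 'data' input field, so the sketch below assumes reference solutions keep the original GSM8K convention of ending with "#### <answer>". The functions `extract_final_answer`, `correctness_reward`, and `accuracy` are hypothetical helpers, not part of the dataset or any library. A binary correctness reward like this is one common signal that either method could optimize, and averaging it over the 200 test samples gives exact-match accuracy.

```python
import re

# Matches integers or decimals, optionally negative, with optional thousands
# separators (e.g. "72", "-3", "1,000", "2.5").
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_final_answer(text):
    """Return the final numeric answer in `text` as a float, or None.

    Assumption: reference solutions end with a GSM8K-style "#### <answer>"
    marker. If the marker is present, the first number after it is used;
    otherwise the last number anywhere in the text is used.
    """
    marker = text.rfind("####")
    if marker != -1:
        candidates = NUMBER.findall(text[marker:])
        chosen = candidates[0] if candidates else None
    else:
        candidates = NUMBER.findall(text)
        chosen = candidates[-1] if candidates else None
    if chosen is None:
        return None
    # Drop thousands separators so "1,000" and "1000" compare equal.
    return float(chosen.replace(",", ""))


def correctness_reward(completion, reference):
    """Binary reward: 1.0 if the final answers match, otherwise 0.0."""
    predicted = extract_final_answer(completion)
    expected = extract_final_answer(reference)
    return 1.0 if predicted is not None and predicted == expected else 0.0


def accuracy(completions, references):
    """Exact-match accuracy over paired model completions and references."""
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    if not references:
        return 0.0
    total = sum(correctness_reward(c, r) for c, r in zip(completions, references))
    return total / len(references)


# Toy strings for illustration only; they are not taken from the dataset.
REFERENCE = "Half of 48 is 48/2 = 24.\nAltogether 48 + 24 = 72.\n#### 72"
GOOD = "In May she sold 24, so 48 + 24 = 72 in total."
BAD = "The answer is 70."
print(accuracy([GOOD, BAD], [REFERENCE, REFERENCE]))  # 0.5
```

Exact numeric match is a deliberately strict choice; the reward and evaluation settings actually used in any ES-versus-GRPO comparison on this dataset are not described on the card.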
Strengths
Contains 6,725 training examples for model fine-tuning.
Provides a structured split with 1,867 validation and 200 test samples for evaluation.
Limitations
Dataset size is moderate: 8,792 samples in total across the three splits.
Specific column structure and data format details are not fully described.
Provenance
Source
AlphaXiv on Hugging Face.
Collection Method
Prepared splits of the GSM8K dataset.
Freshness
Last updated on March 5, 2026.
Full data format and column details require visiting the Hugging Face dataset page. License information is unknown.