Name: Open-AgentRL GRPO 2K: A Compact Dataset for GRPO Training Across Math, Science, and Code
Creator: y-ohtani
Published: 2026-02-28T10:10:50
Keywords: Size Categories: 1K<n<10K, Library: polars, OPTIMIZED-PARQUET, Language: en, arXiv: 2311.12022, Modality: text, Code, Library: mlcroissant, Library: datasets, Library: pandas, License: CC BY 4.0, Text, GRPO, Parquet, Region: US, Reinforcement Learning, Science, Agent, Math

Description

Open-AgentRL GRPO 2K is a compact dataset of approximately 2,000 samples for Group Relative Policy Optimization (GRPO) training. It was created by y-ohtani and last updated on February 28, 2026. The dataset is built by balanced sampling from five sources: DeepScaleR-Preview (374 math items), NuminaMath-1.5 (359 math items), Omni-MATH (366 math items), GPQA Diamond (198 science items), and LeetCodeDataset (351 code items).
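
As a quick arithmetic check on the composition described above, the following Python sketch totals the per-source counts and reports each domain's share. The counts are copied from the description; the dictionary layout and the names `SOURCE_COUNTS` and `domain_totals` are illustrative assumptions, not part of the dataset.

```python
# Per-source item counts as stated in the description (source -> (domain, count)).
SOURCE_COUNTS = {
    "DeepScaleR-Preview": ("math", 374),
    "NuminaMath-1.5": ("math", 359),
    "Omni-MATH": ("math", 366),
    "GPQA Diamond": ("science", 198),
    "LeetCodeDataset": ("code", 351),
}


def domain_totals(counts):
    """Sum the stated item counts per domain."""
    totals = {}
    for domain, n in counts.values():
        totals[domain] = totals.get(domain, 0) + n
    return totals


totals = domain_totals(SOURCE_COUNTS)
grand_total = sum(totals.values())

# Print each domain's item count and its share of the stated total.
for domain, n in totals.items():
    print(f"{domain}: {n} items ({n / grand_total:.1%})")
print(f"total: {grand_total}")
```

The stated counts sum to 1,648, so the exact size is worth confirming against the downloaded files.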

Use Cases

Training GRPO agents for mathematical problem-solving, using the three math subsets.
Fine-tuning language models on scientific Q&A in physics, chemistry, and biology, using the GPQA Diamond subset.
Developing code generation agents, using the LeetCode programming problems.
Benchmarking agent performance across multiple reasoning domains, drawing on all five sources.

Strengths

Contains a balanced mix of approximately 2,000 samples from five distinct data sources.
Includes 374 math problems from DeepScaleR-Preview, 359 from NuminaMath-1.5, and 366 from Omni-MATH.
Adds 198 science questions from the GPQA Diamond benchmark covering physics, chemistry, and biology.
Incorporates 351 programming problems from LeetCodeDataset.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
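The exact row count is not stated, and the per-source counts in the description do not add up to the approximately 2,000 samples claimed, which complicates suitability assessment.
At roughly 2,000 samples, the dataset is likely too small on its own for large-scale model training.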
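
Because column documentation is missing, a first step after download is to summarize what each field actually holds. The sketch below is one minimal way to do that over a list of row dictionaries. The function name `summarize_fields` and the field names in the toy rows (`prompt`, `answer`, `source`) are hypothetical placeholders, since the real schema is not documented here.

```python
def summarize_fields(records, max_example_len=60):
    """Collect observed value types, non-null counts, and one example per field."""
    summary = {}
    for row in records:
        for field, value in row.items():
            info = summary.setdefault(
                field, {"types": set(), "non_null": 0, "example": None}
            )
            if value is None:
                continue  # count only populated values
            info["types"].add(type(value).__name__)
            info["non_null"] += 1
            if info["example"] is None:
                info["example"] = repr(value)[:max_example_len]
    return summary


# Toy rows with made-up field names; the dataset's real columns may differ.
rows = [
    {"prompt": "What is 2 + 2?", "answer": "4", "source": "math"},
    {"prompt": "Reverse a linked list.", "answer": None, "source": "code"},
]

report = summarize_fields(rows)
for field, info in report.items():
    print(field, sorted(info["types"]), info["non_null"], info["example"])
```

With the real data, the rows could be obtained from a downloaded Parquet file, for example with `pandas.read_parquet(path).to_dict("records")`; missing values may then appear as NaN rather than None.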

Provenance

Source: Aggregated from five datasets on Hugging Face: agentica-org/DeepScaleR-Preview, AI-MO/NuminaMath-1.5, KbsdJames/Omni-MATH, Idavidrein/gpqa (gpqa_diamond), and newfacade/LeetCodeDataset.
Collection Method: Created by balanced sampling across the five source datasets. Per-source counts range from 198 to 374, so the draw is not strictly equal.
Time Range: Not specified
Freshness: Last updated 2026-02-28 10:18:27; freshness should be verified.
Geography: Not specified
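
The exact sampling procedure is not documented. As an illustration only, the sketch below shows one common way to build a mixed subset from several sources: draw a fixed quota from each pool without replacement, then shuffle. The function `sample_by_quota`, the pool names, and the quotas are all assumptions made for this example and are not the creator's method.

```python
import random


def sample_by_quota(pools, quotas, seed=0):
    """Draw quotas[name] items without replacement from each pool, then shuffle."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    mixed = []
    for name, quota in quotas.items():
        pool = pools[name]
        if quota > len(pool):
            raise ValueError(f"{name}: quota {quota} exceeds pool size {len(pool)}")
        for item in rng.sample(pool, quota):
            mixed.append({"source": name, "item": item})
    rng.shuffle(mixed)
    return mixed


# Toy pools standing in for the real source datasets (placeholder contents).
pools = {
    "math_source": [f"math-{i}" for i in range(50)],
    "science_source": [f"sci-{i}" for i in range(20)],
    "code_source": [f"code-{i}" for i in range(30)],
}
quotas = {"math_source": 5, "science_source": 3, "code_source": 4}

subset = sample_by_quota(pools, quotas, seed=42)
print(len(subset), "items drawn")
```

Recording the source name on each drawn item keeps the per-source license obligations traceable after the pools are mixed.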

Licenses vary by source subset (MIT, Apache 2.0, CC-BY-4.0); users must comply with the respective license for each component.
