Open-AgentRL GRPO 2K is a compact dataset of roughly 2,000 samples intended for GRPO (Group Relative Policy Optimization) training. It was created by y-ohtani and last updated on February 28, 2026. The dataset is built by balanced sampling from five sources: DeepScaleR-Preview (374 math items), NuminaMath-1.5 (359 math items), Omni-MATH (366 math items), GPQA Diamond (198 science items), and LeetCodeDataset (351 code items). These per-source counts sum to 1,648 items, somewhat fewer than the "2K" in the name suggests.
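To make the composition concrete, the sketch below tallies the per-source counts listed above and groups them by domain. It does not load the actual dataset. The `SOURCES` table and the `composition` helper are hypothetical names chosen for illustration, and the counts are simply copied from this card.

```python
# Hypothetical table: per-source (domain, item count) pairs copied from the card.
SOURCES = {
    "DeepScaleR-Preview": ("math", 374),
    "NuminaMath-1.5": ("math", 359),
    "Omni-MATH": ("math", 366),
    "GPQA Diamond": ("science", 198),
    "LeetCodeDataset": ("code", 351),
}


def composition(sources):
    """Return the total item count and a dict of item counts per domain."""
    total = sum(n for _, n in sources.values())
    by_domain = {}
    for domain, n in sources.values():
        by_domain[domain] = by_domain.get(domain, 0) + n
    return total, by_domain


total, by_domain = composition(SOURCES)
print(f"total: {total}")
# Print each domain with its share of the total, largest first.
for domain, n in sorted(by_domain.items(), key=lambda kv: -kv[1]):
    print(f"{domain:8s} {n:5d}  ({n / total:.1%})")
```

Run as-is, this reports 1,648 items in total: 1,099 math (about 67%), 351 code (about 21%), and 198 science (about 12%). So although sampling is balanced across the five sources, math makes up roughly two thirds of the mix because three of the five sources are math collections.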
Licenses vary by source subset (MIT, Apache 2.0, CC-BY-4.0); users must comply with the respective license for each component.