Available on 1 platform
A collection of 45,879 training samples for instruction-following reinforcement learning (RL). It was curated by NVIDIA to train the Nemotron-Cascade-2-30B-A3B model and includes multi-domain RL, on-policy distillation, and software engineering RL data.
The dataset description states that it is ready for commercial use, but the specific license details are unknown. The full description is available on the Hugging Face dataset page.