NVIDIA released this collection of dataset blends in March 2026 to document the specific data mixtures used for Reinforcement Learning (RL) training of the Nemotron-3-Super-120B-A12B model. The data is organized into six distinct training stages, including Reinforcement Learning from Verifiable Rewards (RLVR), Software Engineering (SWE), and Reinforcement Learning from Human Feedback (RLHF).
For the specific mixing percentages, refer to the full description on the Hugging Face page. The dataset is licensed under CC BY 4.0.