This dataset contains the training-data blends for Reinforcement Learning and Multi-teacher On-Policy Distillation used in NVIDIA's Nemotron-3-Ultra post-training recipe. Each prompt is paired with an agent or environment that returns either a verifiable reward or a judge-based reward, in the form consumed by the NeMo Gym agent framework. The dataset was last updated on June 4, 2026.
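As a rough illustration of what pairing a prompt with a reward source can look like, here is a minimal Python sketch. It does not use the NeMo Gym API or the dataset's actual schema; `PromptRecord`, `exact_match_reward`, and `keyword_judge_reward` are hypothetical names invented for this example, and the keyword scorer is only a stand-in for a real model-based judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptRecord:
    """Hypothetical record: a prompt plus the function that scores a response."""
    prompt: str
    reward_fn: Callable[[str], float]


def exact_match_reward(expected: str) -> Callable[[str], float]:
    """Verifiable reward: 1.0 if the response equals the known answer, else 0.0."""
    def score(response: str) -> float:
        return 1.0 if response.strip() == expected.strip() else 0.0
    return score


def keyword_judge_reward(keywords: List[str]) -> Callable[[str], float]:
    """Toy stand-in for a judge: fraction of keywords found in the response."""
    def score(response: str) -> float:
        if not keywords:
            return 0.0
        text = response.lower()
        hits = sum(1 for k in keywords if k.lower() in text)
        return hits / len(keywords)
    return score


# One record with a checkable answer, one scored by the toy judge.
math_record = PromptRecord("What is 2 + 2?", exact_match_reward("4"))
essay_record = PromptRecord(
    "Describe the properties of merge sort.",
    keyword_judge_reward(["sorted", "stable"]),
)
```

In this sketch, a verifiable reward is binary because the answer can be checked directly, while the judge-style reward is graded; a real judge would typically be another model rather than a keyword match.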
The full description is hosted externally; users must visit the provided URL for complete documentation.