Name: Nemotron RL Ultra Training Blends: Reinforcement Learning Data for Post-Training
Creator: nvidia
Published: 2026-06-02T08:59:12
Keywords: Training Data, Multi Teacher Distillation, Prompt Reward, Text, AI Training, Reinforcement Learning

Description

NVIDIA's Nemotron-3-Ultra post-training recipe uses these Reinforcement Learning and Multi-teacher On-Policy Distillation training-data blends. Each prompt is paired with an agent or environment that returns a verifiable or judge-based reward, in the form consumed by the NeMo Gym agent framework. The dataset was last updated on June 4, 2026.
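The prompt-reward pairing described above can be illustrated with a minimal Python sketch. Everything here is hypothetical: the names `PromptRewardPair`, `exact_match_reward`, `judge_reward`, and `toy_judge` are invented for illustration and are not the NeMo Gym API or this dataset's schema. The sketch only contrasts a verifiable reward (checked against a known answer) with a judge-based reward (scored by a separate model, replaced here by a toy function).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; not the NeMo Gym API
# and not the actual schema of this dataset.
RewardFn = Callable[[str, str], float]  # (prompt, response) -> reward


@dataclass
class PromptRewardPair:
    prompt: str
    reward_fn: RewardFn
    reward_kind: str  # "verifiable" or "judge"


def exact_match_reward(expected: str) -> RewardFn:
    """Verifiable reward: 1.0 if the response equals the expected answer."""
    def score(prompt: str, response: str) -> float:
        return 1.0 if response.strip() == expected.strip() else 0.0
    return score


def judge_reward(judge: Callable[[str, str], float]) -> RewardFn:
    """Judge-based reward: delegate to a scorer and clamp to [0, 1]."""
    def score(prompt: str, response: str) -> float:
        return min(1.0, max(0.0, judge(prompt, response)))
    return score


def toy_judge(prompt: str, response: str) -> float:
    # Stand-in for a judge model (assumption): shorter answers score higher.
    return 1.0 / (1.0 + len(response.split()))


pairs = [
    PromptRewardPair("What is 2 + 2?", exact_match_reward("4"), "verifiable"),
    PromptRewardPair("Summarize RL in one line.", judge_reward(toy_judge), "judge"),
]

for pair in pairs:
    reward = pair.reward_fn(pair.prompt, "4")
    print(pair.reward_kind, round(reward, 3))
```

In a real RL loop the response would come from the policy being trained, and the reward would come from the environment or agent shipped with each prompt rather than from these stand-ins.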

Use Cases

Training or fine-tuning large language models with reinforcement learning, using the verifiable or judge-based reward attached to each prompt.
Implementing Multi-teacher On-Policy Distillation (MOPD) using the provided training-data blends.
Benchmarking RL training frameworks like NeMo Gym with the described prompt-reward pairings.
Studying the effect of different reward signals (verifiable vs. judge-based) on model alignment.

Strengths

Published by NVIDIA as part of a public model recipe (Nemotron-3-Ultra).
Designed for a specific, documented training framework (NeMo RL recipes and NeMo Gym).
Updated on June 4, 2026, two days after publication, which suggests active maintenance.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and file size are not stated, which makes it hard to assess suitability before download.
The description is partial, referencing an external page for full details.
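Because field semantics have to be inferred after download, a first step could be to tally which fields appear and what value types they hold. The sketch below assumes the files are JSON Lines with one JSON object per line; the listing does not confirm this, so the format is an assumption. The sample records and their field names are made up for the demonstration and say nothing about the real schema.

```python
import json
from collections import Counter, defaultdict


def summarize_fields(lines):
    """Count, per top-level field, how often each Python value type appears.

    Assumes each non-empty line is a JSON object (JSON Lines); this format
    is an assumption, not something the dataset listing states.
    """
    type_counts = defaultdict(Counter)
    n_records = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        n_records += 1
        for key, value in record.items():
            type_counts[key][type(value).__name__] += 1
    return n_records, {key: dict(counts) for key, counts in type_counts.items()}


# Made-up sample records; field names are hypothetical.
sample = [
    '{"prompt": "hi", "meta": {"a": 1}}',
    "",
    '{"prompt": "yo", "score": 0.5}',
]

n_records, summary = summarize_fields(sample)
print(n_records)
for field, counts in sorted(summary.items()):
    print(field, counts)
```

Passing an open file handle instead of `sample` works the same way, since the function only iterates over lines. Fields that appear in fewer records than `n_records` are optional, which is often the first clue to how a blend is structured.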

Provenance

Source: nvidia
Collection Method: Likely generated as part of the Nemotron-3-Ultra model development and post-training process.
Freshness: Last updated 2026-06-04 11:34:41

The full description is hosted externally; complete documentation is available only on the dataset's source page.
