Available on 1 platform
A dataset for training Generative Reward Models (GenRMs) with reinforcement learning at scale. It was created by NVIDIA and last updated on March 11, 2026. It combines preference data from diverse domains with a synthetic safety blend, structured in a 'meta-prompt' format.
The license is unknown; users must verify the terms before use.