Nemotron RLHF GenRM v1: Preference Data for Training Generative Reward Models

Name: Nemotron RLHF GenRM v1: Preference Data for Training Generative Reward Models
Creator: nvidia
Published: 2026-03-08T18:44:45
Keywords: Reward Modeling, Benchmark, Synthetic Safety, Preference Data, Text, Reinforcement Learning, LLM Training, Synthetic

Description

This dataset is designed for training Generative Reward Models (GenRMs) with reinforcement learning at scale. It was created by NVIDIA and last updated on March 11, 2026. It consists of preference data from diverse domains plus a synthetic safety blend, structured in a 'meta-prompt' format.

Use Cases

Training Generative Reward Models (GenRMs) based on the described preference data structure.
Improving model generalization and reducing reward hacking based on the dataset's stated design goals.
Benchmarking reward models against traditional Bradley-Terry models using the provided synthetic safety blend.

Strengths

Designed by NVIDIA, a major AI research organization.
Explicitly aims to train models that generalize better than traditional Bradley-Terry models.
Includes a synthetic safety blend to address specific risks.
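
For readers unfamiliar with the Bradley-Terry baseline mentioned above, the sketch below shows the standard pairwise formulation: a scalar reward is assigned to each response, and the probability that the chosen response is preferred is the sigmoid of the reward difference. This is the textbook form, not code from the dataset or from NVIDIA; the function names and the example reward values are hypothetical.

```python
import math


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed preference.

    Equals -log(sigmoid(reward_chosen - reward_rejected)), computed as a
    numerically stable softplus of the negated margin.
    """
    margin = reward_chosen - reward_rejected
    # softplus(-margin) = max(-margin, 0) + log(1 + exp(-|margin|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def bradley_terry_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Probability that the chosen response is preferred over the rejected one."""
    return math.exp(-bradley_terry_loss(reward_chosen, reward_rejected))


# Hypothetical scalar rewards for one preference pair.
example_loss = bradley_terry_loss(2.0, 0.5)
example_prob = bradley_terry_prob(2.0, 0.5)
```

A larger margin between the two rewards pushes the probability toward 1 and the loss toward 0. This scalar-score setup is the baseline that the dataset's stated goals compare GenRMs against.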

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and license information are unknown, which may limit suitability assessment.
Data may reflect bias inherent to the unspecified sources of the preference data.
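
Because the columns are undocumented, a first step after download is usually to list which fields appear and what types they hold. The sketch below does this for records that have already been loaded as Python dictionaries. The field names in the sample are hypothetical, and the record layout is an assumption, since the dataset's actual file format and schema are unknown.

```python
from collections import defaultdict
from typing import Any, Iterable


def summarize_fields(records: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each top-level field name to the set of Python type names observed."""
    seen: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


# Hypothetical records for illustration; real field names are not documented.
sample = [
    {"prompt": "hi", "label": 1},
    {"prompt": "hello", "label": None},
]
summary = summarize_fields(sample)
```

Fields that show more than one type, or that include `NoneType`, are the ones most worth checking by hand before training on the data.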

Provenance

Source: NVIDIA
Collection Method: Not described in detail. The data combines preference data from diverse domains with a synthetic safety blend and is intended for reinforcement learning at scale.
Time Range: Not specified
Freshness: Last updated 2026-03-11 00:22:00; freshness should be verified.
Geography: Not specified

License is unknown; users must verify terms before use.

Quality Score

Overall: D38
Description: 42
Source: 36
Reputation: 43
Access: 26

Community

145 downloads
1 like
0 views

Dataset Info

Author: nvidia
Created: Mar 8, 2026
Updated: Mar 11, 2026
Last synced: Jul 22, 2026

