Name: Human Preference Pairs For Reinforcement Learning Reward Models
Creator: OpenRLHF
Published: 2024-06-14T11:22:36
Keywords: RLHF, Reward Modeling, Preference Data, Human Feedback, Text Pairs, Text, Parquet, modality:text, modality:tabular, size_categories:100K<n<1M, library:datasets, library:polars, library:dask, library:mlcroissant, region:us

Description

A 2024 mixture of text preference datasets used to train the weqweasdas/RM-Mistral-7B reward model for Reinforcement Learning from Human Feedback. The dataset was created by OpenRLHF and includes multiple sources of human-annotated comparisons. It is designed for training models to score and rank text outputs based on human preferences.

Use Cases

Train a reward model to score the chosen response above the rejected response in each preference pair for RLHF alignment.
Fine-tune a classifier to predict human preference labels from text prompt and response pairs.
Benchmark reward modeling techniques on a mixed dataset containing multiple annotation sources.
Analyze the distribution of human preferences across different prompt categories within the mixture.
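
The first and third use cases can be illustrated with a minimal sketch of a pairwise (Bradley-Terry style) objective and a pairwise accuracy metric. This is a common formulation for reward modeling, not a confirmed description of how weqweasdas/RM-Mistral-7B was trained. The function names and the plain-float scores are hypothetical; in practice the scores would come from a reward model evaluated on the chosen and rejected responses.

```python
import math


def pairwise_loss(chosen_score: float, rejected_score: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Computes -log(sigmoid(chosen_score - rejected_score)) in a numerically
    stable way, so large score gaps do not overflow.
    """
    margin = chosen_score - rejected_score
    if margin >= 0:
        # log(1 + exp(-margin)); exp argument is non-positive
        return math.log1p(math.exp(-margin))
    # Same quantity rewritten as -margin + log(1 + exp(margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(score_pairs) -> float:
    """Fraction of (chosen, rejected) score pairs ranked correctly."""
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("score_pairs must not be empty")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores for three preference pairs: (chosen, rejected)
    scores = [(1.2, -0.3), (0.1, 0.4), (2.0, 1.5)]
    mean_loss = sum(pairwise_loss(c, r) for c, r in scores) / len(scores)
    print(f"mean loss: {mean_loss:.4f}")
    print(f"accuracy:  {pairwise_accuracy(scores):.4f}")
```

The loss shrinks as the chosen response is scored further above the rejected one, and equals log(2) when the two scores are tied. Accuracy on held-out pairs is one simple way to compare reward modeling techniques on the mixture.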

Strengths

The mixture has been used in practice to train a published 7B-parameter reward model (weqweasdas/RM-Mistral-7B).
Includes multiple established preference data sources to improve generalization.
Data is formatted for direct use with standard RLHF training scripts.

Limitations

Exact row counts, column names, and per-source sample sizes are not disclosed; the size tag indicates only that the mixture has between 100K and 1M rows.
The exact composition and proportions of the source datasets are not documented here.
Label noise and annotator bias from the original human annotation processes may carry over from the source data.

Provenance

Source: Hugging Face dataset created by OpenRLHF, aggregating multiple existing preference datasets.
Collection Method: Mixture of pre-existing human-annotated text preference datasets, curated for reward model training.
Time Range: Not specified
Freshness: Last updated on Hugging Face in June 2024.
Geography: Not specified

The dataset page points to an external Notion page and a GitHub repository for full training details and the specifics of the data mixture; consult them for a complete picture. No license information is stated.
