This dataset is a 2024 mixture of text preference datasets used to train the weqweasdas/RM-Mistral-7B reward model for Reinforcement Learning from Human Feedback (RLHF). It was created by OpenRLHF and combines multiple sources of human-annotated comparisons. It is intended for training models that score and rank text outputs according to human preferences.
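To illustrate what training a model to "score and rank" outputs from pairwise comparisons can look like, here is a minimal sketch. It assumes a Bradley-Terry style pairwise loss, which is a common choice for reward models; the dataset page does not state which objective RM-Mistral-7B was trained with. The function names `pairwise_preference_loss` and `rank_by_score` are hypothetical, and the scores are plain floats standing in for a reward model's outputs.

```python
import math


def pairwise_preference_loss(chosen_score: float, rejected_score: float) -> float:
    """Negative log-likelihood that the chosen output beats the rejected one.

    Assumed objective (not confirmed by the dataset page):
        loss = -log(sigmoid(chosen_score - rejected_score))
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def rank_by_score(scored_outputs):
    """Sort (text, score) pairs from highest to lowest score."""
    return sorted(scored_outputs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative scores only; a real reward model would produce these.
    print(pairwise_preference_loss(chosen_score=1.5, rejected_score=0.2))
    print(rank_by_score([("reply A", 0.1), ("reply B", 0.9)]))
```

The loss shrinks as the preferred output's score rises above the rejected one's, so minimizing it over many annotated comparisons pushes the model toward scores that agree with the human rankings.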
The dataset page points to an external Notion page and a GitHub repository for full training details and the exact data mixture; consult those sources for specifics. License information is not provided.