Atsunori converted the NVIDIA HelpSteer2 dataset into preference pairs for training Direct Preference Optimization (DPO) models. Responses are compared by their helpfulness score: the higher-scoring response is designated as chosen and the lower-scoring one as rejected. The dataset was last updated on July 11, 2024.
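The conversion described above can be sketched roughly as follows. This is an illustration, not the author's actual script: the input fields `prompt`, `response`, and `helpfulness` follow the HelpSteer2 schema, while the output keys `chosen` and `rejected`, the helper name `to_preference_pairs`, and the decision to skip ties are all assumptions.

```python
from collections import defaultdict


def to_preference_pairs(rows):
    """Group scored responses by prompt and emit one chosen/rejected pair per prompt."""
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt"]].append(row)

    pairs = []
    for prompt, candidates in by_prompt.items():
        if len(candidates) < 2:
            continue  # a single response has nothing to be compared against
        ranked = sorted(candidates, key=lambda r: r["helpfulness"], reverse=True)
        best, worst = ranked[0], ranked[-1]
        if best["helpfulness"] == worst["helpfulness"]:
            continue  # assumption: tied scores carry no preference signal
        pairs.append(
            {"prompt": prompt, "chosen": best["response"], "rejected": worst["response"]}
        )
    return pairs


# Toy rows for illustration only; these are not taken from the dataset.
rows = [
    {"prompt": "What is DPO?", "response": "A preference-based fine-tuning method.", "helpfulness": 4},
    {"prompt": "What is DPO?", "response": "No idea.", "helpfulness": 1},
    {"prompt": "Define RLHF.", "response": "Answer A", "helpfulness": 3},
    {"prompt": "Define RLHF.", "response": "Answer B", "helpfulness": 3},
]
pairs = to_preference_pairs(rows)
print(pairs)
```

How the published dataset handles tied helpfulness scores is not documented, so the tie rule here may differ from what was actually done.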
Use Cases
- Training Direct Preference Optimization models based on helpfulness scores.
- Fine-tuning language models for improved helpfulness based on preference pairs.
- Benchmarking reward models on a preference dataset derived from HelpSteer2.
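For the reward-model benchmarking use case, one common metric is pairwise accuracy: the fraction of pairs in which the model scores the chosen response above the rejected one. The sketch below assumes each pair is a dict with `prompt`, `chosen`, and `rejected` keys (the actual column names are undocumented) and uses a hypothetical `length_score` function as a stand-in for a real reward model.

```python
def pairwise_accuracy(pairs, score):
    """Fraction of pairs where score(prompt, chosen) > score(prompt, rejected)."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    wins = sum(
        score(p["prompt"], p["chosen"]) > score(p["prompt"], p["rejected"])
        for p in pairs
    )
    return wins / len(pairs)


def length_score(prompt, response):
    # Toy stand-in for a reward model: longer responses score higher.
    return len(response)


# Illustrative pairs only; these are not taken from the dataset.
demo_pairs = [
    {"prompt": "q1", "chosen": "a detailed answer", "rejected": "short"},
    {"prompt": "q2", "chosen": "ok", "rejected": "a long but wrong answer"},
]
accuracy = pairwise_accuracy(demo_pairs, length_score)
print(accuracy)
```

Ties count as misses in this sketch, which is a design choice rather than a standard; a real evaluation should state how ties are treated.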
Strengths
- Derived from the established NVIDIA HelpSteer2 dataset.
- Specifically formatted for Direct Preference Optimization training.
- Licensed under CC-BY-4.0, which permits reuse and adaptation with attribution.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which makes it harder to judge whether the dataset suits a given task.
- The last update was July 11, 2024; verify freshness before relying on it.
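Since column documentation and row count are both missing, a first step after download might be to summarize the records directly. The sketch below assumes the rows are available as an iterable of dicts; the helper name `summarize_records` and the sample field names are hypothetical, not taken from the dataset.

```python
from collections import Counter


def summarize_records(records):
    """Return the row count and, per field, how many rows have it and which value types occur."""
    row_count = 0
    presence = Counter()
    types = {}
    for record in records:
        row_count += 1
        for name, value in record.items():
            presence[name] += 1
            types.setdefault(name, set()).add(type(value).__name__)
    fields = {
        name: {"rows": presence[name], "types": sorted(types[name])}
        for name in presence
    }
    return row_count, fields


# Hypothetical sample; the real column names must be read from the downloaded data.
sample = [
    {"prompt": "q1", "chosen": "good", "rejected": "bad"},
    {"prompt": "q2", "chosen": "better", "rejected": None},
]
row_count, fields = summarize_records(sample)
print(row_count, fields)
```

A field that appears in fewer rows than `row_count`, or with mixed types, is a signal that its semantics need closer inspection before training.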
Provenance
- Source: derived from the nvidia/HelpSteer2 dataset.
- Collection method: converted into preference pairs based on helpfulness scores.
- Freshness: last updated 2024-07-11 03:09:27.