20,000 preference pairs for Direct Preference Optimization (DPO) training, drawn from four established Hugging Face datasets. The collection contains 10,000 Chinese and 10,000 English examples, each filtered by quality score. The author, llamafactory, uploaded this bilingual mix on June 7, 2024.
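In DPO, each preference pair supplies a prompt, a preferred ("chosen") response and a dispreferred ("rejected") response. The sketch below shows the standard per-pair DPO loss, -log σ(β · [(log π(chosen) − log π_ref(chosen)) − (log π(rejected) − log π_ref(rejected))]). It assumes the summed token log-probabilities of each response under the trained policy and a frozen reference model have already been computed. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not part of this dataset, and the dataset's actual field names are not described here.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (hypothetical helper for illustration).

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin by which the policy prefers the chosen response.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)) == softplus(-margin)
    return softplus(-margin)

# Policy identical to reference: loss is log 2.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Policy shifted toward the chosen response: loss drops below log 2.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy matches the reference, the loss sits at log 2; training lowers it by raising the chosen response's likelihood relative to the rejected one, with `beta` controlling how far the policy may drift from the reference.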
License is unknown; users should verify licensing for the four source datasets before use.