Name: DPO En Zh 20K: 20,000 Preference Pairs for Direct Preference Optimization
Creator: llamafactory
Published: 2024-04-19T17:11:52
Keywords: Size Categories: 10K<n<100K, Task Categories: Text Generation, DOI: 10.57967/hf/3629, ORPO, Library: polars, Language: zh, RLHF, Language: en, Text Generation, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Preference Data, Llama Factory, Text, Multilingual, Region: US, JSON, DPO, License: Apache 2.0

Description

20,000 preference pairs for Direct Preference Optimization (DPO) training, drawn from four Hugging Face datasets. The collection contains 10,000 Chinese and 10,000 English examples, each filtered by quality score. The creator, llamafactory, published this bilingual mix on April 19, 2024; it was last updated on June 7, 2024.

Use Cases

Fine-tuning language models for preference alignment using score-filtered preference pairs.
Training multilingual models using a mix of Chinese and English preference data.
Benchmarking DPO training pipelines on a combined dataset from multiple sources.
Studying the effect of quality score thresholds on DPO training outcomes.
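
For readers benchmarking a DPO pipeline, the following is a minimal sketch of the standard per-pair DPO loss, written in plain Python. It assumes the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy and a frozen reference model; the function name, argument names, and the default beta of 0.1 are illustrative choices, not taken from this dataset or from any particular training library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    All names and the default beta are illustrative assumptions.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2 (about 0.693); it falls as the policy shifts probability toward the chosen response relative to the reference.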

Strengths

Combines 20,000 examples from four distinct, named source datasets.
Includes 10,000 Chinese and 10,000 English examples for multilingual training.
Examples are filtered by quality score (e.g., chosen score >= 4 or >= 8).
Specifically formatted for use with LLaMA Factory training pipelines.
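
The stated 10,000 / 10,000 language split can be spot-checked after download. The sketch below is a rough heuristic, not a language detector: it counts a row as Chinese when at least 30% of its non-whitespace characters fall in the CJK Unified Ideographs range, and counts everything else as English. The field name `prompt` and the 30% cutoff are assumptions made for illustration.

```python
def looks_chinese(text, min_ratio=0.3):
    """Heuristic: True if enough non-whitespace characters are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars) >= min_ratio


def language_counts(rows, text_field="prompt"):
    """Count rows per language; anything not flagged as Chinese is counted as 'en'.

    `text_field` is a hypothetical column name; adjust it to the real schema.
    """
    counts = {"zh": 0, "en": 0}
    for row in rows:
        counts["zh" if looks_chinese(row[text_field]) else "en"] += 1
    return counts
```

Mixed-language prompts, such as translation requests, may land on either side of the cutoff, so the counts are best treated as approximate.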

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Per-source row counts are not stated; only the 20,000 total and the 10,000 / 10,000 language split are given, which may limit suitability assessment.
Data may reflect source bias inherent to the original Hugging Face collections.
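
Because column documentation is missing, a first step after download is usually to look at which fields exist and what types they hold. The helper below is a minimal sketch that assumes the rows have already been loaded as a list of dictionaries (for example from the JSON file); the helper name and the sample size are illustrative.

```python
from collections import defaultdict


def summarize_fields(rows, sample_size=100):
    """Map each field name to the sorted Python type names seen in a sample."""
    seen = defaultdict(set)
    for row in rows[:sample_size]:
        for key, value in row.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}
```

A field that reports more than one type, or that appears in only some rows, is a good candidate for closer manual inspection before training.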

Provenance

Source: Combined from four Hugging Face datasets: argilla/distilabel-capybara-dpo-7k-binarized, argilla/distilabel-intel-orca-dpo-pairs, argilla/ultrafeedback-binarized-preferences-cleaned, and wenbopan/Chinese-dpo-pairs.
Collection Method: Examples were selected based on quality score thresholds from the source datasets.
Time Range: not specified
Freshness: Last updated 2024-06-07 18:44:17; freshness should be verified.
Geography: not specified
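
The collection method described above, selection by per-source score thresholds, can be sketched as follows. This is an illustration of the idea, not the creator's actual script: the field names `source` and `chosen_score`, and the mapping of thresholds to sources, are hypothetical.

```python
def select_pairs(rows, thresholds, score_field="chosen_score",
                 source_field="source"):
    """Keep rows whose chosen-response score meets the threshold for their source.

    Rows from sources without a listed threshold are dropped.
    Field names are assumptions, not the dataset's documented schema.
    """
    kept = []
    for row in rows:
        minimum = thresholds.get(row[source_field])
        if minimum is not None and row[score_field] >= minimum:
            kept.append(row)
    return kept
```

Calling `select_pairs(rows, {"source_a": 4, "source_b": 8})` would apply a cutoff of 4 to one source and 8 to another, mirroring the two example thresholds mentioned under Strengths.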

License: The dataset is tagged Apache 2.0; users should also verify licensing for the four source datasets before use.
