Name: Python Code DPO Fine-Tune: 2,000 Preference Pairs for LLM Alignment
Creator: quangduc1112001
Published: 2024-10-12T21:06:07
Keywords: Size Categories: 1K<n<10K; Task Categories: text generation, question answering; Modality: text; Language: en; Libraries: datasets, pandas, polars, mlcroissant; Format: Parquet; License: MIT; Region: us; Topics: Text Generation, Code, Text, Python Code, Direct Preference Optimization, LLM Fine-Tuning, Instruction Tuning

Description

The dataset contains 2,000 rows of preference data for Direct Preference Optimization (DPO) fine-tuning, each with prompt, chosen, and rejected fields. The prompts and chosen responses are sourced from the iamtarun/python_code_instructions_18k_alpaca dataset, while the rejected responses are generated by a base Llama 3.1 model. The dataset was uploaded to Hugging Face by quangduc1112001 and last updated on November 4, 2024.

Use Cases

Fine-tuning language models for Python code generation with DPO on the preference pairs.
Training reward models for Reinforcement Learning from Human Feedback (RLHF) pipelines using the prompt-chosen-rejected triplet structure.
Benchmarking DPO and other alignment algorithms on a Python-specific instruction dataset.
Studying the quality gap between base-model code completions and reference solutions from an instruction dataset.
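Because the pairs are intended for DPO and reward-model training, a minimal sketch of the two per-example objectives may help. It assumes the summed log-probability of each response under the policy and a frozen reference model (for DPO), or a scalar reward score per response (for the reward model), has already been computed; the function names and the beta=0.1 default are illustrative assumptions, not values taken from this dataset.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed response log-probabilities."""
    # How much the policy raised (or lowered) each response's log-prob vs. the reference.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Penalize the policy unless it favors "chosen" over "rejected" more than the reference does.
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss commonly used to train reward models."""
    return -log_sigmoid(reward_chosen - reward_rejected)
```

With a zero margin both losses equal log 2 (about 0.693). The DPO loss drops below that when the policy raises the chosen response's likelihood, relative to the reference model, more than the rejected one's.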

Strengths

Contains 2,000 explicitly defined preference pairs for model alignment.
Pairs reference code from an instruction dataset (chosen) with base-model alternatives (rejected) for direct comparison.
Prompts and chosen responses are sourced from a known, structured Python instruction dataset.

Limitations

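Column-level documentation is absent; field types and semantics must be confirmed after download.
The row count is known, but the file size is unspecified, and the file format (Parquet) and license (MIT) appear only as page tags rather than in the description.
The process for generating rejected responses is only partially described (for example, the prompt template and sampling settings are not given), which may limit reproducibility.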
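Since field semantics have to be confirmed after download, a quick profiling pass over the rows can surface missing fields, unexpected types, and pairs that carry no preference signal. This is a minimal sketch assuming the rows are available as a list of dicts (for example via pandas `DataFrame.to_dict(orient="records")`); the field names come from the description, while the helper name and sample rows are hypothetical.

```python
from collections import Counter

# Field names taken from the dataset description; confirm them against the real files.
EXPECTED_FIELDS = ("prompt", "chosen", "rejected")


def profile_rows(rows):
    """Summarize which fields appear, their Python types, and obvious pair problems."""
    field_counts = Counter()
    field_types = {}
    identical_pairs = 0
    for row in rows:
        field_counts.update(row.keys())
        for key, value in row.items():
            field_types.setdefault(key, set()).add(type(value).__name__)
        chosen, rejected = row.get("chosen"), row.get("rejected")
        # A pair whose chosen and rejected responses match gives no preference signal.
        if chosen is not None and chosen == rejected:
            identical_pairs += 1
    return {
        "n_rows": len(rows),
        "missing_or_partial": [f for f in EXPECTED_FIELDS if field_counts[f] < len(rows)],
        "extra_fields": sorted(set(field_counts) - set(EXPECTED_FIELDS)),
        "types": field_types,
        "identical_pairs": identical_pairs,
    }


# Hypothetical rows for illustration; real rows might store messages as lists, not strings.
sample_rows = [
    {"prompt": "Write a function that adds two numbers.",
     "chosen": "def add(a, b):\n    return a + b",
     "rejected": "def add(a, b):\n    print(a + b)"},
    {"prompt": "Reverse a string.",
     "chosen": "def rev(s):\n    return s[::-1]",
     "rejected": "def rev(s):\n    return s[::-1]"},
    {"prompt": "Square a number.",
     "chosen": "def sq(x):\n    return x * x"},
]
report = profile_rows(sample_rows)
```

Reporting the observed type names, rather than assuming strings, makes it easy to spot whether the real files use plain text or a chat-message structure for each field.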

Provenance

Source: Hugging Face dataset uploaded by quangduc1112001.
Collection Method: Prompts and chosen responses sourced from iamtarun/python_code_instructions_18k_alpaca; rejected responses generated by inference from a base LLAMA 3.1 model.
Time Range: Not specified
Freshness: Last updated 2024-11-04 09:06:06; freshness should be verified.
Geography: Not specified
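The collection method above can be illustrated with a small sketch of how one pair might be assembled. It assumes Alpaca-style instruction, input, and output fields in the source dataset and uses a hypothetical prompt template and a stub in place of real Llama 3.1 inference; the actual template and generation settings used for this dataset are not documented.

```python
from typing import Callable, Dict


def build_prompt(record: Dict[str, str]) -> str:
    # Hypothetical template: instruction, then the optional input on its own paragraph.
    instruction = record["instruction"].strip()
    extra = (record.get("input") or "").strip()
    return f"{instruction}\n\n{extra}" if extra else instruction


def make_preference_pair(record: Dict[str, str],
                         generate_rejected: Callable[[str], str]) -> Dict[str, str]:
    prompt = build_prompt(record)
    return {
        "prompt": prompt,
        # Reference solution from the instruction dataset becomes the preferred answer.
        "chosen": record["output"],
        # Base-model completion becomes the rejected answer.
        "rejected": generate_rejected(prompt),
    }


def fake_base_model(prompt: str) -> str:
    # Stand-in for real model inference so the sketch runs offline.
    return "# TODO: implement\npass"


record = {
    "instruction": "Write a function to compute the factorial of n.",
    "input": "",
    "output": "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)",
}
pair = make_preference_pair(record, fake_base_model)
```

Passing the generator in as a callable keeps pair construction independent of any particular inference library, so the stub can be swapped for a real model call.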

The license is not stated in the description, although the page tags list MIT; users should verify permissions before use. The full description is available on the Hugging Face dataset page.
