Available on 1 platform
2,000 rows of preference data for Direct Preference Optimization (DPO) fine-tuning. Each row has three fields: prompt, chosen, and rejected. The prompts and chosen responses come from the iamtarun/python_code_instructions_18k_alpaca dataset; the rejected responses were generated by a base Llama 3.1 model. The dataset was uploaded to Hugging Face by quangduc1112001 and last updated on November 4, 2024.
License is unknown; users should verify permissions before use. The full description is available on the Hugging Face dataset page.
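
As a rough illustration of the prompt/chosen/rejected layout described above, the following Python sketch checks that a row has the shape a DPO trainer typically expects. The function name `is_valid_preference_row` and the sample row are hypothetical and are not taken from the dataset; the check itself (three non-empty string fields, with chosen differing from rejected) is an assumption about what a usable row looks like, not a documented guarantee.

```python
from typing import Mapping

# Field names as given in the dataset description.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def is_valid_preference_row(row: Mapping[str, object]) -> bool:
    """Return True if the row has non-empty string prompt, chosen, and
    rejected fields, and the chosen response differs from the rejected one."""
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    # A pair with identical responses carries no preference signal.
    return row["chosen"] != row["rejected"]


# Hypothetical row, written for illustration only (not real dataset content).
example_row = {
    "prompt": "Write a Python function that returns the sum of a list.",
    "chosen": "def total(xs):\n    return sum(xs)",
    "rejected": "def total(xs):\n    return xs",
}

print(is_valid_preference_row(example_row))
```

A filter like this could be run over the rows before training to drop malformed or degenerate pairs; whether any such rows exist in this dataset is not stated on the page.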