Description
UltraMix is a lean, high-quality preference optimization dataset curated from five open-source DPO corpora. It was created by aladinDJ using the Magpie Annotation Framework and a reward-driven curation pipeline, and was last updated on Hugging Face in February 2026. The dataset removes noisy, low-reward, or redundant preference pairs while preserving task balance.
Use Cases
Fine-tuning language models for instruction following with the curated preference pairs.
Training reward models on the reward-filtered, quality-controlled pairs.
Benchmarking preference optimization methods on a task-balanced mixture.
Studying how data curation affects alignment training, using the described pipeline as a reference point.
Strengths
Curated from five established open-source DPO corpora: TuluDPO, ORPO, UltraFeedback, HelpSteer, and Code-Preference-Pairs.
Processed with a reward-driven curation pipeline to remove noisy, low-reward, or redundant pairs.
Designed to preserve task balance across areas such as instruction following and reasoning.
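The card does not publish the curation code, so the following is only a minimal sketch of what a reward-driven filter of this kind could look like. The `PreferencePair` fields, the thresholds `min_chosen_reward` and `min_margin`, and the prompt-normalization rule used for deduplication are all assumptions made for illustration; they are not taken from the UltraMix pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    # Hypothetical record layout; the real UltraMix columns are undocumented.
    prompt: str
    chosen: str
    rejected: str
    chosen_reward: float
    rejected_reward: float
    task: str


def curate(pairs, min_chosen_reward=0.0, min_margin=0.1):
    """Drop low-reward, noisy, and redundant pairs (illustrative thresholds)."""
    seen_prompts = set()
    kept = []
    for pair in pairs:
        # Low-reward: the preferred response itself scores poorly.
        if pair.chosen_reward < min_chosen_reward:
            continue
        # Noisy: chosen and rejected are barely separated by the reward model.
        if pair.chosen_reward - pair.rejected_reward < min_margin:
            continue
        # Redundant: same prompt after lowercasing and whitespace collapsing.
        key = " ".join(pair.prompt.lower().split())
        if key in seen_prompts:
            continue
        seen_prompts.add(key)
        kept.append(pair)
    return kept


def task_counts(pairs):
    """Count pairs per task so balance can be compared before and after curation."""
    return dict(Counter(pair.task for pair in pairs))
```

Comparing `task_counts` on the input and on the output of `curate` is one simple way to check whether filtering has skewed the task mixture.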
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it hard to judge suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
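Because neither the columns nor the row count are documented, a first pass after download might simply tally what is there. The sketch below assumes the rows are available as an iterable of dictionaries (for example, from whichever loader you use); `summarize_fields` is a hypothetical helper, and any field names you see in practice may differ from typical preference-pair layouts.

```python
from collections import defaultdict


def summarize_fields(records):
    """Return (row_count, per-field summary) for an iterable of dict rows."""
    row_count = 0
    summary = defaultdict(
        lambda: {"types": set(), "non_null": 0, "example": None}
    )
    for row in records:
        row_count += 1
        for name, value in row.items():
            info = summary[name]
            # Record every Python type observed for this field.
            info["types"].add(type(value).__name__)
            if value is not None:
                info["non_null"] += 1
                # Keep the first non-null value as a sample for manual review.
                if info["example"] is None:
                    info["example"] = value
    return row_count, dict(summary)
```

Fields with mixed types or a low non-null count are good candidates for closer manual inspection before the data is used for training.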
Provenance
Source
Hugging Face, author aladinDJ
Collection Method
Curated from five open-source DPO corpora using the Magpie Annotation Framework and a reward-driven pipeline.
Time Range
Not specified
Freshness
Last updated 2026-02-28 16:41:27; freshness should be verified.
Geography
Not specified
License
Unknown; terms of use must be verified before use.