Name: UltraMix: A Reward-Aligned, Quality-Filtered DPO Mixture
Creator: aladinDJ
Published: 2025-11-14T04:41:22
Keywords: Text Generation, Text, Reasoning, Preference Optimization, Instruction Following

Description

UltraMix is a lean, high-quality preference optimization dataset curated from five open-source DPO corpora. It was created by aladinDJ using the Magpie Annotation Framework and a reward-driven curation pipeline, and was last updated on Hugging Face in February 2026. The dataset removes noisy, low-reward, or redundant preference pairs while preserving task balance.

Use Cases

Fine-tuning language models for instruction following with the curated preference pairs.
Training reward models on the reward-aligned, quality-filtered pairs.
Benchmarking preference optimization methods on a task-balanced mixture.
Studying how data curation affects alignment training, using the described pipeline as a reference point.

Strengths

Curated from five established open-source DPO corpora: TuluDPO, ORPO, UltraFeedback, HelpSteer, and Code-Preference-Pairs.
Processed with a reward-driven curation pipeline to remove noisy, low-reward, or redundant pairs.
Designed to preserve task balance in areas like instruction following and reasoning.
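
The card does not document how the reward-driven curation works in detail. As a rough illustration of the general idea only, the sketch below drops pairs whose reward margin is small and removes exact duplicates. The field names (prompt, chosen, rejected, chosen_reward, rejected_reward), the function name filter_pairs, and the 0.5 threshold are all hypothetical and are not taken from UltraMix or its pipeline.

```python
def filter_pairs(pairs, min_margin=0.5):
    """Keep preference pairs with a clear reward margin and drop exact duplicates.

    All field names here are assumptions for illustration; they are not
    the documented UltraMix schema.
    """
    seen = set()
    kept = []
    for pair in pairs:
        # Low-margin pairs carry a weak preference signal, so skip them.
        margin = pair["chosen_reward"] - pair["rejected_reward"]
        if margin < min_margin:
            continue
        # Treat pairs with the same normalized prompt and chosen answer as redundant.
        key = (pair["prompt"].strip().lower(), pair["chosen"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept


if __name__ == "__main__":
    sample = [
        {"prompt": "Q1", "chosen": "A", "rejected": "B",
         "chosen_reward": 2.0, "rejected_reward": 0.5},
        {"prompt": "Q2", "chosen": "A", "rejected": "B",
         "chosen_reward": 1.0, "rejected_reward": 0.9},
    ]
    print(len(filter_pairs(sample)))
```

A real pipeline would likely also balance tasks and use fuzzy rather than exact deduplication; this sketch leaves both out.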

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not stated, so it is hard to judge before download whether the dataset is large enough for a given task.
Description metadata is limited; actual data quality requires manual inspection after download.
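
Because column documentation and the row count are missing, a first inspection step after download might look like the sketch below. It assumes the records have already been loaded as a list of dictionaries by whatever loader is appropriate; the function name summarize_fields and the sample field names are hypothetical.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of value type names seen for it."""
    types_by_field = defaultdict(set)
    for record in records:
        for field, value in record.items():
            types_by_field[field].add(type(value).__name__)
    return {field: sorted(names) for field, names in types_by_field.items()}


if __name__ == "__main__":
    # Placeholder rows; real rows would come from the downloaded dataset.
    rows = [
        {"prompt": "Explain DPO.", "chosen": "Answer A", "rejected": "Answer B"},
        {"prompt": "Sort a list.", "chosen": "Answer C", "rejected": None},
    ]
    print("rows:", len(rows))
    for field, names in summarize_fields(rows).items():
        print(field, names)
```

Fields that show more than one type, or that include NoneType, are good candidates for closer manual inspection.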

Provenance

Source: Hugging Face, author aladinDJ
Collection Method: Curated from five open-source DPO corpora using the Magpie Annotation Framework and a reward-driven pipeline.
Time Range: Not specified
Freshness: Last updated 2026-02-28 16:41:27; freshness should be verified.
Geography: Not specified

License is unknown; terms of use must be verified before the dataset is used.
