A mixture of 260,000 preference pairs for Direct Preference Optimization (DPO), developed by the Allen Institute for AI (AI2) in 2025-2026. The mixture, built with delta-aware heuristics and GPT-judge pipelines, was used to preference-tune the Olmo 3 Instruct 7B model.
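For readers unfamiliar with how a preference pair is consumed in DPO, the following is a minimal sketch of the standard per-pair DPO loss. It is not AI2's training code. The `PreferencePair` field names (`prompt`, `chosen`, `rejected`), the `dpo_loss` helper, and the default `beta=0.1` are illustrative assumptions and may not match this dataset's actual schema or the settings used for Olmo 3.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical record layout; the real column names may differ.
    prompt: str
    chosen: str
    rejected: str


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model. beta=0.1 is an assumed default.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


pair = PreferencePair(prompt="What is 2 + 2?", chosen="4", rejected="5")
# Made-up log-probabilities, for illustration only.
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

The loss falls below log 2 when the policy favors the chosen response more strongly than the reference model does, and rises above log 2 when it favors the rejected one.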
Licensed under ODC-BY; users must follow AI2's Responsible Use Guidelines.