Dolci Instruct DPO: 260,000 Preference Pairs for Olmo 3 Tuning | DataSalon

Name: Dolci Instruct DPO: 260,000 Preference Pairs for Olmo 3 Tuning
Creator: allenai
Published: 2025-10-22T00:10:27
Keywords: libraries polars, dask, mlcroissant, datasets; arXiv 2507.06187, 2512.13961; modality text; size category 100K < n < 1M; format Parquet; region US; license ODC-BY

Description

This dataset contains 260,000 preference pairs for Direct Preference Optimization (DPO), developed by the Allen Institute for AI in 2025-2026. The mixture was used to preference-tune the Olmo 3 Instruct 7B model, and its pairs were built with delta-aware heuristics and GPT-judge pipelines.
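For readers new to DPO, the per-pair objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for Olmo 3. The function name, the `beta` default, and the assumption that inputs are summed sequence log-probabilities are all illustrative choices.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are assumed to be summed log-probabilities of the chosen and
    rejected responses under the policy and a frozen reference model.
    beta=0.1 is an illustrative default, not a value from the source.
    """
    # Scaled difference of the two log-ratio terms.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) computed as softplus(-margin) for stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Raising the chosen log-prob and lowering the rejected one reduces the loss.
improved = dpo_loss(-5.0, -20.0, -10.0, -10.0)
```

The loss falls as the policy widens the gap between chosen and rejected responses relative to the reference model, which is the signal each preference pair provides.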

Use Cases

Preference tuning of large language models using the 260,000 preference pairs
Evaluating model alignment using the delta-aware preference heuristics
Researching the impact of Delta Learning on instruction following
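One simple way to approach the evaluation use case is pairwise preference accuracy: the fraction of pairs in which a scoring function ranks the chosen response above the rejected one. The sketch below assumes each record exposes `prompt`, `chosen`, and `rejected` text fields. Those field names, the toy records, and the length-based scorer are hypothetical and are not taken from the dataset schema.

```python
def preference_accuracy(pairs, score):
    """Fraction of pairs where score(prompt, chosen) > score(prompt, rejected).

    `pairs` is a list of dicts with hypothetical keys "prompt", "chosen",
    and "rejected"; `score` is any callable returning a comparable number,
    for example a model's sequence log-probability.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    wins = sum(
        1 for p in pairs
        if score(p["prompt"], p["chosen"]) > score(p["prompt"], p["rejected"])
    )
    return wins / len(pairs)


# Toy records and a toy scorer (response length), for illustration only.
toy_pairs = [
    {"prompt": "Define DPO.", "chosen": "Direct Preference Optimization.", "rejected": "No."},
    {"prompt": "2 + 2?", "chosen": "4", "rejected": "It is five."},
]
toy_accuracy = preference_accuracy(toy_pairs, lambda prompt, response: len(response))
```

In practice the scorer would be replaced with a model-based score. The length heuristic here only demonstrates the calling convention.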

Strengths

260,000 preference pairs
Used to train Olmo 3 Instruct 7B
Includes delta-aware, UltraFeedback-style GPT-judge annotations

Limitations

Synthetic nature of GPT-judge annotations may introduce model-specific biases
Heuristic-based pairs might lack the nuance of human-annotated preferences

Provenance

Source: Allen Institute for AI
Collection Method: Synthetic generation using Delta Learning heuristics and GPT-judge pipelines
Freshness: Last updated February 2026.

Licensed under ODC-BY; users must follow AI2's Responsible Use Guidelines.

Quality Score

D39

Community

3.6K downloads

10 likes

0 views

Dataset Info

Author: allenai
Created: Oct 22, 2025
Updated: Feb 20, 2026
Last synced: Jun 8, 2026
