Human-Like DPO: 1,000 Preference Examples for Language Model Training | DataSalon

Category: Mathematics & Statistics

Name: Human-Like DPO: 1,000 Preference Examples for Language Model Training
Creator: mlx-community
Published: 2025-01-22T15:49:46
Keywords: Ai Safety, Training Data, Benchmark, Text, Language Model, Preference Optimization

Available on 1 platform

Description

mlx-community provides a small test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. It contains 1,000 examples in total, split into 800 for training, 100 for validation, and 100 for testing. The dataset was last updated on May 27, 2025.

Use Cases

Fine-tuning language models based on human-like preference examples.
Evaluating DPO model performance on a smaller, controlled test set.
Benchmarking alignment techniques using the provided train/validation/test splits.

Strengths

Contains 1,000 total examples with defined splits.
Provides a dedicated test set of 100 examples for evaluation.
Derived from a known source dataset (Human-Like DPO Dataset by HumanLLMs).

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row counts (1,000 total; 800/100/100 per split) come from the description only and should be confirmed against the downloaded files.
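
The manual inspection these limitations call for can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a documented loader for this dataset: it assumes the splits have already been downloaded as JSON Lines files, one JSON object per line, and the file names `train.jsonl`, `valid.jsonl`, and `test.jsonl` are hypothetical placeholders that may not match the real files. The helpers `summarize_split` and `summarize_dataset` are illustrative names introduced here.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_split(path):
    """Return (row_count, Counter of field names) for one JSON Lines file.

    Assumes every non-blank line holds a single JSON object.
    """
    rows = 0
    fields = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines between records
            record = json.loads(line)
            rows += 1
            fields.update(record.keys())  # count how often each field appears
    return rows, fields


def summarize_dataset(directory,
                      split_files=("train.jsonl", "valid.jsonl", "test.jsonl")):
    """Summarize each split file found under `directory`.

    The default file names are hypothetical; pass the real ones if they differ.
    """
    summary = {}
    for name in split_files:
        path = Path(directory) / name
        if path.exists():
            summary[name] = summarize_split(path)
    return summary
```

Calling `summarize_dataset("path/to/download")` returns a mapping from file name to a row count and a per-field tally. The row counts can be compared with the 800/100/100 split stated in the description, and a field that appears in fewer records than the row count points to inconsistent records worth a closer look.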

Provenance

Source: Derived from the Human-Like DPO Dataset by HumanLLMs.
Freshness: Last updated 2025-05-27 18:54:48; check the source platform for newer revisions.

License is unknown; users should verify terms before use.

Quality Score

D38

Community

165 downloads

5 likes

0 views

Dataset Info

Author: mlx-community
Created: Jan 22, 2025
Updated: May 27, 2025
Last synced: Jun 9, 2026
