Description
mlx-community provides a test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. It contains 1,000 total examples, split into 800 for training, 100 for validation, and 100 for testing. The dataset was last updated on May 27, 2025.
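The split sizes stated above can be checked after download with a small sanity test. The following is a minimal sketch, not an official loader: the helper names `check_split_sizes` and `split_fractions` are hypothetical, and the row counts are assumed to come from whatever tool was used to load the data.

```python
# Split sizes as stated in the description (800 / 100 / 100).
EXPECTED_SPLITS = {"train": 800, "validation": 100, "test": 100}


def check_split_sizes(actual, expected=EXPECTED_SPLITS):
    """Return a list of mismatch messages; an empty list means all sizes match."""
    problems = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"missing split: {name}")
        elif got != want:
            problems.append(f"{name}: expected {want} rows, found {got}")
    # Flag splits that the description does not mention.
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected split: {name}")
    return problems


def split_fractions(sizes):
    """Return each split's share of the total row count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}
```

Passing the observed row counts, for example `check_split_sizes({"train": 800, "validation": 100, "test": 100})`, returns an empty list when the download matches the description. With the documented sizes, `split_fractions` gives an 80/10/10 split.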
Use Cases
Fine-tuning language models based on human-like preference examples.
Evaluating DPO model performance on a smaller, controlled test set.
Benchmarking alignment techniques using the provided train/validation/test splits.
Strengths
Contains 1,000 total examples with defined splits.
Provides a dedicated test set of 100 examples for evaluation.
Derived from a known source dataset (Human-Like DPO Dataset by HumanLLMs).
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The dataset is small (1,000 examples), which may limit its suitability beyond testing and small-scale experiments.
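Because column-level documentation is absent, a quick field summary is one way to begin inferring field semantics after download. The sketch below assumes the rows have already been loaded as a list of dictionaries. The function name `summarize_fields` is hypothetical, and the field names `prompt`, `chosen`, and `rejected` in the sample are assumptions based on common DPO conventions, not confirmed columns of this dataset.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to its observed value types and the number of records lacking it."""
    types = defaultdict(set)
    present = defaultdict(int)
    for record in records:
        for key, value in record.items():
            types[key].add(type(value).__name__)
            present[key] += 1
    total = len(records)
    return {
        key: {"types": sorted(types[key]), "missing": total - present[key]}
        for key in sorted(types)
    }


# Hypothetical rows for illustration only; the real column names are undocumented.
sample = [
    {"prompt": "How was your weekend?", "chosen": "Pretty relaxing, thanks!", "rejected": "I do not have weekends."},
    {"prompt": "Any plans tonight?", "chosen": "Probably just cooking."},
]
summary = summarize_fields(sample)
```

Running the same function over each split shows which fields are present everywhere and which are partially missing, which helps when deciding how to map the columns into a DPO training pipeline.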
Provenance
Source
Derived from the Human-Like DPO Dataset by HumanLLMs.
Freshness
Last updated 2025-05-27 18:54:48 (time zone not stated); check the source for more recent revisions before use.
License
License is unknown; users should verify terms before use.