The OmniAlign-V-DPO dataset contains 150,000 high-quality positive-negative pairs for Direct Preference Optimization (DPO). It is derived from the OmniAlign-V datasets and was created by PhoenixZ. The dataset was last updated on March 1, 2025.
Use Cases
- Training multimodal LLMs via Direct Preference Optimization based on the 150k preference pairs.
- Benchmarking model alignment performance using the referenced MM-AlignBench.
- Fine-tuning vision-language models such as the LLaVANext-OA variants on the dataset's preference pairs.
- Researching methods for enhancing multimodal model alignment.
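For readers unfamiliar with how positive-negative pairs are consumed in DPO training, the following is a minimal sketch of the standard per-pair DPO objective. It is not taken from the OmniAlign-V code base; the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    All names and the default beta are assumptions made for illustration.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy matches the reference, the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# When the policy favors the chosen response more than the reference does,
# the loss drops below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

In practice the log-probabilities would come from a multimodal model evaluated on the image, prompt, and each response; a training framework would compute this loss over batches of tensors rather than over single floats.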
Strengths
- Contains 150,000 high-quality positive-negative pairs, providing a substantial training resource.
- Is the official dataset accompanying the referenced research paper and GitHub repository.
- Specifically designed for Direct Preference Optimization (DPO), a targeted training method.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is known (150k pairs), but the file formats and data structure are not documented.
- Freshness should be verified; the last update was March 1, 2025.
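Because the fields are undocumented, a first step after download is to inspect a sample of records and see which keys and value types actually occur. The sketch below assumes the records have already been loaded into Python dictionaries (for example from JSON). The helper `summarize_fields` and the sample field names `prompt`, `chosen`, and `rejected` are hypothetical and are not the dataset's confirmed schema.

```python
from collections import defaultdict


def summarize_fields(records, limit=100):
    """Map each field name to the sorted value type names seen in a sample."""
    summary = defaultdict(set)
    for index, record in enumerate(records):
        if index >= limit:  # inspect only the first `limit` records
            break
        for key, value in record.items():
            summary[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in summary.items()}


# Hypothetical records; the real field names must be read from the files.
sample = [
    {"prompt": "Describe the image.", "chosen": "A cat.", "rejected": "A dog."},
    {"prompt": "What is shown?", "chosen": "A bridge.", "rejected": None},
]
field_types = summarize_fields(sample)
print(field_types)
```

A field that reports more than one type, such as a mix of strings and nulls, is a sign that rows may need filtering or cleaning before training.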
Provenance
- Source: PhoenixZ
- Collection Method: Derived from the OmniAlign-V datasets.
- Freshness: Last updated 2025-03-01 09:22:05.