205k high-quality samples for aligning Multimodal Large Language Models with human preferences. The dataset was created by PhoenixZ and is associated with the paper 'OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference'. It was last updated on March 1, 2025.
Use Cases
- Fine-tuning MLLMs for better human preference alignment based on the described high-quality samples.
- Training Direct Preference Optimization (DPO) models using the companion DPO dataset.
- Benchmarking MLLM performance on alignment tasks using the referenced MM-AlignBench.
- Developing new alignment techniques for vision-language models based on the multimodal training data.
Strengths
- Contains 205k high-quality samples.
- Is the official dataset for a published research paper.
- Provides companion resources including a DPO dataset and evaluation benchmarks.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which may limit suitability assessment.
- Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
- Source
- PhoenixZ
- Freshness
- Last updated 2025-03-01 09:21:45; freshness should be verified.