Name: RLAIF-V: 10K-100K Multimodal Preference Alignment Records
Creator: openbmb
Published: 2024-05-19T15:34:55
Keywords: Size Categories: 10K<n<100K, Task Categories: image-text-to-text, Language: en, Task Categories: visual-question-answering, arXiv:2509.18154, arXiv:2312.00849, MLLM, License: cc-by-nc-4.0, Region: us, Feedback, arXiv:2405.17220, Task Categories: any-to-any, Preference Alignment, Multimodal

Description

RLAIF-V provides between 10,000 and 100,000 multimodal preference-alignment records, developed by OpenBMB to improve the trustworthiness of Multimodal Large Language Models (MLLMs). The records use AI-generated feedback to rank model responses, and the data served as a core training component for the MiniCPM-V 4.5 model.

Use Cases

Preference-alignment training for MLLMs using the feedback and preference labels
Hallucination reduction in visual question answering tasks by training on trustworthiness-oriented feedback
Developing reward models for multimodal RLHF/RLAIF pipelines using the image-text-feedback triplets
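As a rough illustration of how chosen/rejected preference pairs feed the training objectives listed above, here is a minimal sketch of a DPO-style loss and a Bradley-Terry reward-model loss for a single pair. It assumes you have already computed summed log-probabilities (or scalar rewards) for each response; the function names and the `beta=0.1` default are illustrative choices, not part of the RLAIF-V release.

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Hypothetical DPO-style loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected responses
    under the trainable policy and under a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss shrinks as the policy favors the chosen response more than the reference does
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))

def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Hypothetical Bradley-Terry loss for training a reward model on one pair."""
    return -log_sigmoid(reward_chosen - reward_rejected)

if __name__ == "__main__":
    # Policy identical to reference: margin is 0, so loss equals log(2)
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy shifted toward the chosen response: loss drops below log(2)
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

In practice these per-pair losses would be averaged over a batch and computed with a tensor library; plain floats are used here only to keep the sketch self-contained.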

Strengths

Scale of 10,000 to 100,000 multimodal records
Methodology peer-reviewed and accepted by CVPR 2025
Used to train MiniCPM-V 4.5, which its developers report reaches GPT-4o-level performance

Limitations

Restricted to non-commercial use via CC BY-NC 4.0 license
Potential for synthetic biases inherent in AI-generated feedback (RLAIF) compared to human annotation

Provenance

Source: OpenBMB and the RLAIF-V research team
Collection Method: Reinforcement Learning from AI Feedback (RLAIF)
Freshness: Last updated October 2025; reflects the RLAIF-V method as published at CVPR 2025.
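To make the RLAIF collection method more concrete, the sketch below shows one generic way AI-assigned scores for several candidate responses to the same prompt could be turned into (chosen, rejected) pairs. This is a hypothetical illustration, not the RLAIF-V pipeline: the scoring scheme, the `build_preference_pairs` name, and the `min_gap` threshold are all assumptions.

```python
from itertools import combinations
from typing import List, Tuple

def build_preference_pairs(responses: List[str], scores: List[float],
                           min_gap: float = 1.0) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (chosen, rejected) pairs from AI-scored responses.

    A pair is kept only when the two scores differ by at least min_gap,
    so near-ties that the AI judge cannot separate are discarded.
    """
    if len(responses) != len(scores):
        raise ValueError("responses and scores must have the same length")
    pairs = []
    for i, j in combinations(range(len(responses)), 2):
        gap = scores[i] - scores[j]
        if abs(gap) < min_gap:
            continue  # ambiguous pair: skip it
        # Higher-scored response becomes "chosen"
        if gap > 0:
            pairs.append((responses[i], responses[j]))
        else:
            pairs.append((responses[j], responses[i]))
    return pairs

if __name__ == "__main__":
    # Example: scores could be, e.g., the negative count of claims an AI judge rejected
    print(build_preference_pairs(["a", "b", "c"], [0.0, -2.0, -0.5]))
```

Filtering out near-ties trades dataset size for label reliability, which matters when the labels come from an AI judge rather than human annotators.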

This data is released under the CC BY-NC 4.0 license, which permits sharing and adaptation with attribution but prohibits commercial use. It is designed specifically for MLLM trustworthiness and preference alignment.
