Name: Multimodal Feedback Data for Reinforcement Learning from Human Feedback
Creator: openbmb
Published: 2023-12-30T11:35:38
Keywords: Size Categories: 1K<n<10K, Task Categories: text-generation, Task Categories: visual-question-answering, Modality: text, Modality: image, Library: datasets, Library: pandas, Library: polars, Library: mlcroissant, Format: parquet, Language: en, License: cc-by-nc-4.0, Region: us, arXiv: 2405.17220

Description

RLHF-V-Dataset is a multimodal feedback dataset from the openbmb organization, described as large-scale and constructed using open-source models for reinforcement learning. The listing was published on 2023-12-30 and last updated on 2024-05-28, the month in which the related RLAIF-V-Dataset was also released. The data has been used in models such as MiniCPM-V 2.0 and is intended for diverse tasks that combine computer vision and large language models.

Use Cases

Train multimodal large language models using feedback data for tasks like image captioning or visual question answering.
Fine-tune models with reinforcement learning on human or AI-generated preference data for alignment tasks.
Benchmark the performance of open-source vision-language models against the feedback annotations provided.

Strengths

Dataset is described as 'large-scale' and designed for 'diverse-task' multimodal applications.
Associated with published research: a paper available on arXiv since May 2024 (arXiv:2405.17220).
Has been used in production models, specifically cited in the development of MiniCPM-V 2.0.

Limitations

Structural details are only partly known: the tags indicate Parquet files and between 1K and 10K rows, but the exact row count and the column names are not given.
The dataset's construction method using open-source models may introduce biases or noise inherent to those models.
Limited information is available on the temporal or geographic coverage of the underlying data.
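
Because the listing does not give column names or an exact row count, one way to fill that gap is to inspect the data directly. The following is a minimal sketch, not an official loader: the repository id "openbmb/RLHF-V-Dataset" and the split name "train" are assumptions inferred from the names in this listing, and the helper names (load_feedback_rows, summarize_schema, in_declared_size_range) are hypothetical. Loading requires the Hugging Face datasets library and network access; the summary helpers use only the standard library.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def load_feedback_rows(repo_id: str = "openbmb/RLHF-V-Dataset", split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    The repo id and split name are assumptions; confirm both on the dataset page.
    Requires `pip install datasets` and network access.
    """
    # Imported lazily so the helpers below work without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


def summarize_schema(rows: Iterable[Dict[str, Any]]) -> Tuple[int, Dict[str, str]]:
    """Return (row_count, {column: most common Python type name})."""
    count = 0
    types_seen: Dict[str, Counter] = {}
    for row in rows:
        count += 1
        for column, value in row.items():
            types_seen.setdefault(column, Counter())[type(value).__name__] += 1
    schema = {col: seen.most_common(1)[0][0] for col, seen in types_seen.items()}
    return count, schema


def in_declared_size_range(row_count: int, low: int = 1_000, high: int = 10_000) -> bool:
    """Check a row count against the 1K<n<10K size tag from the keywords."""
    return low < row_count < high


# Hypothetical usage (downloads data, so it is left commented out):
# rows = load_feedback_rows()
# n, schema = summarize_schema(rows)
# print(n, schema, in_declared_size_range(n))
```

Running the summary on the real data would reveal the actual column names and types, which should then replace any guesses before building a training pipeline.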

Provenance

Source: openbmb organization on Hugging Face.
Collection Method: Constructed using open-source models, likely for reinforcement learning from human or AI feedback (RLHF/RLAIF).
Freshness: Last updated on 2024-05-28, with a new related dataset (RLAIF-V-Dataset) released in May 2024.

The dataset page mentions a related 'RLAIF-V-Dataset' release; users should verify which dataset suits their needs. The full description is hosted externally on Hugging Face, requiring a visit for complete details.
