Name: Visual-Centric Instruction Following Dataset For MLLM Training
Creator: WoofWoof
Published: 2026-03-24T03:09:56
Keywords: Languageen, Arxiv260103198, Regionus, Licensemit

Description

VCIF-10K provides data for training Multimodal Large Language Models on visual instruction following tasks. The dataset is structured in a messages format with user instructions and assistant responses, referencing images from sources like LLaVA-Instruct and Visual Genome. It was created by WoofWoof and supports both Supervised Fine-Tuning and Direct Preference Optimization training paradigms.

Use Cases

Fine-tune MLLMs using the 'messages' format with 'user' and 'assistant' roles for visual instruction following.
Implement Direct Preference Optimization (DPO) training using the dataset's structured prompt-response pairs.
Train models to generate 'assistant' role 'content' based on multimodal 'user' inputs containing image references and text instructions.

Strengths

Dataset is associated with a published research paper titled 'Empowering Reliable Visual-Centric Instruction Following in MLLMs'.
Images are sourced from established multimodal datasets including LLaVA-Instruct, Visual Genome, and ALLaVA-4V.
Data format is explicitly defined for SFT and DPO training methodologies.

Limitations

The exact number of rows, columns, and total dataset size are unknown.
Sample data and specific file formats are not provided for inspection.
Potential label noise or biases may exist as the image sources are aggregated from other datasets.

Provenance

Source: Author WoofWoof on Hugging Face; images sourced from LLaVA-Instruct, Visual Genome, and ALLaVA-4V.
Collection Method: Created for the VC-IFEngine project, structured for SFT and DPO training of MLLMs.
Freshness: Last updated on 2026-03-24.

License information is tagged as 'mit' but not explicitly confirmed; users should verify on the official Hugging Face dataset page. The full description and data access details are available only on the external dataset page.

Languageen Arxiv260103198 Regionus Licensemit

Visual-Centric Instruction Following Dataset For MLLM Training

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info