Name: Visual-Centric Instruction Following Dataset for MLLM Training
Creator: KerenStone
Published: 2026-03-23T16:42:01
Keywords: Languageen, Arxiv260103198, Regionus, Licensemit

Description

10,000 entries support training and evaluating Multimodal Large Language Models on visual instruction following. The dataset is structured in a messages format with user instructions and assistant responses, referencing images from sources like LLaVA-Instruct and Visual Genome. It was created by KerenStone for research published in the paper 'Empowering Reliable Visual-Centric Instruction Following in MLLMs'.

Use Cases

Supervised Fine-Tuning (SFT) of MLLMs using the 'messages' format containing user 'instruction' and assistant 'response' pairs.
Direct Preference Optimization (DPO) training as indicated in the dataset's format description for aligning model outputs.
Benchmarking model reliability on visual-centric tasks using the structured 'content' and linked 'images' fields.

Strengths

Contains 10,000 entries specifically designed for visual instruction following tasks.
Sourced from established image datasets including LLaVA-Instruct, Visual Genome, and ALLaVA-4V.
Provides a defined structure with 'messages' and 'images' fields for consistent model training.

Limitations

Specific column definitions, sample data, and file formats are not provided in the input.
The total size and detailed composition of the image corpus are unknown.
Potential bias from the original image sources (LLaVA-Instruct, Visual Genome) is not characterized.

Provenance

Source: Official repository (KerenWLHe/VC-IFEval) and Hugging Face dataset page by KerenStone.
Collection Method: Compiled from image sources including LLaVA-Instruct, Visual Genome, and ALLaVA-4V for MLLM research.
Freshness: Last updated on March 23, 2026.

Users must visit the Hugging Face dataset page for the full description, detailed format specifications (including DPO), and access to the actual data files. License information is tagged but not explicitly stated in the provided input.

Languageen Arxiv260103198 Regionus Licensemit

Visual-Centric Instruction Following Dataset for MLLM Training

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info