Loading...
Loading...
Available on 1 platform
Sign in to view source links and access this dataset
VCIF-10K provides data for training Multimodal Large Language Models on visual instruction following tasks. The dataset is structured in a messages format with user instructions and assistant responses, referencing images from sources like LLaVA-Instruct and Visual Genome. It was created by WoofWoof and supports both Supervised Fine-Tuning and Direct Preference Optimization training paradigms.
License information is tagged as 'mit' but not explicitly confirmed; users should verify on the official Hugging Face dataset page. The full description and data access details are available only on the external dataset page.