Loading...
Loading...
Available on 1 platform
Sign in to view source links and access this dataset
10,000 entries support training and evaluating Multimodal Large Language Models on visual instruction following. The dataset is structured in a messages format with user instructions and assistant responses, referencing images from sources like LLaVA-Instruct and Visual Genome. It was created by KerenStone for research published in the paper 'Empowering Reliable Visual-Centric Instruction Following in MLLMs'.
Users must visit the Hugging Face dataset page for the full description, detailed format specifications (including DPO), and access to the actual data files. License information is tagged but not explicitly stated in the provided input.