24,903 visual question-answering pairs built on images from the COCO 2017 dataset, categorized into multiple-choice and direct-answer formats. Each entry includes human-annotated rationales explaining the reasoning behind the answer; the questions require external knowledge beyond what is visible in the image.
Use Cases
- Develop explainable AI models by training on the 'rationales' field to justify visual reasoning steps
- Benchmark visual question answering performance using the 'multiple_choice' and 'direct_answer' ground truth labels
- Train multi-modal transformers to integrate external knowledge by processing the 'question' and 'image' inputs alongside knowledge retrieval systems
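The benchmarking use case can be sketched as a pair of simple accuracy functions. This is a minimal illustration, not the dataset's official metric. The record layout is an assumption: 'multiple_choice' is taken to be a dict with hypothetical 'choices' and 'correct_index' keys, and 'direct_answer' a list of acceptable strings. The toy records are invented for illustration and are not drawn from the dataset.

```python
from typing import Dict, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def multiple_choice_accuracy(records: Sequence[Dict], predicted_indices: Sequence[int]) -> float:
    """Fraction of questions whose predicted option index matches the label."""
    hits = sum(
        1
        for record, pred in zip(records, predicted_indices)
        # ASSUMPTION: 'correct_index' is a hypothetical key name.
        if pred == record["multiple_choice"]["correct_index"]
    )
    return hits / len(records)


def direct_answer_accuracy(records: Sequence[Dict], predicted_answers: Sequence[str]) -> float:
    """Fraction of questions whose prediction matches any acceptable answer.

    Simplified normalized exact match; a real benchmark may score differently.
    """
    hits = sum(
        1
        for record, pred in zip(records, predicted_answers)
        # ASSUMPTION: 'direct_answer' holds a list of acceptable strings.
        if normalize(pred) in {normalize(answer) for answer in record["direct_answer"]}
    )
    return hits / len(records)


# Invented toy records in the assumed layout (not real dataset entries).
records = [
    {
        "question": "What is the object on the table used for?",
        "multiple_choice": {
            "choices": ["cooking", "writing", "cutting", "cleaning"],
            "correct_index": 2,
        },
        "direct_answer": ["cutting", "slicing"],
    },
    {
        "question": "Which season is shown?",
        "multiple_choice": {
            "choices": ["winter", "spring", "summer", "autumn"],
            "correct_index": 0,
        },
        "direct_answer": ["winter"],
    },
]

mc_acc = multiple_choice_accuracy(records, [2, 1])
da_acc = direct_answer_accuracy(records, ["Slicing", " winter "])
print(f"multiple-choice accuracy: {mc_acc:.2f}")
print(f"direct-answer accuracy:   {da_acc:.2f}")
```

Reporting the two scores separately keeps the closed-set and open-ended settings comparable across models, since a model can do well on one format and poorly on the other.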
Strengths
- 24,903 unique questions split into training, validation, and test sets
- Includes 'rationales' column providing natural language explanations for the correct answers
- Features two distinct evaluation formats: 'multiple_choice' with four options and 'direct_answer' for open-ended responses
- Each question is linked to a COCO 2017 image through its 'image_id'
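As a rough sketch of how the 'image_id' link and the four-option format might be used together, the helpers below resolve an image file and build a multiple-choice prompt. The 12-digit zero-padded file name follows the COCO 2017 convention. The local directory layout, the mapping from a dataset split to a COCO folder, and the function names are assumptions made for illustration.

```python
from pathlib import Path
from typing import Sequence


def coco_image_path(root: str, coco_split: str, image_id: int) -> Path:
    """Build the path to a COCO 2017 image from its numeric id.

    ASSUMPTION: images are stored locally as <root>/<split>2017/<id>.jpg;
    which COCO split holds a given question's image must be checked separately.
    """
    return Path(root) / f"{coco_split}2017" / f"{image_id:012d}.jpg"


def format_multiple_choice_prompt(question: str, choices: Sequence[str]) -> str:
    """Render a question and its four options as a lettered text prompt."""
    if len(choices) != 4:
        raise ValueError("expected exactly four options")
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {choice}" for letter, choice in zip("ABCD", choices)]
    lines.append("Answer:")
    return "\n".join(lines)


# Invented example values (not a real dataset entry).
path = coco_image_path("coco", "val", 139)
prompt = format_multiple_choice_prompt(
    "Which season is shown?", ["winter", "spring", "summer", "autumn"]
)
print(path.as_posix())
print(prompt)
```

Keeping the path logic in one helper makes it easy to swap in a different storage layout without touching the prompt code.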