The dataset pairs 22 million compositional questions with 113,000 images featuring scene graphs. Both images and questions carry structured semantic representations, which support multi-step visual reasoning and logic-based evaluation.
Use Cases
- Train visual reasoning models using the question and answer fields
- Use scene_graph annotations to improve object-relation understanding in VQA models
- Validate reasoning consistency using the semantic field to trace model logic
- Perform zero-shot reasoning tests by filtering questions based on the entailed and equivalent logic labels
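
The last two use cases can be sketched as follows. This is a minimal illustration, not the dataset's official tooling: the field names (question, answer, semantic, entailed, equivalent) come from the list above, but the nested layout (a dict keyed by question id, reasoning steps as operation/argument pairs, entailed questions as lists of ids) and the helper names trace and entailment_consistency are assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical records. Field names follow the card; the nesting is assumed.
questions: Dict[str, dict] = {
    "q1": {
        "question": "What color is the cup to the left of the plate?",
        "answer": "red",
        "semantic": [
            {"operation": "select", "argument": "plate"},
            {"operation": "relate", "argument": "cup,to the left of"},
            {"operation": "query", "argument": "color"},
        ],
        "entailed": ["q2"],
        "equivalent": [],
    },
    "q2": {
        "question": "Is there a cup to the left of the plate?",
        "answer": "yes",
        "semantic": [
            {"operation": "select", "argument": "plate"},
            {"operation": "relate", "argument": "cup,to the left of"},
            {"operation": "exist", "argument": "?"},
        ],
        "entailed": [],
        "equivalent": [],
    },
}


def trace(record: dict) -> str:
    """Render the reasoning steps of one question as a readable chain."""
    return " -> ".join(
        f"{step['operation']}({step['argument']})" for step in record["semantic"]
    )


def entailment_consistency(
    questions: Dict[str, dict], predictions: Dict[str, str]
) -> Optional[float]:
    """Among questions answered correctly, return the fraction of their
    entailed questions that were also answered correctly.
    Returns None when no entailed pair could be checked."""
    checked = 0
    consistent = 0
    for qid, record in questions.items():
        if predictions.get(qid) != record["answer"]:
            continue  # only correct answers create an expectation
        for other_id in record["entailed"]:
            other = questions.get(other_id)
            if other is None:
                continue  # entailed question is not in this subset
            checked += 1
            if predictions.get(other_id) == other["answer"]:
                consistent += 1
    return consistent / checked if checked else None
```

Calling trace on a record gives a quick view of the logic a model is expected to follow, and entailment_consistency flags models that answer a question correctly while contradicting something that answer implies. The same loop could read the equivalent field instead of entailed if paraphrase consistency is the target.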
Strengths
- 22 million questions generated from structured scene graphs
- 113,000 images annotated with object bounding boxes and relational attributes
- Includes functional programs for each question to define the reasoning steps
- Features 1.7 million relational links between objects in the scene graphs
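
To make the scene-graph structure more concrete, the sketch below walks one graph and lists its relational links as (subject, relation, object) triples. The layout is an assumption for illustration only: objects keyed by id, each with a name, attributes, a bounding box given as x, y, w, h, and a relations list pointing at other object ids. The helper name iter_triples is likewise hypothetical, and the actual file schema may differ.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical scene graph for a single image. The schema is assumed.
scene_graph: Dict[str, dict] = {
    "objects": {
        "o1": {
            "name": "cup",
            "attributes": ["red"],
            "x": 40, "y": 60, "w": 30, "h": 35,  # bounding box in pixels
            "relations": [{"name": "to the left of", "object": "o2"}],
        },
        "o2": {
            "name": "plate",
            "attributes": ["white"],
            "x": 90, "y": 70, "w": 80, "h": 20,
            "relations": [{"name": "to the right of", "object": "o1"}],
        },
    }
}


def iter_triples(graph: Dict[str, dict]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject name, relation name, object name) for every link."""
    objects = graph["objects"]
    for obj in objects.values():
        for relation in obj.get("relations", []):
            target = objects.get(relation["object"])
            if target is None:
                continue  # skip links that point at a missing object
            yield obj["name"], relation["name"], target["name"]


triples = list(iter_triples(scene_graph))
for subject, relation, target in triples:
    print(f"{subject} --{relation}--> {target}")
```

Flattening graphs into triples like this is one simple way to feed relation information to a VQA model or to count links per image.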