The dataset contains 22 million compositional questions over 113,000 images, each annotated with a dense scene graph. It structures visual reasoning through functional programs: every question is paired with a program that spells out the steps required to reach the answer for its image.
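To make the idea of a functional program concrete, here is a minimal sketch of executing one against a scene graph. The scene graph schema, the operation names (`select`, `relate`, `filter`, `query`) and the `execute` helper are illustrative assumptions, not the dataset's actual file format or operation vocabulary.

```python
# Hypothetical toy scene graph: object id -> name, attributes, relations.
# This schema is an assumption for illustration only.
scene_graph = {
    "o1": {"name": "apple", "attributes": ["red"], "relations": [("on", "o2")]},
    "o2": {"name": "table", "attributes": ["wooden"], "relations": []},
    "o3": {"name": "apple", "attributes": ["green"], "relations": []},
}

# Hypothetical program for "What color is the apple on the table?"
program = [
    ("select", "table"),        # find all tables
    ("relate", "apple", "on"),  # find apples that are "on" a selected table
    ("query", "color"),         # read the color of the first match
]

COLORS = {"red", "green", "blue", "yellow"}


def execute(program, graph):
    """Run each step in order, threading a list of object ids through the steps."""
    state = []
    for op, *args in program:
        if op == "select":
            state = [oid for oid, obj in graph.items() if obj["name"] == args[0]]
        elif op == "relate":
            name, relation = args
            state = [
                oid
                for oid, obj in graph.items()
                if obj["name"] == name
                and any(r == relation and target in state for r, target in obj["relations"])
            ]
        elif op == "filter":
            state = [oid for oid in state if args[0] in graph[oid]["attributes"]]
        elif op == "query":
            if args[0] != "color":
                raise ValueError(f"unsupported query: {args[0]}")
            if not state:
                return None
            attributes = graph[state[0]]["attributes"]
            return next((a for a in attributes if a in COLORS), None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


answer = execute(program, scene_graph)
print(answer)
```

Each step narrows or transforms the set of candidate objects, so the intermediate `state` doubles as a trace of which scene graph nodes the reasoning touched.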
Use Cases
- Train neural module networks using the functional program logic to guide visual attention and execution
- Improve visual grounding by mapping scene graph nodes to specific bounding boxes in the image
- Evaluate model bias by comparing performance on the balanced versus unbalanced question splits
- Benchmark compositional generalization by testing on question structures not seen during training
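The compositional generalization use case can be sketched as a split that holds out whole program structures, so no test question shares its sequence of operations with a training question. The example records, the field names (`question`, `program`) and the helpers `structure_of` and `compositional_split` are hypothetical; the dataset's own splits may be defined differently.

```python
import random


def structure_of(program):
    """Signature of a program: its sequence of operation names, ignoring arguments."""
    return tuple(op for op, *_ in program)


def compositional_split(examples, holdout_fraction=0.25, seed=0):
    """Hold out a fraction of distinct program structures for testing."""
    structures = sorted({structure_of(ex["program"]) for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(structures)
    n_holdout = max(1, int(len(structures) * holdout_fraction))
    held_out = set(structures[:n_holdout])
    train = [ex for ex in examples if structure_of(ex["program"]) not in held_out]
    test = [ex for ex in examples if structure_of(ex["program"]) in held_out]
    return train, test


# Hypothetical records; real entries would be loaded from the dataset files.
examples = [
    {"question": "What color is the apple?",
     "program": [("select", "apple"), ("query", "color")]},
    {"question": "What color is the table?",
     "program": [("select", "table"), ("query", "color")]},
    {"question": "Is there a red apple?",
     "program": [("select", "apple"), ("filter", "red"), ("exist",)]},
    {"question": "What is on the table?",
     "program": [("select", "table"), ("relate", "_", "on"), ("query", "name")]},
]

train, test = compositional_split(examples)
```

Splitting on the operation sequence rather than on individual questions keeps rephrasings of the same reasoning pattern on one side of the split.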
Strengths
- Provides 22 million generated questions that test compositional reasoning across 113,000 images
- Includes dense scene graphs containing objects, attributes, and relations for every image
- Provides functional programs for every question to define explicit reasoning steps
- Features a balanced answer distribution to mitigate linguistic priors and shortcuts
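One way to see why a balanced answer distribution matters is to score a baseline that never looks at the image. The sketch below uses made-up toy data and hypothetical field names (`type`, `answer`); it only illustrates the comparison and reports no results from the dataset.

```python
from collections import Counter


def majority_baseline(train):
    """Map each question type to its most frequent training answer."""
    counts = {}
    for ex in train:
        counts.setdefault(ex["type"], Counter())[ex["answer"]] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def accuracy(prior, examples):
    """Fraction of examples answered correctly using only the question type."""
    correct = sum(prior.get(ex["type"]) == ex["answer"] for ex in examples)
    return correct / len(examples)


# Toy data, invented for illustration: a skewed split and an evened-out split.
white = {"type": "color", "answer": "white"}
red = {"type": "color", "answer": "red"}
unbalanced = [white] * 8 + [red] * 2
balanced = [white] * 5 + [red] * 5

prior = majority_baseline(unbalanced)
gap = accuracy(prior, unbalanced) - accuracy(prior, balanced)
print(f"accuracy gap from language priors alone: {gap:.2f}")
```

On this toy data the image-blind baseline scores 0.8 on the skewed split and 0.5 on the balanced one. A large gap of this kind for a real model suggests it leans on answer priors rather than visual evidence.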