Available on 1 platform
This dataset comprises 30,000 images from the GQA dataset and is intended for training Visual Question Answering (VQA) models. It is tagged for scene understanding and computer vision tasks, and the images are accompanied by English text.
The specific format of the image-text data (e.g., pairing method, question-answer structure) is not documented.