STVQA-7K is a high-quality spatial visual question answering dataset comprising 7,587 samples. It was created by hunarbatra and last updated on 2026-01-29. The dataset is fully grounded in human-annotated scene graphs from Visual Genome and is designed for training and evaluating spatial reasoning capabilities in multimodal large language models.
The license is not specified, so usage rights are unclear and reuse may be restricted.