Description
SpaceDG-Bench is a human-verified benchmark containing 1,102 questions designed to evaluate the spatial intelligence of Multimodal Large Language Models (MLLMs) under visual degradation. The dataset spans 11 reasoning categories and 9 visual degradation types, yielding over 10,000 visual question answering (VQA) instances. It was created by author xlzhou126 and last updated on May 24, 2026.
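The description does not say how the "over 10,000" figure is derived from the question and degradation counts. The short check below works through the arithmetic. It shows that the 9 degradation types alone give fewer than 10,000 instances, so the total may also count an undegraded copy of each question. That reading is an assumption, not something the dataset card confirms.

```python
# Counts taken from the dataset description.
num_questions = 1102
num_degradations = 9

# One instance per (question, degradation type) pair.
degraded_only = num_questions * num_degradations

# ASSUMPTION: one extra clean (undegraded) version per question.
with_clean = num_questions * (num_degradations + 1)

print(degraded_only)  # 9918
print(with_clean)     # 11020
```

Whether the hosted files actually contain 11,020 rows can only be confirmed after download.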
Use Cases
Benchmarking MLLM spatial reasoning across the 11 reasoning categories
Testing model robustness to visual artifacts across the 9 degradation types, such as motion blur and low light
Evaluating visual question answering performance under adverse conditions using the more than 10,000 VQA instances
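A robustness evaluation of this kind usually comes down to grouping per-instance results by degradation type or reasoning category and comparing accuracies. The following is a minimal sketch of that grouping. The field names `degradation`, `category`, and `correct` are hypothetical, and the records are hand-made placeholders, not benchmark results.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Map each value of `key` to the fraction of records marked correct."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        hits[group] += int(record["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical placeholder results; field names are assumptions.
results = [
    {"degradation": "motion_blur", "category": "depth", "correct": True},
    {"degradation": "motion_blur", "category": "depth", "correct": False},
    {"degradation": "low_light", "category": "depth", "correct": True},
]

per_degradation = accuracy_by_group(results, "degradation")
per_category = accuracy_by_group(results, "category")
print(per_degradation)
print(per_category)
```

The same helper serves both the degradation and the category breakdown, so a model's weak spots can be read off either axis.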
Strengths
Contains 1,102 questions, providing a substantial test set
Covers 9 distinct visual degradation types, including motion blur and adverse weather
Yields over 10,000 VQA instances from the question and degradation combinations
Is described as human-verified, suggesting a level of quality control
Limitations
Column-level documentation is absent; field semantics must be inferred after download
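Row count of the hosted files is not reported in the metadata, so the stated totals (1,102 questions, over 10,000 VQA instances) cannot be confirmed before download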
Description metadata is limited; actual data quality requires manual inspection after download
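Because columns are undocumented, a first inspection pass after download might simply list each field, the value types it holds, and the row count. The sketch below assumes the data has been loaded as a list of dictionaries. The sample rows and their field names are hypothetical stand-ins, since the real schema is unknown.

```python
from collections import Counter


def summarize_columns(rows):
    """Return {column: Counter of Python type names} for a list of dict rows."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            summary.setdefault(column, Counter())[type(value).__name__] += 1
    return summary


# Hypothetical rows; the real field names are not documented.
sample_rows = [
    {"question": "Which object is closer?", "degradation": "motion_blur"},
    {"question": "Is the chair left of the table?", "degradation": None},
]

column_summary = summarize_columns(sample_rows)
print("rows:", len(sample_rows))
for column, types in column_summary.items():
    print(column, dict(types))
```

Mixed types or frequent `NoneType` entries in a column are a quick signal of fields that need closer manual review.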
Provenance
Source
xlzhou126 on Hugging Face, part of the SpaceDG project
Collection Method
Likely constructed for benchmark evaluation; described as human-verified
Freshness
Last updated 2026-05-24 16:01:15; freshness should be verified
License is unknown; terms of use must be verified before the dataset is used.