Name: SpaceDG-Bench: 1,102 Questions for MLLM Spatial Intelligence Under Visual Degradation
Creator: xlzhou126
Published: 2026-05-09T08:48:44
Keywords: Spatial Intelligence, Visual Degradation, Multimodal LLM, Benchmark, VQA Benchmark, Multimodal

Description

SpaceDG-Bench is a human-verified benchmark of 1,102 questions designed to evaluate the spatial intelligence of Multimodal Large Language Models (MLLMs) under visual degradation. The dataset spans 11 reasoning categories and 9 visual degradation types, yielding over 10,000 visual question answering (VQA) instances. It was created by xlzhou126 and last updated on May 24, 2026.

Use Cases

Benchmarking MLLM spatial reasoning across the 11 reasoning categories
Testing model robustness to the 9 visual degradation types, such as motion blur and low light
Evaluating visual question answering performance under adverse conditions on the 10,000+ VQA instances
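
A benchmark run along these lines usually ends in per-category and per-degradation accuracy. The sketch below shows one way to compute both from scored predictions. The field names (`category`, `degradation`, `answer`, `prediction`) and the category labels are hypothetical, since the dataset's actual schema is not documented, and the records are toy values rather than benchmark data.

```python
from collections import defaultdict

def accuracy_by(records, key):
    """Return {group: accuracy}, grouping records by the value of `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        # Exact-match scoring; a real evaluation may need answer normalization.
        correct[r[key]] += int(r["prediction"] == r["answer"])
    return {group: correct[group] / total[group] for group in total}

# Toy records with hypothetical field names and labels (not real benchmark data).
records = [
    {"category": "depth", "degradation": "motion_blur", "answer": "A", "prediction": "A"},
    {"category": "depth", "degradation": "low_light", "answer": "B", "prediction": "C"},
    {"category": "distance", "degradation": "motion_blur", "answer": "C", "prediction": "C"},
    {"category": "distance", "degradation": "low_light", "answer": "D", "prediction": "D"},
]

by_category = accuracy_by(records, "category")
by_degradation = accuracy_by(records, "degradation")
```

Reporting both breakdowns separates a weakness in a reasoning skill from a sensitivity to a particular degradation.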

Strengths

Contains 1,102 questions, providing a substantial test set
Covers 9 distinct visual degradation types, including motion blur and adverse weather
Yields over 10,000 VQA instances from the question and degradation combinations
Is described as human-verified, suggesting a level of quality control
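
The "over 10,000" figure can be sanity-checked with simple arithmetic. The sketch below assumes every question is paired with every degradation type, which the card does not state. Under that assumption, the degraded variants alone come to slightly under 10,000, so the reported total may also count an undegraded copy of each question or several severity levels. Both readings are guesses to confirm against the data.

```python
QUESTIONS = 1102
DEGRADATION_TYPES = 9

# Assumption: one instance per (question, degradation type) pair.
degraded_only = QUESTIONS * DEGRADATION_TYPES

# Assumption: the same pairs plus one undegraded copy of each question.
with_clean = QUESTIONS * (DEGRADATION_TYPES + 1)

print(degraded_only, with_clean)
```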

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count of the hosted files is not reported (only the 1,102-question and 10,000+ instance figures are given), which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download
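
Since the field semantics and row count have to be checked by hand, a small inspection helper can speed up the first pass after download. The sketch below assumes the data has already been loaded as a list of dictionaries. How to load it is left out because the repository ID and file layout are not given here. The sample rows and field names are hypothetical.

```python
from collections import defaultdict

def summarize_fields(records):
    """Map each field name to the value types seen and how many rows lack it."""
    types = defaultdict(set)
    present = defaultdict(int)
    for row in records:
        for name, value in row.items():
            types[name].add(type(value).__name__)
            present[name] += 1
    n = len(records)
    return {
        name: {"types": sorted(types[name]), "missing": n - present[name]}
        for name in types
    }

# Hypothetical sample rows standing in for the downloaded data.
sample = [
    {"question": "Which object is closer?", "answer": "A", "degradation": "motion_blur"},
    {"question": "Is the chair left of the table?", "answer": "B"},
]

row_count = len(sample)
summary = summarize_fields(sample)
```

Fields with mixed types or many missing values are the first ones to inspect manually.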

Provenance

Source: xlzhou126 on Hugging Face, part of the SpaceDG project
Collection Method: Likely constructed for benchmark evaluation; described as human-verified
Freshness: Last updated 2026-05-24 16:01:15 (time zone not stated); check the source repository for later revisions

License is unknown; terms of use must be verified before application.
