3DSRBench is a manually annotated benchmark for evaluating 3D spatial reasoning in large multimodal models. It contains 2,100 visual question-answering (VQA) pairs on MS-COCO images and 672 on multi-view synthetic images rendered from HSSD, for 2,772 pairs in total. The dataset is published by 'ccvl' on Hugging Face and was last updated in February 2025.
Use Cases
- Benchmarking 3D spatial reasoning in multimodal models based on annotated VQAs.
- Training models to understand spatial relationships from 2D images.
- Evaluating model performance on synthetic multi-view imagery for spatial tasks.
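A minimal sketch of how benchmark scoring could be organized, assuming exact-match comparison and reporting real-image and synthetic-image questions separately. The source does not describe the benchmark's scoring protocol, so the exact-match rule, the function `accuracy_by_group`, and the field names `id`, `answer`, and `source` are all hypothetical.

```python
from collections import defaultdict

def accuracy_by_group(records, predictions, group_key="source"):
    """Return {group: accuracy} using case-insensitive exact match.

    records: list of dicts with hypothetical keys "id", "answer", group_key.
    predictions: dict mapping record id -> predicted answer string.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        # A missing prediction counts as wrong.
        pred = predictions.get(rec["id"], "")
        if pred.strip().lower() == rec["answer"].strip().lower():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy rows; field names and values are illustrative, not taken from the dataset.
records = [
    {"id": "q1", "answer": "A", "source": "coco"},
    {"id": "q2", "answer": "B", "source": "coco"},
    {"id": "q3", "answer": "A", "source": "hssd"},
]
predictions = {"q1": "a", "q2": "A", "q3": "A"}
scores = accuracy_by_group(records, predictions)
print(scores)
```

Keeping the two image sources apart makes it easier to see whether a model behaves differently on the controlled synthetic views than on the MS-COCO images.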
Strengths
- 2,772 manually annotated visual question-answering pairs.
- Includes 672 VQAs on multi-view synthetic images, providing a controlled test environment.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported in the metadata; the description cites 2,772 VQA pairs, but the actual number of rows should be confirmed after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
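Since column semantics and row count have to be checked by hand, a small helper like the one below can give a first overview once the rows are loaded as a list of dictionaries. This is a sketch only: `summarize_columns` is a hypothetical helper, and the sample rows use invented field names rather than the dataset's real schema.

```python
def summarize_columns(rows, max_examples=2):
    """Return (row_count, per-column summary) for a list of dict rows."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(
                name, {"types": set(), "non_null": 0, "examples": []}
            )
            if value is None:
                continue  # nulls are tracked only by their absence from non_null
            info["types"].add(type(value).__name__)
            info["non_null"] += 1
            # Keep a few distinct example values to help guess field meaning.
            if len(info["examples"]) < max_examples and value not in info["examples"]:
                info["examples"].append(value)
    return len(rows), summary

# Invented rows for illustration; the real column names are unknown here.
rows = [
    {"question": "Is the cup left of the plate?", "answer": "yes", "image_id": 1},
    {"question": "Which object is closer?", "answer": None, "image_id": 2},
]
n_rows, summary = summarize_columns(rows)
for name, info in summary.items():
    print(name, sorted(info["types"]), info["non_null"], info["examples"])
```

Comparing `n_rows` with the 2,772 pairs cited in the description is a quick consistency check on the download.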
Provenance
- Source: ccvl on Hugging Face
- Collection Method: Manually annotated visual question-answering pairs on MS-COCO and synthetic images rendered from HSSD.
- Freshness: Last updated 2025-02-03 06:16:52; freshness should be verified.
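One way to make the freshness check concrete is to compute how many days have passed since the recorded update time. The sketch below assumes the timestamp is in UTC, which the listing does not state; `age_in_days` is a hypothetical helper.

```python
from datetime import datetime, timezone

LAST_UPDATED = "2025-02-03 06:16:52"  # from the listing; UTC is an assumption

def age_in_days(timestamp, now=None):
    """Whole days elapsed between the timestamp (assumed UTC) and now."""
    updated = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    updated = updated.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - updated).days

# A fixed reference time keeps the example reproducible.
fixed_now = datetime(2025, 3, 5, 6, 16, 52, tzinfo=timezone.utc)
print(age_in_days(LAST_UPDATED, fixed_now))
```

Calling `age_in_days(LAST_UPDATED)` without a reference time uses the current clock, which is what a real staleness check would do.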