Description
This training release from the VIEW2SPACE project contains 604,779 examples for multi-view visual reasoning. It includes 22,205 public images and supports three question families: count, detect, and multiple-choice. Authored by Pokerme, the release was last updated on June 22, 2026.
Use Cases
Training models for object counting based on multi-view image observations
Developing visual question answering systems for object detection tasks across multiple views
Benchmarking spatial reasoning capabilities of multimodal AI models using multiple-choice questions
Studying chain-of-thought reasoning in vision-language models, as the source description suggests
Strengths
Contains 604,779 training examples, providing a substantial scale for model training
Includes 22,205 public images for multi-view analysis
Covers three distinct reasoning tasks: count, detect, and multiple-choice questions
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Row count is not confirmed in the dataset metadata; the 604,779 figure comes from the description and should be verified after download
Description metadata is limited; actual data quality requires manual inspection after download
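Because column-level documentation is absent, a first pass after download could be a quick schema summary. The following is a minimal sketch, assuming the examples have already been loaded as a list of dictionaries (one per row). The field names `question_type`, `question`, and `answer` are hypothetical placeholders, since the real column names are not documented here, and `summarize_records` is an illustrative helper, not part of any library.

```python
from collections import Counter, defaultdict


def summarize_records(records):
    """Summarize observed value types and a few example values per field."""
    summary = defaultdict(
        lambda: {"types": Counter(), "non_null": 0, "examples": []}
    )
    for row in records:
        for field, value in row.items():
            info = summary[field]
            info["types"][type(value).__name__] += 1
            if value is not None:
                info["non_null"] += 1
                # Keep up to three distinct example values per field.
                if len(info["examples"]) < 3 and value not in info["examples"]:
                    info["examples"].append(value)
    return dict(summary)


# Hypothetical rows standing in for the downloaded data; the actual
# column names and value formats must be checked against the real files.
sample_rows = [
    {"question_type": "count", "question": "How many chairs are visible?", "answer": "3"},
    {"question_type": "detect", "question": "Is there a lamp?", "answer": "yes"},
    {"question_type": "multiple-choice", "question": "Which object is left of the sofa?", "answer": "B"},
]

summary = summarize_records(sample_rows)
row_count = len(sample_rows)  # compare against the stated 604,779 on real data
family_counts = Counter(row.get("question_type") for row in sample_rows)

for field, info in summary.items():
    print(field, dict(info["types"]), info["examples"])
print("rows:", row_count)
print("question families:", dict(family_counts))
```

Run against the real records, the same summary would help confirm the row count and whether the three question families (count, detect, multiple-choice) are represented as the description states.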
Provenance
Source
Pokerme on Hugging Face
Collection Method
Released as part of the VIEW2SPACE research project, associated with an ECCV 2026 paper.
Freshness
Last updated 2026-06-22 10:19:06; freshness should be verified
License is unknown; users should verify terms of use before downloading.