VisWorld-Eval is a task suite for assessing multimodal reasoning with visual world modeling. It comprises seven tasks spanning synthetic and real-world domains, each designed to isolate specific atomic world-model capabilities. The dataset was authored by 'thuml' and last updated on Hugging Face on March 9, 2026.
Use Cases
- Benchmarking model performance on paper folding simulation tasks drawn from the 'SpatialViz' source
- Evaluating multi-hop manipulation reasoning capabilities
- Assessing visual world modeling across both synthetic and real-world domains
Strengths
- Comprises seven distinct tasks designed to isolate specific atomic world-model capabilities
- Spans both synthetic and real-world domains for varied evaluation
- Includes a task with 480 test samples sourced from 'SpatialViz'
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Total row count is unknown (the only size given is 480 test samples for one task), which makes suitability hard to assess before download
- Description metadata is limited; actual data quality requires manual inspection after download
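Because field semantics and row counts have to be recovered after download, a quick first pass is to count rows and tally the value types seen in each field. The following is a minimal sketch, not part of the dataset's own tooling. `summarize_records` and `load_split` are hypothetical helper names, the sample rows and their field names are invented for illustration, and the repository id passed to `load_split` (for example 'thuml/VisWorld-Eval', guessed from the author and dataset name) is an assumption that should be checked on Hugging Face.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Optional


def summarize_records(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count rows and tally the Python types observed for each field."""
    row_count = 0
    field_types: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        row_count += 1
        for field, value in record.items():
            field_types[field][type(value).__name__] += 1
    return {
        "row_count": row_count,
        "fields": {name: dict(types) for name, types in field_types.items()},
    }


def load_split(repo_id: str, split: str = "test", config: Optional[str] = None):
    """Load one split with the Hugging Face `datasets` library (needs network access).

    `repo_id`, `split`, and `config` are assumptions the caller must verify;
    the dataset card does not document them.
    """
    # Imported lazily so summarize_records works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, name=config, split=split)


# Hypothetical rows standing in for downloaded data; the real field names are undocumented.
sample_rows = [
    {"task": "paper_folding", "question": "example question", "answer": "B"},
    {"task": "paper_folding", "question": "another question", "answer": None},
]
summary = summarize_records(sample_rows)
print(summary["row_count"])
print(summary["fields"])
```

With network access, the same summary could be run on real data via `summarize_records(load_split(repo_id))`, once the repository id and split name have been confirmed. Fields that show mixed types or `NoneType` values are good candidates for closer manual inspection.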
Provenance
- Source: Hugging Face, authored by 'thuml'
- Collection Method: Likely curated for research and benchmarking purposes
- Time Range: Not specified
- Freshness: Last updated 2026-03-09 10:29:01; freshness should be verified
- Geography: Not specified