WorldBench is a multimodal reasoning benchmark organized around a visual taxonomy spanning seven domains: Living Things, Objects, Scenes, Digital World, Academics, Documents/Charts/Tables, and Agents. It is designed by zlab-princeton to evaluate Multimodal Large Language Models (MLLMs). The dataset was last updated on 2026-06-08.
Use Cases
- Benchmarking model performance on visual reasoning across all seven domains of the taxonomy
- Identifying model weaknesses in specific visual categories like 'Documents/Charts/Tables' or 'Agents'
- Training or fine-tuning MLLMs on a structured visual taxonomy
- Conducting research on the breadth of multimodal understanding
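The per-domain benchmarking and weakness analysis listed above could be sketched roughly as follows. This is a minimal illustration, not the benchmark's official scoring: the record fields `domain`, `prediction`, and `answer` are hypothetical names (the actual column names are undocumented), and exact-match comparison is an assumed scoring rule.

```python
from collections import defaultdict

# The seven domains named in the dataset description.
DOMAINS = [
    "Living Things", "Objects", "Scenes", "Digital World",
    "Academics", "Documents/Charts/Tables", "Agents",
]

def per_domain_accuracy(records):
    """Return {domain: accuracy} using exact-match scoring.

    Each record is a dict with hypothetical keys
    'domain', 'prediction', and 'answer'.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        domain = record["domain"]
        totals[domain] += 1
        # Assumed scoring rule: a prediction counts only if it equals the answer.
        correct[domain] += int(record["prediction"] == record["answer"])
    return {domain: correct[domain] / totals[domain] for domain in totals}

def weakest_domains(accuracy, k=2):
    """Return the k domains with the lowest accuracy, worst first."""
    return sorted(accuracy, key=accuracy.get)[:k]
```

Calling `weakest_domains(per_domain_accuracy(records))` on a model's outputs would then surface the categories, such as 'Documents/Charts/Tables' or 'Agents', where the model struggles most.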
Strengths
- The benchmark is structured around a broad visual taxonomy covering seven distinct domains.
- It is designed specifically for evaluating modern Multimodal Large Language Models.
- Maintenance is recorded: the dataset was last updated on 2026-06-08.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and file formats are unknown, which may limit suitability assessment.
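Because field semantics and row count have to be inferred after download, a first inspection pass might look like the sketch below. It assumes the downloaded data has already been loaded into an iterable of dicts by whatever reader matches the actual file format (which is unknown here); `summarize_fields` is a hypothetical helper, not part of any dataset tooling.

```python
from collections import Counter, defaultdict

def summarize_fields(rows):
    """Count rows and tally the Python types seen for each field.

    `rows` is any iterable of dicts, one per example.
    Returns (row_count, {field: {type_name: count}}).
    """
    type_counts = defaultdict(Counter)
    row_count = 0
    for row in rows:
        row_count += 1
        for field, value in row.items():
            # Record the type name so mixed or missing values stand out.
            type_counts[field][type(value).__name__] += 1
    return row_count, {field: dict(counts) for field, counts in type_counts.items()}
```

Fields whose per-type counts sum to less than the row count are missing from some rows, and fields with a `NoneType` entry contain nulls; both are worth checking before relying on the data.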
Provenance
- Source: zlab-princeton
- Freshness: last updated 2026-06-08 01:32:32; freshness should be verified.