Available on 1 platform
This benchmark for evaluating multimodal models comprises 217 examples across 7 top-level categories and 23 subcategories. Created by zai-org, it requires models to identify entities and perform multi-step reasoning over search-augmented information to answer complex questions. It was last updated on 2026-05-16.
The license is unknown; verify the terms of use before using the dataset.