ZeroBench is a visual reasoning benchmark containing fewer than 1,000 image-text pairs, designed to challenge contemporary Large Multimodal Models (LMMs). It was created by Jonathan Roberts and accompanies arXiv paper 2502.09696. The dataset was updated in December 2025 to include refined hierarchical question structures. It focuses on tasks that were considered nearly unsolvable for multimodal models at the time of its release.
For the most recent question structures, refer to the v3 changelog (23/12/2025). To be evaluated correctly, models must strictly follow the formatting instructions given in question_text.
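The reason strict formatting matters can be illustrated with a minimal sketch of exact-match scoring. It assumes, purely for illustration, that the instructions in question_text ask for the final answer inside curly braces; the actual required format must be read from question_text itself. The function names extract_final_answer and is_correct are hypothetical and are not part of the dataset or any official evaluation code.

```python
import re
from typing import Optional

# Assumption for illustration only: the final answer is wrapped in curly
# braces, e.g. "... so the result is {42}". Check question_text for the
# format the benchmark actually requires.
ANSWER_PATTERN = re.compile(r"\{([^{}]*)\}")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the content of the last {...} group, or None if there is none."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    return matches[-1].strip()


def is_correct(response: str, ground_truth: str) -> bool:
    """Case-insensitive exact match on the extracted answer."""
    answer = extract_final_answer(response)
    if answer is None:
        # A response that ignores the format cannot be scored as correct,
        # even if the right value appears somewhere in the text.
        return False
    return answer.lower() == ground_truth.strip().lower()
```

Under this assumed format, a response such as "The total is 42" would be marked wrong against a ground truth of "42", while "The total is {42}" would be marked right.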