A benchmark suite introduced in the paper "Same or Not? Enhancing Visual Perception in Vision-Language Models". It contains 12,000 challenging (image, question, answer) tuples that emphasize fine-grained image understanding. The dataset consists of six sub-benchmarks and is hosted by glab-caltech.
For evaluation on this dataset with LMMS-eval, users are referred to an external repository.