Available on 1 platform
A 506-sample multimodal reasoning benchmark created by EthanSun and last updated on 2026-06-08. It evaluates whether vision-language models remain faithful to task-relevant visual evidence when visually salient but answer-irrelevant distractions are added. Each sample includes original and distracted images, a question, answer choices, the correct answer, and the distraction specification.
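The sample layout described above could be modeled roughly as follows. This is a minimal sketch, not the dataset's actual schema: the field names, the assumption of one original and one distracted image per sample, the `predict` callable, and the three reported metrics are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical record layout; the real column names may differ.
@dataclass(frozen=True)
class Sample:
    original_image: str      # path or URL of the undistracted image (assumed)
    distracted_image: str    # same scene with the distraction added (assumed)
    question: str
    choices: Sequence[str]
    answer: str              # the correct choice
    distraction_spec: str    # description of the added distraction

# Hypothetical model interface: (image, question, choices) -> chosen answer.
Predictor = Callable[[str, str, Sequence[str]], str]

def faithfulness_report(samples: Sequence[Sample], predict: Predictor) -> Dict[str, float]:
    """Compare accuracy on original vs. distracted images.

    flip_rate is the fraction of samples answered correctly on the
    original image but incorrectly once the distraction is added.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    original_correct = distracted_correct = flipped = 0
    for s in samples:
        ok_original = predict(s.original_image, s.question, s.choices) == s.answer
        ok_distracted = predict(s.distracted_image, s.question, s.choices) == s.answer
        original_correct += ok_original
        distracted_correct += ok_distracted
        flipped += ok_original and not ok_distracted
    return {
        "original_accuracy": original_correct / n,
        "distracted_accuracy": distracted_correct / n,
        "flip_rate": flipped / n,
    }
```

Reporting the flip rate alongside the two accuracies separates errors caused by the distraction from questions the model would have missed anyway.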
License restrictions are unknown and should be verified before use.