This dataset documents 10 failure cases in which the Qwen3.5-Base-0.8B vision-language model answered visual question answering (VQA) tasks incorrectly. The examples were sampled from the SimpleVQA benchmark; each one includes the original image, the question, the expected answer, and the model's actual output.
The full dataset description is hosted externally; users must visit the linked Hugging Face page for complete details.
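As a rough illustration of the four fields named above, one record could be modeled as in the sketch below. The class name `FailureCase`, the field names, and the `is_mismatch` helper are hypothetical and are not taken from the dataset; the actual column names and file layout are documented only on the Hugging Face page. The sample values are placeholders, not real entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCase:
    """One failure case. All field names here are assumptions for illustration."""

    image_path: str       # location of the original image
    question: str         # question asked about the image
    expected_answer: str  # reference answer from the benchmark
    model_output: str     # what the model actually produced

    def is_mismatch(self) -> bool:
        # Naive check: compare after trimming whitespace and ignoring case.
        # Real VQA scoring is usually more lenient than exact string match.
        expected = self.expected_answer.strip().casefold()
        actual = self.model_output.strip().casefold()
        return expected != actual


# Placeholder record, NOT drawn from the dataset.
example = FailureCase(
    image_path="images/placeholder.jpg",
    question="<question text>",
    expected_answer="<expected answer>",
    model_output="<model output>",
)
```

Exact string comparison is used here only to keep the sketch short; it would flag answers that differ in wording but not in meaning, so it should not be read as the benchmark's scoring rule.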