An academic dataset from researchers at KAIST, William and Mary, the University of Alberta, and Auburn University, released in December 2025. It documents a robustness gap in state-of-the-art Vision Language Models (VLMs): on counting tasks, reported performance falls from 100% on original images to 17.05% on modified versions. The dataset is hosted on Hugging Face under the author account anvo25.
Use Cases
- Benchmarking VLM robustness by measuring how far performance degrades from original to modified images.
- Studying bias in multimodal AI systems through the documented failure cases.
- Developing debiasing techniques for VLMs, using the documented counting failures as test cases.
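The benchmarking use case amounts to comparing accuracy on the two image variants. A minimal sketch follows, assuming each prediction records an image variant, a predicted count, and a true count. The `Prediction` fields and the toy values are hypothetical and are not taken from the dataset, whose columns are undocumented.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record layout; the dataset's real fields are undocumented.
    image_variant: str   # "original" or "modified"
    predicted_count: int
    true_count: int


def accuracy(preds, variant):
    """Fraction of exact-match counts for one image variant."""
    subset = [p for p in preds if p.image_variant == variant]
    if not subset:
        raise ValueError(f"no predictions for variant {variant!r}")
    correct = sum(p.predicted_count == p.true_count for p in subset)
    return correct / len(subset)


def degradation(preds):
    """Accuracy drop, in percentage points, from original to modified images."""
    return 100.0 * (accuracy(preds, "original") - accuracy(preds, "modified"))


# Toy data for illustration only, not drawn from the dataset.
toy = [
    Prediction("original", 4, 4),
    Prediction("original", 5, 5),
    Prediction("modified", 4, 5),
    Prediction("modified", 5, 5),
]
print(f"drop: {degradation(toy):.1f} percentage points")
```

Applied to the figures reported for this dataset, a fall from 100% to 17.05% is a drop of 82.95 percentage points.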
Strengths
- Created by a multi-institution academic team from KAIST, William and Mary, University of Alberta, and Auburn University.
- Demonstrates a specific, documented failure case on counting tasks, where model performance drops from 100% on original images to 17.05% on modified ones.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not stated, so the dataset's size and suitability cannot be assessed before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
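Because columns and row count must be inferred after download, a quick schema summary is a reasonable first step. The sketch below assumes the downloaded records can be iterated as dictionaries. The `summarize_schema` helper and the sample rows, including the column names `image_path`, `question`, and `answer`, are hypothetical placeholders and not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_schema(rows):
    """Return (row_count, {column: sorted type names seen in that column})."""
    seen = defaultdict(set)
    count = 0
    for row in rows:
        count += 1
        for column, value in row.items():
            seen[column].add(type(value).__name__)
    return count, {column: sorted(types) for column, types in seen.items()}


# Hypothetical rows; replace with records iterated from the downloaded data.
sample_rows = [
    {"image_path": "a.png", "question": "How many legs?", "answer": 4},
    {"image_path": "b.png", "question": "How many stripes?", "answer": None},
]

row_count, schema = summarize_schema(sample_rows)
print(row_count)
for column, types in schema.items():
    print(column, types)
```

A column that reports more than one type, or includes `NoneType`, is a candidate for the manual quality inspection noted above.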
Provenance
- Source: Hugging Face (author: anvo25).
- Collection Method: Likely contains experimental results and images used to test Vision Language Models.
- Freshness: Last updated 2025-12-12 06:00:11; freshness should be verified.