TaiwanVQA is a visual question answering (VQA) benchmark containing 2,736 original images paired with 5,472 manually designed questions, an average of two questions per image. It evaluates how well vision-language models recognize and reason about culturally specific content related to Taiwan. The dataset was created by hhhuang and last updated on December 4, 2025.
Use Cases
- Benchmarking model performance on culturally specific visual question answering, using the 5,472 manually designed questions.
- Training vision-language models to recognize culturally relevant scenes and objects, using the 2,736 images captured in Taiwan.
- Analyzing model biases or gaps in cultural understanding of daily life in Taiwan, broken down by the dataset's topics.
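For the benchmarking use case, a minimal scoring sketch is shown below. It assumes each question has a single reference answer string and that a model returns one answer string per question; the card does not document the answer format, so the function names and the normalization rule (trim whitespace, ignore case) are illustrative assumptions, not the benchmark's official metric.

```python
from typing import Iterable


def normalize(answer: str) -> str:
    """Trim surrounding whitespace and ignore case (an assumed, not official, rule)."""
    return answer.strip().casefold()


def exact_match_accuracy(predictions: Iterable[str], references: Iterable[str]) -> float:
    """Return the fraction of predictions that match their reference after normalization."""
    preds = list(predictions)
    refs = list(references)
    if len(preds) != len(refs):
        raise ValueError("predictions and references must have the same length")
    if not refs:
        return 0.0  # no questions scored
    correct = sum(normalize(p) == normalize(r) for p, r in zip(preds, refs))
    return correct / len(refs)


if __name__ == "__main__":
    # Hypothetical model outputs and reference answers, for illustration only.
    print(exact_match_accuracy(["A", " b "], ["a", "C"]))
```

If the dataset turns out to use multiple-choice options or free-form answers, the comparison in `normalize` would need to change accordingly.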
Strengths
- Contains 2,736 original images captured specifically for this dataset.
- Includes 5,472 manually designed questions, suggesting careful annotation.
- Focuses on culturally specific content related to Taiwan, a potentially underrepresented domain.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count of the published files is not reported, which makes it harder to assess suitability for large-scale training.
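Because neither the fields nor the row count are documented, a first step after download is usually to profile the records. The sketch below assumes the data has already been loaded as an iterable of dictionaries (one per row); the helper name `summarize_records` and the sample field names are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def summarize_records(
    records: Iterable[Mapping[str, Any]],
) -> Tuple[int, Dict[str, List[str]]]:
    """Count rows and collect the Python type names observed for each field."""
    row_count = 0
    field_types = defaultdict(set)
    for record in records:
        row_count += 1
        for name, value in record.items():
            field_types[name].add(type(value).__name__)
    # Sort type names so the summary is stable across runs.
    return row_count, {name: sorted(types) for name, types in field_types.items()}


if __name__ == "__main__":
    # Hypothetical rows; the real field names must be read from the downloaded files.
    sample = [
        {"question": "q1", "image_id": 1},
        {"question": "q2", "image_id": None},
    ]
    rows, fields = summarize_records(sample)
    print(rows, fields)
```

Fields that report more than one type (for example a value that is sometimes `None`) are a useful hint about missing data before committing to a training or evaluation pipeline.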
Provenance
- Source: hhhuang
- Collection Method: Images were captured by the dataset team; questions were manually designed.
- Time Range: Not specified
- Freshness: Last updated 2025-12-04 08:40:14 (time zone not stated); check the source for newer revisions.
- Geography: Taiwan