Name: Multi Domain VQA 20K: A Visual Question Answering Benchmark for VLMs
Creator: dutta18
Published: 2025-07-19T04:48:17
Keywords: Size Categories: 10K<n<100K, Library: polars, Knowledge Testing, Library: dask, Language: en, Task Categories: visual-question-answering, Modality: text, Library: mlcroissant, Modality: image, Educational Research, Library: datasets, Parquet, Region: us, License: apache-2.0, Multimodal Benchmark, Visual Question Answering, Medical, Multimodal

Description

This dataset contains 20,000 samples that combine questions and images from three established VQA datasets: AOKVQA, Path-VQA, and TDIUC. This medium-sized benchmark is designed to test the multi-domain knowledge of vision-language models (VLMs). It was created by dutta18 for educational and research purposes; copyright remains with the original dataset owners.

Use Cases

Benchmarking model performance on multi-domain visual question answering based on the combined AOKVQA, Path-VQA, and TDIUC splits.
Testing model knowledge on quantitative and physical reasoning questions based on the TDIUC subset.
Rapid prototyping of VLM inference pipelines using a pre-consolidated dataset.
Educational demonstrations of VLM capabilities across different question and image domains.
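
For the benchmarking use case, a per-domain breakdown is often more informative than a single score, since the three sources cover very different question types. The following is a minimal sketch, not the dataset author's evaluation method. It assumes each record carries a "source" field naming its origin dataset and an "answer" field with the reference answer; both key names are hypothetical, because the card does not document the columns. Scoring is plain exact match after light normalization, which is stricter than the metrics some of the source benchmarks use.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(text).lower().split())


def accuracy_by_domain(records, predictions):
    """Return exact-match accuracy per source domain, plus an "overall" entry.

    records: list of dicts with the hypothetical keys "source" and "answer".
    predictions: list of model answers, aligned index-by-index with records.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for record, prediction in zip(records, predictions):
        domain = record["source"]  # hypothetical column name
        total[domain] += 1
        if normalize(prediction) == normalize(record["answer"]):
            correct[domain] += 1

    scores = {domain: correct[domain] / total[domain] for domain in total}
    if total:
        scores["overall"] = sum(correct.values()) / sum(total.values())
    return scores
```

Calling accuracy_by_domain(records, predictions) returns a dictionary keyed by domain name, so a weak result on one source (for example, pathology questions) is not hidden by strong results on the others.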

Strengths

Consolidates 20,000 samples from three established VQA benchmarks for multi-domain testing.
Specifically includes quantitative and physical reasoning questions from the TDIUC validation split.
Published and last updated on 2025-07-19, so the release is recent.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Per-source and per-split sample counts are not documented beyond the 20,000 total, which may limit suitability assessment.
Data may reflect the geographic, temporal, or source biases inherent to the original constituent datasets.
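
Because the columns are undocumented, a quick first step after download is to tabulate which fields exist, what types they hold, and how often they are empty. The sketch below assumes a sample of rows has already been loaded as a list of dictionaries by whatever Parquet reader is in use; it makes no claim about the dataset's actual column names, and the function name is illustrative.

```python
def summarize_columns(rows):
    """Summarize column names, observed Python types, and missing-value counts.

    rows: list of dicts, one per sample. A value counts as missing when the
    key is absent from a row or its value is None.
    """
    seen = {}
    for row in rows:
        for name, value in row.items():
            info = seen.setdefault(name, {"types": set(), "non_null": 0})
            if value is not None:
                info["types"].add(type(value).__name__)
                info["non_null"] += 1

    return {
        name: {
            "types": sorted(info["types"]),
            "missing": len(rows) - info["non_null"],
        }
        for name, info in seen.items()
    }
```

Running this on the first few hundred rows is usually enough to tell which field holds the image bytes, which holds the question text, and whether any answers are missing.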

Provenance

Source: Combines samples from AOKVQA, Path-VQA, and TDIUC datasets.
Collection Method: Created by dutta18 by merging train and validation splits from the source datasets.
Freshness: Last updated 2025-07-19 12:13:51.

Intended for educational and research purposes only; all copyright belongs to the original dataset owners.
