Encyclopedic-VQA is a visual question answering dataset converted to a unified Parquet schema. The dataset, originally from Google and presented at ICCV 2023 by Mensink et al., contains questions about detailed properties of fine-grained categories. The data is hosted on Hugging Face by the author reonokiy and was last updated on April 1, 2026.
Use Cases
- Training multimodal VQA models using the question and answer fields.
- Benchmarking fine-grained visual recognition, since the questions target detailed properties of fine-grained categories.
- Researching knowledge grounding in encyclopedic sources using the wikipedia_title and evidence fields.
- Developing question-type classifiers using the question_type field.
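A minimal sketch of how the fields named in these use cases might be prepared for VQA training and question-type classification. The field names come from this card; the sample rows, their values, and the helper names `to_qa_pairs` and `question_type_counts` are invented for illustration, and the real question_type labels should be checked in the data.

```python
from collections import Counter

# Hypothetical rows for illustration only. The keys follow the field names
# listed on this card; every value (including the question_type labels) is invented.
SAMPLE_ROWS = [
    {"question": "q1", "answer": "a1", "question_type": "type_a",
     "wikipedia_title": "Title 1", "evidence": "e1"},
    {"question": "q2", "answer": "a2", "question_type": "type_b",
     "wikipedia_title": "Title 2", "evidence": "e2"},
    {"question": "q3", "answer": None, "question_type": "type_a",
     "wikipedia_title": "Title 3", "evidence": None},
]


def to_qa_pairs(rows):
    """Return (question, answer) tuples, skipping rows where either is missing or empty."""
    return [
        (row["question"], row["answer"])
        for row in rows
        if row.get("question") and row.get("answer")
    ]


def question_type_counts(rows):
    """Count rows per question_type value to gauge class balance."""
    return Counter(row.get("question_type") for row in rows)


if __name__ == "__main__":
    print(to_qa_pairs(SAMPLE_ROWS))
    print(question_type_counts(SAMPLE_ROWS))
```

Counting labels before training is a cheap way to spot class imbalance, which matters for the question-type classification use case.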
Strengths
- Based on a peer-reviewed ICCV 2023 publication from Google.
- Converted to a unified Parquet schema for easier processing.
- Includes fields for original questions, answers, evidence, and Wikipedia references.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and license information are not stated, and the file layout beyond the Parquet conversion is undocumented.
- The dataset description is sparse; data quality must be checked by manual inspection after download.
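Because column semantics are undocumented, a first inspection pass could look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries, for example with `pandas.read_parquet(path).to_dict("records")` on a locally downloaded file; the path and the helper name `summarize_columns` are hypothetical.

```python
def summarize_columns(rows):
    """For each column, collect the Python type names seen and count None values.

    Assumes every row carries the same keys, as is the case for records
    produced from a Parquet table.
    """
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(name, {"types": set(), "missing": 0})
            if value is None:
                info["missing"] += 1
            else:
                info["types"].add(type(value).__name__)
    return summary


if __name__ == "__main__":
    # Invented rows standing in for real data; replace with loaded records.
    rows = [
        {"question": "q1", "answer": "a1", "evidence": None},
        {"question": "q2", "answer": None, "evidence": "e2"},
    ]
    for column, info in summarize_columns(rows).items():
        print(column, sorted(info["types"]), info["missing"])
```

A summary like this shows which fields are sparsely populated or carry mixed types, which is a reasonable starting point for inferring field semantics.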
Provenance
- Source: Google, ICCV 2023 (Mensink et al.)
- Collection Method: Converted to a unified Parquet schema from the original Encyclopedic-VQA dataset.
- Time Range: Not specified
- Freshness: Last updated 2026-04-01 07:09:42; freshness should be verified.
- Geography: Not specified