SimulaMet-HOST created the Kvasir-VQA dataset by augmenting the HyperKvasir and Kvasir-Instrument datasets with question-and-answer annotations. This multimodal dataset is designed for advanced machine learning tasks in gastrointestinal diagnostics, including image captioning and Visual Question Answering. The dataset was last updated on the Hugging Face platform in August 2025.
Use Cases
- Train Visual Question Answering models based on annotated medical images and questions.
- Develop image captioning systems for gastrointestinal endoscopy images.
- Generate synthetic medical images using text prompts based on the dataset's annotations.
- Fine-tune multimodal LLMs for specialized medical diagnostic assistance.
Strengths
- Derived from established medical imaging datasets (HyperKvasir and Kvasir-Instrument).
- Specifically annotated for advanced multimodal AI tasks like VQA and image captioning.
- Last updated on 2025-08-08, indicating recent maintenance.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count, file formats, and license information are unknown, which may limit suitability assessment.
Provenance
- Source
- SimulaMet-HOST
- Collection Method
- Extended from the HyperKvasir and Kvasir-Instrument datasets with added question-and-answer annotations.
- Time Range
- null
- Freshness
- Last updated 2025-08-08 21:23:26; freshness should be verified.
- Geography
- null