This Vietnamese-language Visual Question Answering (VQA) dataset covers 90 animal species. It pairs animal images with Vietnamese questions and answers to support multimodal learning.
Use Cases
- Develop multimodal models that process Vietnamese text and animal images to generate accurate answers.
- Fine-tune vision-language models on the 90 animal species to improve domain-specific recognition.
- Benchmark how accurately Vietnamese-language models answer questions when grounded in animal images.
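The benchmarking use case above can be sketched as a simple exact-match scorer. This is a minimal illustration, not the dataset's official metric: the function names `normalize_answer` and `exact_match_accuracy` and the sample answers are hypothetical and do not come from the dataset. The sketch applies Unicode NFC normalization because Vietnamese diacritics can be encoded in composed or decomposed form, so two strings that look identical may otherwise fail a direct comparison.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Lowercase, NFC-normalize, and collapse whitespace in an answer string."""
    text = unicodedata.normalize("NFC", text.lower())
    return " ".join(text.split())


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_answer(pred) == normalize_answer(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Hypothetical answers for illustration only; they are not taken from the dataset.
references = ["con hổ", "Con voi", "con mèo"]
predictions = [
    unicodedata.normalize("NFD", "con hổ"),  # same text, decomposed diacritics
    "con voi",                               # differs from the reference only in case
    "con chó",                               # wrong answer
]

accuracy = exact_match_accuracy(predictions, references)
print(f"Exact-match accuracy: {accuracy:.2f}")
```

Exact match is a strict criterion; for free-form answers a softer metric may be more appropriate, depending on how the answers in the dataset are phrased.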
Strengths
- Includes 90 distinct animal species for classification and reasoning.
- Provides Vietnamese-language text for all question-answer pairs.
- Designed specifically for Image-Text QA (Visual Question Answering) tasks.