This dataset contains 27,519 images with corresponding question-answer pairs, translated into Russian from the GQA train_balanced and testdev_balanced splits. The text was translated with gpt-4-turbo and then manually validated to correct translation errors and to remove items affected by the model's safety filters. The data is structured for use within the lmms-eval pipeline to support multimodal model benchmarking.
Use Cases
- Evaluate visual reasoning of Russian-language multimodal models using the translated question and answer fields.
- Fine-tune vision-language models on the 27,519 images to improve Russian-specific visual question answering.
- Conduct cross-lingual performance analysis by comparing model accuracy on this Russian split versus the original English GQA.
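
The cross-lingual comparison in the last use case can be sketched as below. This is a minimal illustration, not the scoring code used by lmms-eval: it assumes exact-match scoring after light normalization, and the helper names (`normalize`, `exact_match_accuracy`, `cross_lingual_gap`) and the toy predictions are hypothetical.

```python
from typing import Dict, Sequence


def normalize(text: str) -> str:
    """Strip whitespace, lowercase, and drop trailing periods (assumed normalization)."""
    return text.strip().lower().rstrip(".")


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def cross_lingual_gap(
    en_preds: Sequence[str],
    en_refs: Sequence[str],
    ru_preds: Sequence[str],
    ru_refs: Sequence[str],
) -> Dict[str, float]:
    """Return per-language accuracy and the English-minus-Russian gap."""
    en_acc = exact_match_accuracy(en_preds, en_refs)
    ru_acc = exact_match_accuracy(ru_preds, ru_refs)
    return {"en": en_acc, "ru": ru_acc, "gap": en_acc - ru_acc}


# Toy data for illustration only; real inputs would be model outputs on
# the same questions in the original English GQA and in this Russian set.
result = cross_lingual_gap(
    en_preds=["yes", "Red", "left"],
    en_refs=["yes", "red", "right"],
    ru_preds=["да", "синий", "слева"],
    ru_refs=["да", "красный", "справа"],
)
print(result)
```

A positive `gap` means the model scores higher on the English questions than on their Russian translations. The comparison is only meaningful when both runs cover the same underlying questions.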
Strengths
- Contains 27,519 images in the train split, with question-answer pairs translated from GQA train_balanced.
- Translated with gpt-4-turbo, then manually validated to correct common errors and to remove items where the model's safety filters were triggered.
- Formatted for direct integration with the lmms-eval pipeline for multimodal evaluation.