Name: COCO-ARVQA: Arabic Visual Question Answering Dataset Based on COCO 2017 Images
Creator: MouaffakAyoub
Published: 2026-04-27T09:44:18
Keywords: Multimodal AI, Computer Vision, Arabic NLP, Visual Question Answering, Multimodal

Description

COCO-ARVQA is an Arabic Visual Question Answering dataset built over images from the MS COCO 2017 train2017 archive. It provides Arabic questions, answers, answer lists, and identifiers that link each entry to a COCO image. The dataset was created by MouaffakAyoub and last updated on 2026-04-27. It does not redistribute the COCO images themselves, so users must obtain the official image archive separately.

Use Cases

Train Arabic visual question answering models based on the provided question-answer pairs.
Benchmark the performance of multimodal language models on Arabic-language tasks.
Conduct research on cross-lingual transfer learning for VQA using the Arabic annotations.
Fine-tune vision-language models for specific applications targeting Arabic-speaking users.

Strengths

Built on the established MS COCO 2017 image dataset, providing a foundation of diverse visual content.
Provides multiple linked data components: Arabic questions, answers, answer lists, and image identifiers.
Includes both training and validation splits, supporting standard machine learning workflows.

Limitations

The row count is not documented, which makes it hard to judge suitability for large-scale training.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
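
Since field semantics have to be inferred after download, a first inspection pass could look like the sketch below. It assumes the annotations have already been loaded as a list of dictionaries. The helper name `summarize_fields` and the field names in the usage note are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def summarize_fields(records):
    """Summarize each field seen in a list of dict records.

    Returns {field: {"types": [...], "non_null": int, "example": value}}.
    """
    summary = defaultdict(lambda: {"types": set(), "non_null": 0, "example": None})
    for row in records:
        for key, value in row.items():
            info = summary[key]
            # Record every Python type observed for this field.
            info["types"].add(type(value).__name__)
            if value is not None:
                info["non_null"] += 1
                # Keep the first non-null value as a representative example.
                if info["example"] is None:
                    info["example"] = value
    # Convert the type sets to sorted lists for stable, printable output.
    return {k: {**v, "types": sorted(v["types"])} for k, v in summary.items()}
```

For example, calling `summarize_fields(rows)` on a handful of loaded rows and printing the result shows which fields exist, which types they hold, and how many values are present, which is a reasonable starting point before any training run.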

Provenance

Source: Hugging Face
Collection Method: Derived annotations built over the MS COCO 2017 image dataset.
Time Range: Not stated; the images come from COCO 2017, so they were collected in 2017 or earlier.
Freshness: Last updated 2026-04-27 10:03:12; freshness should be verified.
Geography: Not specified

This repository does not redistribute the COCO images; users must obtain the official COCO 2017 train2017.zip archive separately to use the dataset fully.
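
Once train2017.zip has been extracted, the identifiers can be joined to local image files. The sketch below is a minimal illustration and not the dataset's documented loading procedure. It assumes the identifiers are numeric COCO image ids and relies on the COCO 2017 convention of naming files with the id zero-padded to 12 digits. If the dataset stores file names instead, the mapping step is unnecessary. The function names and the default `train2017` directory are hypothetical.

```python
from pathlib import Path


def coco_image_path(image_id, images_root="train2017"):
    """Map a numeric COCO image id to its expected file path.

    Assumes COCO 2017 naming: the id zero-padded to 12 digits, plus ".jpg".
    """
    return Path(images_root) / f"{int(image_id):012d}.jpg"


def missing_images(image_ids, images_root="train2017"):
    """Return the ids whose image file is not present under images_root."""
    return [i for i in image_ids if not coco_image_path(i, images_root).is_file()]
```

Running `missing_images` over all identifiers before training is a cheap way to confirm that the extracted archive matches the annotations.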
