VQA LXMERT: Visual Question Answering for LXMERT

Name: VQA LXMERT: Visual Question Answering for LXMERT
Creator: echarlaix
Published: 2022-03-02T23:29:22
Keywords: Region: us, License: apache-2.0

Description

This dataset is built from open-ended questions paired with images. The questions are categorized by whether answering them requires vision, language understanding, or commonsense reasoning. Because the tasks cannot be solved from a single modality alone, the dataset serves as a framework for testing multimodal understanding.

Use Cases

Train multimodal transformers to predict answers using the question and image features
Evaluate the reasoning capabilities of vision-language models like LXMERT on open-ended tasks
Benchmark the alignment of visual and linguistic features using the provided image-question pairs
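
The evaluation use case above can be illustrated with a minimal sketch. It assumes each record holds a question, precomputed image features, and a single reference answer, and it scores a model by normalized exact match. The field names, the `VQAExample` class, and the `toy_predict` function are hypothetical and only for illustration; they are not the dataset's actual schema or the scoring method of any particular benchmark.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VQAExample:
    # Hypothetical record layout; the real dataset's columns may differ.
    question: str
    image_features: Sequence[float]  # stand-in for precomputed visual features
    answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    examples: Sequence[VQAExample],
    predict: Callable[[str, Sequence[float]], str],
) -> float:
    """Fraction of examples where the predicted answer equals the reference after normalization."""
    if not examples:
        return 0.0
    correct = sum(
        normalize(predict(ex.question, ex.image_features)) == normalize(ex.answer)
        for ex in examples
    )
    return correct / len(examples)


def toy_predict(question: str, image_features: Sequence[float]) -> str:
    # Placeholder "model": ignores the image and answers yes/no questions with "Yes".
    return "Yes " if question.startswith("Is") else "unknown"


toy_examples = [
    VQAExample("Is the cat on the sofa?", [0.1, 0.2], "yes"),
    VQAExample("What color is the bus?", [0.3, 0.4], "red"),
]
toy_accuracy = exact_match_accuracy(toy_examples, toy_predict)
```

In practice `toy_predict` would be replaced by a call into a vision-language model such as LXMERT, and the placeholder feature lists by the model's expected visual inputs.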

Strengths

Includes open-ended questions requiring natural language answers
Pairs visual image data with corresponding textual queries
Designed to evaluate commonsense knowledge integration in multimodal models

Quality Score

Overall: D23
Description: 17
Source: 36
Reputation: 9
Access: 22

Community

26 downloads

0 views

Dataset Info

Author: echarlaix
Created: Mar 2, 2022
Updated: Feb 9, 2022
Last synced: Apr 29, 2026
