TVQA: Localized, Compositional Video Question Answering

Name: TVQA: Localized, Compositional Video Question Answering
Creator: jayleicn
Published: 2018-08-22T03:46:17
License: MIT
Keywords: Pytorch, Videoqa, Tvqa


Description

The dataset contains 152,545 multiple-choice questions built on 21,793 video clips from 6 popular TV shows, including The Big Bang Theory and Grey's Anatomy. Every question is paired with the clip's subtitles and a localized temporal annotation marking the relevant segment, which supports multimodal reasoning.
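Each question comes with subtitles and a temporal window, so a common preprocessing step keeps only the subtitle lines that overlap that window. The sketch below assumes the subtitles have already been parsed into (start_seconds, end_seconds, text) tuples. That representation and the function name subtitles_in_window are hypothetical and are not part of the dataset release.

```python
from typing import List, Tuple

# Hypothetical parsed-subtitle representation: (start_seconds, end_seconds, text).
Subtitle = Tuple[float, float, str]


def subtitles_in_window(subs: List[Subtitle], start: float, end: float) -> List[str]:
    """Return the text of every subtitle whose time span overlaps [start, end]."""
    if end < start:
        raise ValueError("window end must not precede window start")
    # Two intervals overlap when each one starts before the other one ends.
    return [text for s, e, text in subs if s <= end and e >= start]


if __name__ == "__main__":
    # Toy subtitles for illustration only; not taken from the dataset.
    subs = [(0.0, 2.5, "line one"), (3.0, 5.0, "line two"), (9.0, 11.0, "line three")]
    print(subtitles_in_window(subs, 2.0, 6.0))
```

The overlap test is inclusive, so a subtitle that touches the window boundary is kept. Use a stricter comparison if boundary-only matches should be dropped.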

Use Cases

Train a multimodal transformer to select the correct answer from 5 candidates using the q, a0-a4, and answer_idx fields
Develop temporal localization models that predict the relevant video window, supervised by the provided start and end timestamps
Benchmark cross-modal reasoning by integrating visual features with the provided subtitle text strings
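The following is a minimal sketch of the 5-way multiple-choice setup. It assumes the annotations are stored as JSON Lines, with one record per question carrying the q, a0-a4 and answer_idx fields listed above. The helper names (parse_jsonl, candidate_pairs, accuracy) and the "question answer" string format are illustrative assumptions. A real model would replace the toy scores with one logit per candidate.

```python
import json
from typing import Dict, Iterable, List, Sequence

NUM_CANDIDATES = 5  # answer fields a0 through a4


def parse_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse JSON Lines text (assumed: one QA record per line); blank lines are skipped."""
    return [json.loads(line) for line in lines if line.strip()]


def candidate_pairs(record: Dict) -> List[str]:
    """Pair the question with each of the five answers (illustrative text format)."""
    return [f"{record['q']} {record[f'a{i}']}" for i in range(NUM_CANDIDATES)]


def accuracy(records: Sequence[Dict], scores: Sequence[Sequence[float]]) -> float:
    """5-way accuracy: the predicted answer is the highest-scoring candidate."""
    if len(records) != len(scores):
        raise ValueError("need exactly one score vector per record")
    if not records:
        return 0.0
    correct = 0
    for record, cand_scores in zip(records, scores):
        pred = max(range(NUM_CANDIDATES), key=lambda i: cand_scores[i])
        correct += int(pred == int(record["answer_idx"]))
    return correct / len(records)


if __name__ == "__main__":
    # Toy record for illustration only; not taken from the dataset.
    lines = ['{"q": "Who opens the door?", "a0": "A", "a1": "B", "a2": "C", '
             '"a3": "D", "a4": "E", "answer_idx": 2}']
    records = parse_jsonl(lines)
    print(candidate_pairs(records[0])[2])
    print(accuracy(records, [[0.1, 0.2, 0.9, 0.0, 0.3]]))
```

parse_jsonl accepts any iterable of lines, so an open file handle can be passed to it directly (for example inside a `with open(path, encoding="utf-8") as f:` block). The same function also works on in-memory strings for testing.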

Strengths

152,545 QA pairs with 5-way multiple-choice options and a correct answer index
Temporal grounding labels providing start and end timestamps for the relevant video segment
Multimodal inputs including video frames and character-level subtitles for 6 distinct TV series
Compositional questions categorized by reasoning types such as 'what', 'who', 'where', 'why', and 'how'
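Temporal grounding predictions are commonly scored by temporal intersection-over-union (IoU) against the annotated window. This sketch assumes each window is stored as a single "start-end" string in seconds (for example "12.5-20.0"). That format and the names parse_ts and temporal_iou are assumptions, so check them against the actual annotation files.

```python
from typing import Tuple

Window = Tuple[float, float]  # (start_seconds, end_seconds)


def parse_ts(ts: str) -> Window:
    """Parse an assumed 'start-end' timestamp string in seconds, e.g. '12.5-20.0'."""
    start_str, end_str = ts.split("-")
    start, end = float(start_str), float(end_str)
    if end < start:
        raise ValueError(f"window ends before it starts: {ts!r}")
    return start, end


def temporal_iou(pred: Window, gt: Window) -> float:
    """Intersection-over-union of two time windows; 0.0 if they do not overlap."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    # Guard against degenerate zero-length windows.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt = parse_ts("12.5-20.0")  # illustrative value, not from the dataset
    print(temporal_iou((14.0, 22.0), gt))
```

Grounding results are often reported as the fraction of questions whose IoU exceeds a threshold such as 0.5. That threshold is a common convention; this listing does not specify it.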


Quality Score

D21 (Description 16, Source 19, Reputation 16, Access 52)

Community

182 likes, 0 views

Dataset Info

License: MIT
Author: jayleicn
Created: Aug 22, 2018
Updated: Oct 25, 2022
Language: Python
Last synced: Jun 2, 2026
