TVCaption: Multimodal Video Captioning Dataset

Name: TVCaption: Multimodal Video Captioning Dataset
Creator: jayleicn
Published: 2020-01-27T01:58:09
License: MIT
Keywords: PyTorch, Video Captioning

by jayleicn, updated 2y ago

Available on 1 platform

Description

The dataset contains 262,110 natural language captions describing 108,965 video segments from 6 popular TV shows, or about 2.4 captions per segment on average. It supports multimodal video captioning by pairing visual frames with time-aligned subtitle dialogue.

Use Cases

Train multimodal transformer models using the 'video_id' and 'subtitle' features
Evaluate video-to-text generation performance using the 'caption' ground truth
Benchmark temporal video grounding using the 'ts' start and end timestamps
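
For the grounding use case, a common way to score a predicted segment against the 'ts' ground truth is temporal intersection over union (IoU). The following is a minimal sketch, assuming 'ts' holds a [start, end] pair in seconds; the function name temporal_iou is hypothetical and is not part of the dataset's own tooling.

```python
def temporal_iou(pred, gt):
    """Return the IoU of two [start, end] time spans given in seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap is clamped at zero so disjoint spans score 0 rather than negative.
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0


# Example: a 10-second prediction shifted 5 seconds from a 10-second ground truth.
score = temporal_iou([0.0, 10.0], [5.0, 15.0])
```

A prediction is then typically counted as correct when its IoU exceeds a chosen threshold such as 0.5 or 0.7; the threshold is a benchmark design choice, not something this listing specifies.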

Strengths

262,110 captions paired with 108,965 video clips
Covers 6 TV shows including 'Friends' and 'The Big Bang Theory'
Includes JSON metadata with 'caption', 'video_id', and 'ts' fields
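
As a rough illustration of how the JSON metadata might be consumed, the sketch below parses one JSON object per line and groups captions by segment. It assumes the field names listed above ('caption', 'video_id', 'ts') and a JSON Lines layout; the actual files may use a different layout or key names, and the sample records are made up for illustration only.

```python
import json
from collections import defaultdict

# Hypothetical sample records; real annotation files may differ in layout and keys.
SAMPLE = """\
{"video_id": "clip_0001", "ts": [1.5, 7.0], "caption": "A man enters the room."}
{"video_id": "clip_0001", "ts": [1.5, 7.0], "caption": "Someone walks in and sits down."}
{"video_id": "clip_0002", "ts": [0.0, 4.2], "caption": "Two friends talk on a couch."}
"""


def load_records(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def group_captions(records):
    """Map each (video_id, start, end) segment to its list of captions."""
    grouped = defaultdict(list)
    for rec in records:
        start, end = rec["ts"]
        grouped[(rec["video_id"], start, end)].append(rec["caption"])
    return dict(grouped)


records = load_records(SAMPLE)
segments = group_captions(records)
captions_per_segment = len(records) / len(segments)
```

Grouping by segment matters because each segment can carry several reference captions, which is how 262,110 captions map onto 108,965 segments.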

Quality Score

Overall: D22
Description: 16
Source: 19
Reputation: 21
Access: 52

Community

91 likes

0 views

Dataset Info

License: MIT
Author: jayleicn
Created: Jan 27, 2020
Updated: Sep 6, 2023
Language: Python
Last synced: May 19, 2026
