AVSpeech: Separated Video and Audio Streams from YouTube Clips

Name: AVSpeech: Separated Video and Audio Streams from YouTube Clips
Creator: ProgramComputer
Published: 2026-02-17T17:45:56
Keywords: Audio Visual, Youtube Derived, Audio, Video, Speech Recognition, Multimodal

by ProgramComputer, updated 5mo ago

Available on 1 platform

Description

This dataset is a restructured subset of AVSpeech that provides separated video and audio streams. It was created by ProgramComputer and was last updated on February 20, 2026. Each clip has a unique identifier derived from the original YouTube ID and the clip timestamps.
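The exact identifier format is not documented on this page. As a minimal sketch, assuming a hypothetical scheme that joins the YouTube ID with the start and end timestamps (in seconds) using underscores, an identifier could be built and parsed as follows. The names `ClipKey`, `make_clip_id`, and `parse_clip_id` are illustrative and not part of the dataset.

```python
from typing import NamedTuple


class ClipKey(NamedTuple):
    youtube_id: str
    start: float
    end: float


def make_clip_id(youtube_id: str, start: float, end: float) -> str:
    """Build an identifier in an ASSUMED format: <youtube_id>_<start>_<end>."""
    if end <= start:
        raise ValueError("end must be greater than start")
    # Fixed six-decimal formatting keeps identifiers stable across runs.
    return f"{youtube_id}_{start:.6f}_{end:.6f}"


def parse_clip_id(clip_id: str) -> ClipKey:
    """Invert make_clip_id for the assumed format."""
    # YouTube IDs may contain underscores, so split from the right.
    youtube_id, start, end = clip_id.rsplit("_", 2)
    return ClipKey(youtube_id, float(start), float(end))
```

Splitting from the right matters because YouTube IDs can include `_` and `-`; the real dataset's scheme should be checked after download before relying on this format.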

Use Cases

Train audio-visual speech recognition models on the separated video and audio streams.
Develop lip-syncing or visual speech generation models from the video-only stream.
Conduct research on audio-visual correspondence using the synchronized but separate media tracks.
Benchmark multimodal alignment algorithms using the derived clip identifiers and original metadata.

Strengths

Provides pre-separated video and audio streams, which likely simplifies data loading for multimodal tasks.
Includes original AVSpeech metadata fields such as YouTube ID and clip timestamps for traceability.
Media streams are described as being copied without re-encoding, which may preserve original quality.

Limitations

Description metadata is limited; actual data quality can only be assessed by manual inspection after download.
Row count is unknown, which makes it harder to judge whether the dataset is large enough for a given task.
Column-level documentation is absent; field semantics must be inferred after download.
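Because the row count and column semantics are undocumented, a first pass after download could simply list the columns, count the rows, and collect a few sample values per column. The sketch below assumes the metadata ships as a CSV file with a header row, which this page does not confirm; `summarize_csv` is a hypothetical helper name.

```python
import csv
from typing import TextIO


def summarize_csv(handle: TextIO, sample_size: int = 3) -> dict:
    """Return column names, row count, and a few non-empty sample values per column."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    samples = {name: [] for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Keep only the first few non-empty values as hints to field meaning.
            if value and len(samples[name]) < sample_size:
                samples[name].append(value)
    return {"columns": columns, "row_count": row_count, "samples": samples}
```

With a real file this would be called as `summarize_csv(open(path, newline="", encoding="utf-8"))`, where `path` points to the downloaded metadata. If the dataset uses another format, the same idea applies with a different reader.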

Provenance

Source: Derived from the original AVSpeech dataset, which sourced clips from YouTube.
Collection Method: Media streams were separated and copied without re-encoding from the original source videos.
Freshness: Last updated 2026-02-20 22:21:03; freshness should be verified.
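The page does not say which tool performed the separation. One common way to split streams without re-encoding is ffmpeg's stream copy mode, sketched below. The use of ffmpeg, the output file names, and the `.m4a` audio container (which suits AAC audio) are all assumptions, not details confirmed by the dataset.

```python
import subprocess
from pathlib import Path


def stream_copy_commands(src: Path, out_dir: Path) -> list[list[str]]:
    """Build ffmpeg commands that split one file into video-only and audio-only files."""
    video_out = out_dir / f"{src.stem}_video.mp4"  # hypothetical naming
    audio_out = out_dir / f"{src.stem}_audio.m4a"  # assumes AAC audio
    common = ["ffmpeg", "-y", "-i", str(src)]
    return [
        # -an drops audio; -c:v copy passes video packets through unchanged.
        common + ["-an", "-c:v", "copy", str(video_out)],
        # -vn drops video; -c:a copy passes audio packets through unchanged.
        common + ["-vn", "-c:a", "copy", str(audio_out)],
    ]


def separate(src: Path, out_dir: Path) -> None:
    """Run the commands; requires ffmpeg to be installed and on PATH."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in stream_copy_commands(src, out_dir):
        subprocess.run(cmd, check=True)
```

Stream copy avoids a decode and encode cycle, so the output keeps the source codec and bitrate; the trade-off is that the output container must support the source codec.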

Related Topics

Audio, Video, Multimodal, Audio Visual, Youtube Derived, Speech Recognition

Quality Score

D39

Description: 42
Source: 36
Reputation: 49
Access: 26

Community

33.3K downloads
1 like
0 views

Dataset Info

Author: ProgramComputer
Created: Feb 17, 2026
Updated: Feb 20, 2026
Last synced: Jun 19, 2026
