LibriSpeech ASR Adversarial Audio Corpus

Name: LibriSpeech ASR Adversarial Audio Corpus
Creator: RaphaelOlivier
Published: 2022-07-18T19:08:15
Keywords: License CC BY 4.0, Region US

Description

The corpus contains approximately 1000 hours of read English speech, prepared by Vassil Panayotov with assistance from Daniel Povey. The audio is derived from LibriVox audiobooks, segmented and aligned, and sampled at 16 kHz.

Use Cases

Train automatic speech recognition models on approximately 1000 hours of read English speech audio.
Benchmark adversarial robustness of ASR systems using carefully segmented and aligned audio.
Analyze speech patterns and alignment from LibriVox audiobook recordings.

Strengths

Approximately 1000 hours of audio data provides substantial material for model training.
Audio is carefully segmented and aligned, ensuring structured data for ASR tasks.
Data is derived from the LibriVox project, a known source of public domain audiobooks.

Limitations

Audio is stored in FLAC format, requiring conversion to float32 arrays for typical ML use, adding a preprocessing step.
The dataset consists solely of read speech from audiobooks, which may not represent spontaneous conversational speech patterns.
Specific details on the adversarial perturbations or modifications applied to the audio are not described in the dataset listing.

Provenance

Source: LibriVox project audiobooks.
Collection Method: Derived from read audiobooks, carefully segmented and aligned.
Freshness: The dataset was last updated on 2025-04-03.
Geography: Region tag suggests US focus, but specific coverage is unknown.

Audio files are in FLAC format; conversion to float32 arrays is required for typical ML pipelines, as demonstrated in the Python code snippet on the dataset card.
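
The FLAC-to-float32 conversion described above might look like the following minimal sketch. It is not the dataset's own snippet: the function names are hypothetical, NumPy is assumed to be installed, and decoding relies on the third-party soundfile package, which is an assumption rather than a stated requirement of this dataset. The 16 kHz check reflects the sampling rate given in the description.

```python
import numpy as np

# Sampling rate stated in the dataset description.
TARGET_SAMPLE_RATE = 16_000


def pcm16_to_float32(samples):
    """Scale signed 16-bit PCM samples to float32 values in [-1.0, 1.0)."""
    pcm = np.asarray(samples, dtype=np.int16)
    return pcm.astype(np.float32) / 32768.0


def load_flac_as_float32(path):
    """Decode one FLAC file into a float32 array.

    Assumes the third-party `soundfile` package is available; it is
    imported lazily so the helper above works without it.
    """
    import soundfile as sf

    # dtype="float32" asks soundfile to return samples already scaled to [-1, 1).
    audio, sample_rate = sf.read(path, dtype="float32")
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(f"expected {TARGET_SAMPLE_RATE} Hz, got {sample_rate} Hz")
    return audio
```

Calling load_flac_as_float32 on a single utterance file would return a one-dimensional array for mono audio. pcm16_to_float32 covers the case where samples have already been decoded to 16-bit integers by another tool.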

Quality Score

Overall: D35
Description: 43
Source: 36
Reputation: 26
Access: 22

Community

Downloads: 14
Views: 0

Dataset Info

Author: RaphaelOlivier
Created: Jul 18, 2022
Updated: Apr 3, 2025
Last synced: Apr 30, 2026
