LibriSpeech: A Large-Scale Corpus of Read English Speech

Available on 1 platform

Description

LibriSpeech is a widely used corpus of read English speech derived from public-domain LibriVox audiobooks. This copy is published on Kaggle, where it can be downloaded for experimentation. The available metadata does not state its size, version, or last update.

Use Cases

Training an acoustic model for English speech recognition (inferred from domain, verify after download)
Benchmarking speech-to-text system performance (inferred from domain, verify after download)
Developing models for speaker identification or diarization (inferred from domain, verify after download)
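Benchmarking a speech-to-text system on a corpus like this usually comes down to word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch, not a scoring tool shipped with the dataset. `word_edit_distance` and `corpus_wer` are hypothetical helper names, and the case-folding plus whitespace tokenization is an assumed normalization to check against the transcripts actually downloaded.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein over words)."""
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        # Assumed normalization: case-insensitive, whitespace tokenization, no punctuation handling.
        ref = reference.upper().split()
        hyp = hypothesis.upper().split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words
```

For example, `corpus_wer([("THE CAT SAT", "the cat sat down"), ("HELLO WORLD", "hello word")])` counts one insertion and one substitution over five reference words, giving 0.4. Errors are summed across utterances before dividing, rather than averaging per-utterance WER, so longer utterances carry proportionally more weight.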

Strengths

Published on Kaggle, a major platform for data science and machine learning.

Limitations

Metadata is minimal; actual content requires verification after download.
Column-level documentation is absent; field semantics must be inferred after download.
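Because field semantics have to be inferred after download, a short inventory script can confirm what the files actually contain. The sketch below assumes the layout of the original OpenSLR LibriSpeech release: per-chapter directories, each holding a `<speaker>-<chapter>.trans.txt` file whose lines read `<utterance_id> <TRANSCRIPT>`, next to one `.flac` file per utterance. The Kaggle copy described here may be organized differently, so treat the layout, the `.flac` extension, and the hypothetical helpers `load_transcripts` and `summarize` as assumptions to verify.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Utterance:
    utterance_id: str   # "<speaker>-<chapter>-<index>" under the assumed OpenSLR layout
    speaker_id: str
    chapter_id: str
    transcript: str
    audio_path: Path    # expected audio location; the file may be missing in a mirror


def load_transcripts(root: str, audio_ext: str = ".flac") -> List[Utterance]:
    """Parse every *.trans.txt file under root into Utterance records (assumed layout)."""
    utterances = []
    for trans_file in sorted(Path(root).rglob("*.trans.txt")):
        for line in trans_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line:
                continue
            # Each line is "<utterance_id> <TRANSCRIPT>"; split on the first space only.
            utt_id, _, text = line.partition(" ")
            parts = utt_id.split("-")
            utterances.append(Utterance(
                utterance_id=utt_id,
                speaker_id=parts[0],
                chapter_id=parts[1] if len(parts) > 1 else "",
                transcript=text,
                audio_path=trans_file.parent / f"{utt_id}{audio_ext}",
            ))
    return utterances


def summarize(utterances: List[Utterance]) -> dict:
    """Basic counts for checking a download against expectations."""
    return {
        "utterances": len(utterances),
        "speakers": len({u.speaker_id for u in utterances}),
        "chapters": len({(u.speaker_id, u.chapter_id) for u in utterances}),
        "missing_audio": sum(not u.audio_path.exists() for u in utterances),
    }
```

Calling `summarize(load_transcripts(root))` on the extracted directory returns counts of utterances, speakers, chapters, and transcript lines with no matching audio file. Zero utterances or a large `missing_audio` count suggests the assumed layout does not hold for this copy.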

Provenance

Source: Not specified
Collection Method: Not specified
Time Range: Not specified
Freshness: Last update date is unknown; freshness unverified.
Geography: Not specified

Related Topics

Audio, Machine Learning, Audio Data, Speech Recognition

Related Datasets

Quality Score

D15

Description: 5
Source: 17
Reputation: 18
Access: 31

Community

0 views

Dataset Info

Last synced: Apr 9, 2026
