DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio

Librispeech 100 Hour English Speech Corpus

Name: Librispeech 100 Hour English Speech Corpus
Creator: namnv1906
Published: 2022-05-19T07:46:39
Keywords: Machine Learning, Modality: text, English Language, Audio, Region: US, Speech Recognition

Available on 1 platform

Description

Librispeech 100H is a 100-hour subset of the LibriSpeech corpus of English speech audio. This copy was uploaded to Hugging Face by namnv1906 in May 2022. The audio is derived from public-domain audiobooks recorded for the LibriVox project.

Use Cases

Train acoustic models on 100 hours of English speech audio.
Benchmark ASR system accuracy using aligned audio and transcription pairs.
Develop speaker-independent models using data from multiple public domain audiobook readers.
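Benchmarking ASR accuracy on aligned audio/transcript pairs is usually reported as word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch in Python, not part of this dataset or any library; the function names are hypothetical, and it assumes transcripts have already been normalized (case, punctuation) before scoring.

```python
"""Minimal word error rate (WER) sketch for scoring ASR output against reference transcripts."""
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words: minimum substitutions + deletions + insertions."""
    # prev[j] is the distance between the previous ref prefix and hyp[:j];
    # it starts as row 0 (empty ref prefix), i.e. 0, 1, ..., len(hyp).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words.

    Each pair is (reference_transcript, hypothesis_transcript). Text is split on
    whitespace; any case or punctuation normalization is assumed to be done already.
    """
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    # Illustrative transcripts only; not taken from the dataset.
    pairs = [("THE CAT SAT", "THE CAT SAT"), ("HELLO WORLD", "HELLO WORD")]
    print(corpus_wer(pairs))  # 1 error / 5 reference words -> 0.2
```

Summing errors across utterances before dividing (rather than averaging per-utterance WER) keeps long utterances from being under-weighted, which is the usual convention for corpus-level reporting.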

Strengths

Contains 100 hours of speech audio.
Derived from a well-known, public domain source corpus.
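The 100-hour figure can be sanity-checked from per-utterance sample counts. The sketch below is a minimal Python illustration; it assumes 16 kHz audio (the sampling rate of the original LibriSpeech release, which this upload is assumed, not confirmed, to preserve), and `total_hours` is a hypothetical helper. How the sample counts are obtained depends on the audio reader you use.

```python
"""Sketch: sanity-check total corpus duration from per-utterance sample counts."""
from typing import Iterable

# Assumption: 16 kHz audio, as in the original LibriSpeech release; check the actual files.
SAMPLE_RATE_HZ = 16_000


def total_hours(sample_counts: Iterable[int], sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert the summed sample counts of all utterances into hours of audio."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    total_seconds = sum(sample_counts) / sample_rate_hz
    return total_seconds / 3600


if __name__ == "__main__":
    # Hypothetical utterance lengths: 10 s, 12.5 s and 7.5 s at 16 kHz (30 s in total).
    counts = [160_000, 200_000, 120_000]
    print(f"{total_hours(counts):.6f} hours")
```

At 16 kHz, 100 hours corresponds to 100 × 3,600 × 16,000 = 5.76 billion samples.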

Limitations

Limited to 100 hours, a small subset of the full LibriSpeech corpus (roughly 1,000 hours).
Content is restricted to read English audiobook speech, so it lacks diversity in accents, speaking styles, and domains.

Provenance

Source: LibriSpeech corpus (LibriVox audiobooks).
Collection Method: Derived from public domain audiobook recordings.
Freshness: Last updated on Hugging Face in May 2022.

Quality Score: D20

Community

13 downloads

0 views

Dataset Info

Author: namnv1906
Created: May 19, 2022
Updated: May 19, 2022
Last synced: Apr 30, 2026
