LibriSpeech Train Clean 100: English Speech Audio for ASR
Description
This Kaggle-hosted dataset, titled 'Librispeech_train-clean-100', likely contains audio files for training automatic speech recognition (ASR) models. The title suggests it is the train-clean-100 subset of the LibriSpeech corpus, which comprises roughly 100 hours of 'clean' read English speech. Size, format, and provenance should be verified after download.
Use Cases
Train a speech recognition model on clean, read English speech (inferred from domain, verify after download)
Benchmark ASR system performance on a standard corpus subset (inferred from domain, verify after download)
Fine-tune pre-trained models for specific acoustic conditions (inferred from domain, verify after download)
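For the benchmarking use case, ASR output is commonly scored by word error rate (WER): the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The following is a minimal sketch only. The function name `word_error_rate` is illustrative, and the lowercase-and-split-on-whitespace normalization is an assumption; real evaluations usually apply a more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: lowercase and whitespace tokenization are adequate here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the bat sat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.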
Strengths
Published on Kaggle, a major platform for data science resources.
Title references the well-known LibriSpeech corpus, suggesting a standard benchmark origin.
Limitations
Metadata is minimal; actual content requires verification after download.
No file- or field-level documentation is provided; the directory layout and transcript format must be inferred after download.
Data may reflect bias inherent to its source corpus (e.g., speaker demographics, recording conditions).
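Since the listing provides little metadata, a quick inventory of the downloaded files is one way to carry out the verification these limitations call for. The sketch below rests on assumptions: the original LibriSpeech release stores audio as `.flac` files named `speaker-chapter-utterance.flac`, but this Kaggle copy may be laid out differently, so the per-speaker count is only meaningful if that naming holds. The function names `inventory` and `utterances_per_speaker` and the directory name in the usage line are hypothetical.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count files by extension and sum their sizes under root."""
    counts = Counter()
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
            total_bytes += path.stat().st_size
    return counts, total_bytes


def utterances_per_speaker(root):
    """Count .flac files per speaker ID.

    Assumption: files follow the original LibriSpeech naming scheme,
    speaker-chapter-utterance.flac. Files that do not match are skipped.
    """
    speakers = Counter()
    for path in Path(root).rglob("*.flac"):
        parts = path.stem.split("-")
        if len(parts) == 3:
            speakers[parts[0]] += 1
    return speakers


if __name__ == "__main__":
    # Hypothetical download location; replace with the real path.
    counts, total_bytes = inventory("train-clean-100")
    print(counts, total_bytes)
    print(len(utterances_per_speaker("train-clean-100")), "speakers found")
```

A heavily skewed utterances-per-speaker distribution, or unexpected file extensions, would be an early sign that the copy differs from the standard subset.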
Provenance
Source
Likely derived from the LibriSpeech corpus.
Collection Method
Method of gathering is unknown.
Time Range
Temporal coverage is unknown.
Freshness
Last update date is unknown; freshness unverified.