Description
LibriSpeech-subset is a dataset of speech audio recordings hosted on Kaggle, likely derived from the LibriSpeech corpus. The metadata does not state its size, contents, or creation date. The original LibriSpeech corpus is a widely used benchmark for automatic speech recognition (ASR) research.
Use Cases
Train a speech-to-text model on clean, read speech (inferred from domain, verify after download)
Benchmark ASR model performance against a standard subset (inferred from domain, verify after download)
Fine-tune a pre-trained model for specific acoustic conditions (inferred from domain, verify after download)
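All three use cases need audio paired with transcripts. The sketch below shows one way to collect transcripts, assuming the subset keeps the upstream LibriSpeech layout, in which each chapter folder holds a `*.trans.txt` file with one `UTTERANCE-ID TEXT` pair per line. The metadata does not confirm that layout, and the function names are illustrative, so check the files after download.

```python
from pathlib import Path


def parse_transcript_line(line):
    """Split an 'UTTERANCE-ID TEXT' line into (utterance_id, text)."""
    utterance_id, _, text = line.strip().partition(" ")
    return utterance_id, text


def load_transcripts(root):
    """Map utterance IDs to transcripts for every *.trans.txt file under root.

    Assumes the upstream LibriSpeech layout; returns an empty dict if the
    subset stores its transcripts some other way.
    """
    transcripts = {}
    for trans_file in sorted(Path(root).rglob("*.trans.txt")):
        with open(trans_file, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    utterance_id, text = parse_transcript_line(line)
                    transcripts[utterance_id] = text
    return transcripts
```

An empty result from `load_transcripts` on the downloaded folder suggests the subset was repackaged and the transcripts live elsewhere, for example in a CSV file.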
Strengths
Published on Kaggle, a major platform for data science resources.
Likely derived from the LibriSpeech corpus, a well-known benchmark in speech recognition.
Limitations
Metadata is minimal; actual content requires verification after download.
Row count, file formats, and column-level documentation are unknown.
Data may reflect the biases inherent to the source corpus and its collection methods.
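Because the file formats and counts are undocumented, a quick inventory of the downloaded folder is a reasonable first check. The following is a minimal sketch using only the Python standard library. The folder name `librispeech-subset` is a hypothetical local path, not something the listing specifies.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under root, grouped by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical location of the extracted download; adjust as needed.
    root = Path("librispeech-subset")
    if root.is_dir():
        for extension, count in inventory_by_extension(root).most_common():
            print(f"{extension}\t{count}")
    else:
        print(f"{root} not found; point root at the extracted dataset")
```

The upstream LibriSpeech corpus ships FLAC audio with plain-text transcripts, so a different mix of extensions here would indicate that the subset was converted or repackaged.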
Provenance
Source
Likely derived from the LibriSpeech corpus.
Collection Method
Method of subset creation is unknown.
Time Range
Temporal coverage of this subset is unknown.
Freshness
Last updated date is unknown.
Geography
Geographic coverage of this subset is unknown.
License
License information is unknown; verify terms before use.