LibriSpeech Subset: Audio Speech Data for ASR

Description

LibriSpeech-subset is a dataset of audio speech recordings, likely derived from the LibriSpeech corpus. It is hosted on Kaggle, but its size, content details, and creation date are not given in the metadata. The original LibriSpeech corpus, roughly 1,000 hours of read English speech from LibriVox audiobooks sampled at 16 kHz, is a widely used benchmark for automatic speech recognition (ASR) research.

Use Cases

Train a speech-to-text model on clean, read speech (inferred from domain, verify after download)
Benchmark ASR model performance against a standard subset (inferred from domain, verify after download)
Fine-tune a pre-trained model for specific acoustic conditions (inferred from domain, verify after download)
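ASR benchmarks on LibriSpeech-derived data are conventionally reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the model's hypothesis into the reference transcript, divided by the number of reference words. The sketch below is a plain-Python implementation of that standard definition, not tooling shipped with this dataset. The function name `word_error_rate` is hypothetical, and the sketch assumes both transcripts are already normalized the same way (original LibriSpeech transcripts are uppercase with no punctuation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Rolling single-row dynamic programming: at the start of iteration i,
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("HE WENT HOME", "HE WENT TO HOME"))  # 0.3333333333333333
```

Here one inserted word against three reference words gives a WER of 1/3. Because insertions are counted, WER can exceed 1.0 when a hypothesis is much longer than its reference.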

Strengths

Published on Kaggle, a major platform for data science resources.
Likely derived from the LibriSpeech corpus, a well-known benchmark in speech recognition.

Limitations

Metadata is minimal; actual content requires verification after download.
Row count, file formats, and column-level documentation are unknown.
Data may reflect biases inherent to the source corpus: LibriSpeech consists of read audiobook English, so spontaneous, accented, or noisy speech may be underrepresented.
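Because the file formats and counts are undocumented, a sensible first step after download is to inventory the extracted files. The sketch below assumes, without confirmation, that the subset keeps the original LibriSpeech layout: audio in `.flac` files named `<speaker>-<chapter>-<utterance>.flac`, plus per-chapter transcript files named `<speaker>-<chapter>.trans.txt` whose lines read `<utterance-id> <TRANSCRIPT>`. The function name `inventory` is hypothetical; if the Kaggle subset uses another layout (for example WAV files or a CSV manifest), the parsing must be adapted.

```python
from collections import Counter
from pathlib import Path


def inventory(root: str) -> dict:
    """Summarise an extracted directory tree, assuming LibriSpeech layout."""
    base = Path(root)
    # Count every file by extension to reveal which formats actually ship.
    extensions = Counter(
        p.suffix.lower() for p in base.rglob("*") if p.is_file()
    )
    # Parse transcripts (assumed format: "<utterance-id> <TRANSCRIPT>").
    transcripts = {}
    for trans_file in base.rglob("*.trans.txt"):
        for line in trans_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:
                utt_id, _, text = line.partition(" ")
                transcripts[utt_id] = text
    # Utterance IDs that have a transcript but no matching .flac file.
    audio_ids = {p.stem for p in base.rglob("*.flac")}
    missing_audio = sorted(set(transcripts) - audio_ids)
    # In LibriSpeech IDs, the speaker ID is the first dash-separated field.
    speakers = {utt_id.split("-")[0] for utt_id in transcripts}
    return {
        "extensions": dict(extensions),
        "utterances": len(transcripts),
        "speakers": len(speakers),
        "missing_audio": missing_audio,
    }
```

Calling `inventory("path/to/extracted/subset")` (a placeholder path) returns extension counts, the number of transcribed utterances and speakers, and any transcript entries without audio, which together give a rough substitute for the missing row count and format documentation.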

Provenance

Source: Likely derived from the LibriSpeech corpus.
Collection Method: Method of subset creation is unknown.
Time Range: Temporal coverage of the source corpus is unknown for this subset.
Freshness: Last updated date is unknown.
Geography: Geographic coverage of the source corpus is unknown for this subset.

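License information for this subset is unknown; verify terms before use. The original LibriSpeech corpus is distributed under CC BY 4.0, but a derived subset may carry different terms.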

Related Topics

Audio, Machine Learning, Audio Data, Speech Recognition

Quality Score

D16

Description: 8
Source: 17
Reputation: 18
Access: 31

Community

0 views

Dataset Info

Last synced: Apr 13, 2026
