Description
Librispeech Long is a speech audio dataset derived from the LibriSpeech corpus, likely containing longer-form English audio segments. It was published by distil-whisper on Hugging Face and last updated in November 2023. The available metadata does not state its size, format, or license.
Use Cases
Fine-tuning speech recognition models on longer audio segments.
Benchmarking ASR system performance on extended speech.
Training or evaluating models for audiobook or podcast transcription.
Developing speaker diarization or segmentation algorithms on continuous speech.
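For the benchmarking use case, word error rate (WER) is the usual ASR metric: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below is a minimal, dependency-free version. The function name `word_error_rate` and the sample transcripts are illustrative assumptions and are not taken from the dataset or from any distil-whisper evaluation code. Text normalization is limited to lowercasing and whitespace splitting, which is simpler than what published benchmarks typically apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts, not dataset content: one substitution and one deletion.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {example_wer:.3f}")
```

On long-form audio, one long reference transcript per recording means a single alignment covers many words, so the quadratic cost of this loop grows quickly; a dedicated evaluation library is likely a better choice for full-length recordings.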
Strengths
Based on the established LibriSpeech corpus, a widely used benchmark in speech recognition.
Created by distil-whisper, suggesting a focus on efficient, distilled model applications.
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to judge before download whether the dataset is large enough for a given task.
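Because column documentation and row counts are missing, a first inspection pass after download might simply tally which fields appear and which value types they hold. The sketch below assumes the rows are available as plain Python dictionaries. The helper name `summarize_fields` and the field names in `sample_rows` (`audio`, `text`) are hypothetical placeholders, since the real schema is not documented here.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Map each field name to the sorted list of value type names observed."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical rows standing in for downloaded records; the real
# column names and value types must be checked against the actual data.
sample_rows = [
    {"audio": {"array": [0.0, 0.1], "sampling_rate": 16000}, "text": "hello world"},
    {"audio": {"array": [0.2], "sampling_rate": 16000}, "text": None},
]

row_count = len(sample_rows)
field_summary = summarize_fields(sample_rows)
print(row_count, field_summary)
```

A field that shows more than one type, such as a transcript column that is sometimes empty, is an early sign that filtering will be needed before training or evaluation.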
Provenance
Source
distil-whisper
Freshness
Last updated 2023-11-02 14:22:54; freshness should be verified.
License
License is unknown; users must verify permissions before use.