LibriSpeech contains approximately 1,000 hours of read English speech sampled at 16 kHz, derived from LibriVox audiobooks and prepared by Vassil Panayotov and Daniel Povey. The corpus provides segmented and aligned audio paired with the corresponding text transcripts, for use in speech recognition and speaker identification tasks. It is organized into training, development, and test subsets, each labeled "clean" or "other" according to recording quality and how difficult the speech is to recognize.
Licensed under CC BY 4.0; data is derived from the LibriVox project and is a standard benchmark in the ASR community.
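To make the pairing of audio and transcripts concrete, here is a minimal Python sketch. It assumes the layout of the commonly distributed archives, which this page does not itself describe: each subset directory contains `<speaker>/<chapter>/` folders holding `.flac` files and a `<speaker>-<chapter>.trans.txt` file whose lines start with an utterance ID followed by the transcript. The names `Utterance`, `parse_transcript_line`, and `audio_path`, and the example IDs, are hypothetical and used only for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class Utterance(NamedTuple):
    speaker_id: str
    chapter_id: str
    utterance_id: str
    transcript: str


def parse_transcript_line(line: str) -> Utterance:
    """Parse one transcript line of the assumed form
    '<speaker>-<chapter>-<index> TRANSCRIPT TEXT'."""
    # Split only on the first space: the rest of the line is the transcript.
    utterance_id, transcript = line.strip().split(" ", 1)
    speaker_id, chapter_id, _index = utterance_id.split("-")
    return Utterance(speaker_id, chapter_id, utterance_id, transcript)


def audio_path(root: Path, subset: str, utt: Utterance) -> Path:
    """Build the audio path under the assumed
    <root>/<subset>/<speaker>/<chapter>/<utterance>.flac layout."""
    return root / subset / utt.speaker_id / utt.chapter_id / f"{utt.utterance_id}.flac"


if __name__ == "__main__":
    # Made-up IDs and text, not taken from the corpus.
    utt = parse_transcript_line("1234-56789-0001 AN EXAMPLE TRANSCRIPT\n")
    print(utt.transcript)
    print(audio_path(Path("LibriSpeech"), "dev-clean", utt).as_posix())
```

If the copy you download follows a different layout, only `audio_path` and the ID split in `parse_transcript_line` should need adjusting.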