GigaSpeech is a multi-domain English speech recognition corpus containing 10,000 hours of high-quality labeled audio released by SpeechColab in 2021. The data is aggregated from audiobooks, podcasts, and YouTube, capturing a mix of read and spontaneous speaking styles across topics like arts, science, and sports.
The dataset is released under the Apache 2.0 license. Because the corpus spans 10,000 hours of audio, users should plan for substantial storage and download bandwidth before fetching it.
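As a rough way to size the storage requirement, the sketch below estimates the footprint of 10,000 hours of audio if it were held as uncompressed PCM. The 16 kHz sample rate, 16-bit depth, and mono channel are illustrative assumptions, not figures taken from this page; the distributed files may be compressed and considerably smaller, so check the official release notes before provisioning.

```python
def pcm_size_bytes(hours, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Estimate the uncompressed PCM size of `hours` of audio.

    The defaults (16 kHz, 16-bit, mono) are assumptions for illustration only.
    """
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# 10,000 hours is the corpus size stated in the description.
size = pcm_size_bytes(10_000)
print(f"{size / 1e12:.2f} TB uncompressed")  # prints "1.15 TB uncompressed"
```

Under these assumptions the decoded audio comes to a little over one terabyte, which is a useful upper bound when planning disk space for preprocessing or feature extraction.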