MLCommons provides the People's Speech dataset, a collection of over 30,000 hours of transcribed English speech. This corpus is designed for training large-scale speech-to-text systems and is released under permissive licenses for both academic and commercial applications.
The dataset is distributed in Parquet format. Because of the data volume, a library such as Dask may be needed to process it efficiently. Users should verify which license (CC-BY or CC-BY-SA) applies to their use case.
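As a rough illustration of both points, the sketch below opens Parquet shards lazily with Dask and looks up what each of the two named licenses requires. The file path, the column names `text` and `duration_ms`, and the helper names `load_shards` and `license_obligations` are assumptions made for this example and are not taken from the dataset documentation. The obligations table reflects only the general Creative Commons terms (attribution for both, share-alike for CC-BY-SA) and is not legal advice.

```python
from typing import Dict, List, Optional

# General obligations of the two Creative Commons licenses named above.
LICENSE_OBLIGATIONS: Dict[str, Dict[str, bool]] = {
    "cc-by": {"attribution": True, "share_alike": False},
    "cc-by-sa": {"attribution": True, "share_alike": True},
}


def license_obligations(license_id: str) -> Dict[str, bool]:
    """Return the obligations for an identifier such as 'CC-BY-SA' (hypothetical helper)."""
    key = license_id.strip().lower()
    if key not in LICENSE_OBLIGATIONS:
        raise ValueError(f"Unknown license: {license_id!r}")
    # Return a copy so callers cannot mutate the shared table.
    return dict(LICENSE_OBLIGATIONS[key])


def load_shards(parquet_glob: str, columns: Optional[List[str]] = None):
    """Open Parquet shards as a lazy Dask DataFrame (hypothetical helper)."""
    # Imported here so the license helper above works without Dask installed.
    import dask.dataframe as dd

    # Selecting a subset of columns avoids reading columns that are not needed.
    return dd.read_parquet(parquet_glob, columns=columns)


if __name__ == "__main__":
    # Assumed local path and column names; adjust to the files you downloaded.
    ddf = load_shards("peoples_speech/train-*.parquet", columns=["text", "duration_ms"])
    print(ddf.npartitions)  # number of lazily tracked partitions
    print(ddf.head())       # reads only enough data to show the first rows
    print(license_obligations("CC-BY-SA"))
```

Dask builds a task graph rather than loading everything at once, so memory use stays bounded by the partitions actually computed; for a single small shard, reading with pandas alone may be enough.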