MLCommons provides over one million hours of English audio extracted from Archive.org for unsupervised speech research. The collection features a diverse set of speakers and is available under CC-BY and CC-BY-SA licenses for academic and commercial applications. It was last updated in February 2025 to support large-scale speech model development.
Processing one million hours of audio requires significant storage and high-performance compute resources. Users who redistribute individual files should first verify the license of the specific Archive.org item each file comes from.
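To give a rough sense of the storage involved, the sketch below estimates the on-disk size of one million hours of audio at a constant bitrate. The bitrates are illustrative assumptions, not properties of this dataset, and the helper name `audio_storage_terabytes` is hypothetical; actual sizes depend on the formats of the source files.

```python
def audio_storage_terabytes(hours: float, bitrate_kbps: float) -> float:
    """Approximate size in decimal terabytes of audio stored at a constant bitrate."""
    seconds = hours * 3600
    total_bytes = seconds * bitrate_kbps * 1000 / 8  # kilobits per second -> bytes
    return total_bytes / 1e12


HOURS = 1_000_000  # collection size stated in the description

# Assumed bitrates, for illustration only (not taken from the dataset).
ASSUMED_BITRATES_KBPS = {
    "64 kbps compressed audio": 64,
    "16 kHz, 16-bit mono PCM": 256,  # 16,000 samples/s * 16 bits = 256 kbps
}

for label, kbps in ASSUMED_BITRATES_KBPS.items():
    print(f"{label}: ~{audio_storage_terabytes(HOURS, kbps):.1f} TB")
```

Under these assumptions the estimate comes to roughly 28.8 TB for the compressed case and 115.2 TB for uncompressed 16 kHz PCM, before accounting for any intermediate features or copies created during processing.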