AllenAI collected 1 million hours of English audio-text data from the public internet. The dataset covers a variety of speaking styles, accents, and recording setups, and supports training of the OLMoASR speech recognition models.
The full description is hosted externally; review the dataset page at https://huggingface.co/datasets/allenai/OLMoASR-Mix for complete details.