OLMoASR-Pool contains approximately 3.4 million hours of audio and 18.8 million unique transcripts collected from the public internet. It was created by AllenAI to train English speech recognition models and includes a variety of speaking styles, accents, and audio setups.
The full dataset description is hosted externally; users should review the page at https://huggingface.co/datasets/allenai/OLMoASR-Pool for complete details.