This dataset contains 369,510 hours of speech audio with text captions sourced from YouTube, released by the espnet team in 2024. Each audio utterance is paired with either user-uploaded (manual) or system-generated (automatic) captions.
A newer version, YODAS2, provides unsegmented audio at a higher sampling rate of 24 kHz. Note that the 'manual' label indicates only that a caption was uploaded by a user; it does not guarantee that a human transcribed it.