VieNeu-TTS-140h contains 74,858 Vietnamese audio samples paired with phonemized transcripts, totaling 140 hours of speech. Developed by pnnbao-ump and updated in late 2024, the collection was sourced from YouTube and refined through a pipeline that combines Whisper-large-v3 transcription with human-in-the-loop correction.
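The sample count and total duration together imply an average clip length. The sketch below works through that arithmetic. It assumes the 140-hour figure is exact; the published total is likely rounded, so treat the result as approximate. The function name `average_clip_seconds` is illustrative only and not part of any dataset tooling.

```python
# Approximate mean clip duration implied by the dataset's published totals.
# Assumption: "140 hours" is treated as exact; the real total is likely rounded.
TOTAL_HOURS = 140
NUM_SAMPLES = 74_858


def average_clip_seconds(total_hours: float, num_samples: int) -> float:
    """Return the mean clip duration in seconds (hypothetical helper)."""
    return total_hours * 3600 / num_samples


if __name__ == "__main__":
    avg = average_clip_seconds(TOTAL_HOURS, NUM_SAMPLES)
    print(f"~{avg:.2f} s per sample")  # prints "~6.73 s per sample"
```

This gives roughly 6.7 seconds per sample (504,000 s / 74,858 clips), which matches the short, utterance-level clips typical of TTS training data.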
The dataset is distributed in Arrow format under the Apache 2.0 license. Note that background noise was removed from the audio programmatically.