This dataset contains 443,641 Vietnamese audio samples with corresponding phonemized transcripts, totaling 1,000 hours of speech. The collection is structured for training and fine-tuning high-quality Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models.
Use Cases
- Train a Vietnamese acoustic model from scratch using the 1,000 hours of audio samples and phonemized transcripts
- Fine-tune a neural TTS system to improve pronunciation accuracy using the phonemized transcript labels
- Develop an ASR system by mapping the audio samples to the provided text transcripts for speech-to-text conversion
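The same audio/transcript pairs serve both directions listed above: for TTS the transcript is the input and the audio is the target, while for ASR the roles are reversed. The following is a minimal sketch of that idea only. The `Sample` class, its field names, and the example path are hypothetical and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sample:
    # Hypothetical field names; the dataset's real schema may differ.
    audio_path: str
    transcript: str


def tts_pair(sample: Sample) -> Tuple[str, str]:
    """TTS direction: the transcript is the model input, the audio is the target."""
    return sample.transcript, sample.audio_path


def asr_pair(sample: Sample) -> Tuple[str, str]:
    """ASR direction: the audio is the model input, the transcript is the target."""
    return sample.audio_path, sample.transcript


# Placeholder values for illustration only, not real dataset content.
example = Sample(audio_path="clips/000001.wav", transcript="<phoneme sequence>")

tts_input, tts_target = tts_pair(example)
asr_input, asr_target = asr_pair(example)
```

Keeping one record type and deriving the pair order per task avoids maintaining two copies of the same data.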
Strengths
- 443,641 individual audio samples paired with transcripts
- 1,000 total hours of Vietnamese speech data
- Includes phonemized transcripts for every audio sample to support neural TTS training
- Supports both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) model development
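As a quick sanity check on the figures quoted above, the average clip length can be derived from the sample count and total duration. This is a rough estimate only: it assumes the 1,000-hour total is a rounded figure and says nothing about how clip lengths are actually distributed.

```python
# Figures taken from the dataset description.
NUM_SAMPLES = 443_641
TOTAL_HOURS = 1_000


def average_clip_seconds(num_samples: int, total_hours: float) -> float:
    """Return the mean clip duration in seconds."""
    return total_hours * 3600 / num_samples


avg = average_clip_seconds(NUM_SAMPLES, TOTAL_HOURS)
print(f"Average clip length: {avg:.2f} s")  # prints "Average clip length: 8.11 s"
```

An average of roughly 8 seconds per clip is consistent with sentence-level utterances, which is a common granularity for TTS and ASR training data.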