This large-scale corpus from the Dolly AI Team contains nearly 1,000 hours of professionally cleaned Vietnamese audio from 152 speakers across different regions of Vietnam. It aims to advance research in speech synthesis and recognition, and it was last updated on Hugging Face in November 2025.
Use Cases
- Train text-to-speech models on the high-quality, multi-speaker audio.
- Develop automatic speech recognition systems for Vietnamese using the large-scale speech corpus.
- Conduct voice modeling and cloning research with the professionally cleaned recordings.
- Study regional accents and speaker diversity in Vietnamese speech across the 152-speaker collection.
Strengths
- Nearly 1,000 hours of audio provides a substantial resource for model training.
- 152 speakers offer diversity in voices and potentially regional accents.
- Audio is described as professionally cleaned, suggesting high signal quality.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and exact file formats are not stated, which makes suitability harder to assess before download.
- Data may reflect geographic or demographic bias inherent to the speaker collection method.
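Because the columns are undocumented, a first pass after download usually means inspecting a handful of records. The following is a minimal sketch of that step in plain Python, not the dataset's actual schema: the field names `speaker_id`, `text`, and `duration_s`, the sample values, and both helper functions are hypothetical and would need to be replaced with whatever the downloaded files contain.

```python
from collections import Counter, defaultdict


def summarize_fields(rows):
    """Map each field name to a count of the Python type names observed."""
    summary = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            summary[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in summary.items()}


def hours_per_speaker(rows, speaker_key, duration_key):
    """Sum durations (assumed to be in seconds) into hours per speaker."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[speaker_key]] += row[duration_key] / 3600.0
    return dict(totals)


# Hypothetical sample records; the real column names are not documented.
sample_rows = [
    {"speaker_id": "spk_001", "text": "xin chào", "duration_s": 1800.0},
    {"speaker_id": "spk_001", "text": "cảm ơn", "duration_s": 1800.0},
    {"speaker_id": "spk_002", "text": None, "duration_s": 3600.0},
]

field_summary = summarize_fields(sample_rows)
speaker_hours = hours_per_speaker(sample_rows, "speaker_id", "duration_s")
```

A field that shows more than one type (for example `str` alongside `NoneType`) points to missing values, and a heavily skewed hours-per-speaker total is one rough signal of the speaker imbalance noted above.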
Provenance
- Source
  - Dolly AI Team (dolly-vn) via Hugging Face.
- Collection Method
  - Professionally cleaned audio recordings from 152 speakers.
- Time Range
  - Not specified.
- Freshness
  - Last updated 2025-11-24 13:06:36; freshness should be verified.
- Geography
  - Different regions of Vietnam.