PhoAudiobook is a high-quality and large-scale Vietnamese speech dataset curated for zero-shot text-to-speech. The dataset construction and experimental results are detailed in the ACL 2025 paper 'Zero-Shot Text-to-Speech for Vietnamese' by Thi Vu, Linh The Nguyen, and Dat Quoc Nguyen. The dataset page was last updated on Hugging Face in January 2026.
Use Cases
- Training zero-shot text-to-speech models on high-quality Vietnamese speech data.
- Benchmarking Vietnamese TTS models, given the dataset's stated scale and quality.
- Researching cross-speaker voice synthesis, in line with the dataset's stated zero-shot focus.
Strengths
- Described as 'high-quality' in the dataset description.
- Described as 'large-scale' in the dataset description.
- Associated with a peer-reviewed ACL 2025 publication.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count, file formats, and license are not stated, which makes it harder to assess the dataset's suitability for a given use.
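Because the card does not document its columns, one way to infer field semantics after download is to tally the value types that appear in a sample of rows. The following is a minimal sketch under stated assumptions: `summarize_fields` is a hypothetical helper, not part of any library, and it assumes rows arrive as dict-like records, which is what iterating a Hugging Face `datasets` dataset yields. The repository ID and split in the commented usage are placeholders, not values confirmed by the card.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(rows: Iterable[Mapping[str, Any]], max_rows: int = 100) -> Dict[str, Counter]:
    """Hypothetical helper: count the Python type names seen for each field
    across at most `max_rows` rows, to help guess what each column holds."""
    summary: Dict[str, Counter] = {}
    for i, row in enumerate(rows):
        if i >= max_rows:
            break  # stop after sampling max_rows rows
        for key, value in row.items():
            summary.setdefault(key, Counter())[type(value).__name__] += 1
    return summary


# Hypothetical usage; "<repo-id>" and "train" are placeholders to replace:
# from datasets import load_dataset
# ds = load_dataset("<repo-id>", split="train", streaming=True)
# print(summarize_fields(ds, max_rows=50))
```

Streaming avoids downloading the full dataset just to inspect its schema; the type counts only hint at semantics, so the associated paper remains the authoritative description of each field.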
Provenance
- Source
- Hugging Face dataset by author 'thivux'.
- Collection Method
- Curated for zero-shot text-to-speech; details in the associated ACL 2025 paper.
- Time Range
- Not specified.
- Freshness
- Last updated 2026-01-01 03:52:48; verify the current status on the dataset page before use.
- Geography
- Not specified.