A Vietnamese text-to-speech (TTS) dataset of 1,805 paired audio recordings and text transcriptions, intended for fine-tuning VieNeu-TTS models. The dataset was created by the author 'quocs' and last updated on February 10, 2026. Audio files are WAV, 24 kHz, mono, 16-bit PCM.
Use Cases
- Fine-tuning Vietnamese text-to-speech models, such as VieNeu-TTS, on the paired audio and text data.
- Training Vietnamese automatic speech recognition (ASR) systems on the transcribed audio.
- Developing or benchmarking neural codec models for Vietnamese speech synthesis.
- Generating synthetic Vietnamese speech for applications, using models trained on these audio-text pairs.
Strengths
- Contains 1,805 paired audio and text samples for training.
- Audio recordings share a consistent format: WAV, 24 kHz, mono, 16-bit PCM.
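The documented format can be checked locally before training. The following is a minimal sketch using Python's standard-library `wave` module. The helper names `check_wav_format` and `wav_duration_seconds` are hypothetical. The expected values come from the dataset description, not from inspected files. At 24 kHz, mono, 16-bit, one second of audio occupies 24,000 × 1 × 2 = 48,000 bytes of sample data. The `wave` module parses only uncompressed PCM, so files in other encodings are reported as unreadable.

```python
import wave

# Expected values taken from the dataset description (not verified against the files).
EXPECTED_RATE = 24_000        # samples per second
EXPECTED_CHANNELS = 1         # mono
EXPECTED_SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit PCM


def check_wav_format(source) -> list[str]:
    """Return mismatches between a WAV header and the documented format.

    `source` is a file path or a binary file object. An empty list means the
    header matches 24 kHz, mono, 16-bit uncompressed PCM.
    """
    try:
        with wave.open(source, "rb") as wf:
            rate, channels, width = wf.getframerate(), wf.getnchannels(), wf.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # The wave module only parses uncompressed PCM; other encodings or
        # truncated headers end up here.
        return [f"not readable as PCM WAV: {exc}"]
    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE}")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples, expected {8 * EXPECTED_SAMPLE_WIDTH}-bit")
    return problems


def wav_duration_seconds(source) -> float:
    """Duration = frame count / sample rate (one frame holds one sample per channel)."""
    with wave.open(source, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

Running `check_wav_format` over every file and keeping the non-empty results gives a quick list of recordings that deviate from the stated format.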
Limitations
- The row count is not recorded in the dataset metadata, so the stated figure of 1,805 samples cannot be confirmed before download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
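The row count and field semantics can only be confirmed after download, so a quick summary pass is a reasonable first check. The following is a minimal, hypothetical sketch. It assumes each WAV file has already been paired with its transcript. How to do that depends on the undocumented file and column layout. `summarize_pairs` is an illustrative name, not part of any published tooling. The function counts pairs, flags unreadable audio and empty transcripts, and reports duration statistics.

```python
import statistics
import wave
from collections.abc import Iterable


def summarize_pairs(pairs: Iterable[tuple]) -> dict:
    """Summarize (wav_source, transcript) pairs.

    Each wav_source is a file path or binary file object. Building the pairs
    from the downloaded files is left to the caller (hypothetical step),
    because the dataset's column and file naming is not documented.
    """
    count = 0
    unreadable = 0
    empty_text = 0
    durations = []
    for source, text in pairs:
        count += 1
        # Whitespace-only transcripts are treated as empty.
        if not text or not text.strip():
            empty_text += 1
        try:
            with wave.open(source, "rb") as wf:
                durations.append(wf.getnframes() / wf.getframerate())
        except (wave.Error, EOFError, OSError):
            # Missing files, truncated headers, or non-PCM encodings.
            unreadable += 1
    summary = {"pairs": count, "unreadable_audio": unreadable, "empty_transcripts": empty_text}
    if durations:
        summary.update(
            total_hours=sum(durations) / 3600,
            min_seconds=min(durations),
            median_seconds=statistics.median(durations),
            max_seconds=max(durations),
        )
    return summary
```

Comparing the `pairs` value against the 1,805 figure in the description directly tests the stated sample count.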
Provenance
- Source: Hugging Face
- Collection Method: Likely paired audio recordings with Vietnamese text transcriptions.
- Time Range: Not specified
- Freshness: Last updated 2026-02-10 05:00:45; freshness should be verified.
- Geography: Not specified