Name: Dolly-Audio: 1,000 Hours of Multi-Speaker Vietnamese Speech
Creator: dolly-vn
Published: 2025-11-24T06:16:49
Keywords: Text To Speech, Library: polars, Library: dask, OPTIMIZED-PARQUET, Speech Synthesis, Modality: text, Size Categories: 100K<n<1M, Multi Speaker, Library: mlcroissant, Vietnamese, Library: datasets, Text, Parquet, Audio, Region: us, Large Scale, Natural Language Processing, Voice Modeling, Speech Recognition, Language: vi, Synthetic

Description

Nearly 1,000 hours of professionally cleaned Vietnamese audio form this large-scale corpus created by the Dolly AI Team. The dataset features 152 speakers from different regions of Vietnam, aiming to advance research in speech synthesis and recognition. It was last updated on Hugging Face in November 2025.

Use Cases

Train text-to-speech models on the high-quality, multi-speaker audio.
Develop automatic speech recognition systems for Vietnamese using the large-scale speech corpus.
Conduct voice modeling and cloning research with the professionally cleaned recordings.
Study regional accents and speaker diversity in Vietnamese speech across the 152-speaker collection.

Strengths

Nearly 1,000 hours of audio provides a substantial resource for model training.
152 speakers offer diversity in voices and potentially regional accents.
Audio is described as professionally cleaned, suggesting high signal quality.
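
How evenly the hours are spread across speakers is not stated. If they were spread evenly, each of the 152 speakers would contribute at most about 1,000 / 152 ≈ 6.6 hours. The sketch below shows one way to check the real distribution once the data is downloaded. It assumes each clip exposes a speaker label and a duration in seconds; both fields and the sample values are hypothetical, since the source documents no columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hours_per_speaker(clips: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum clip durations (seconds) per speaker and convert to hours."""
    seconds: Dict[str, float] = defaultdict(float)
    for speaker_id, duration_s in clips:
        seconds[speaker_id] += duration_s
    return {speaker: total / 3600.0 for speaker, total in seconds.items()}


def imbalance_ratio(hours: Dict[str, float]) -> float:
    """Ratio of the most-recorded speaker to the least-recorded speaker."""
    return max(hours.values()) / min(hours.values())


# Synthetic (speaker_id, duration_seconds) pairs, not real dataset values.
clips = [("spk_a", 3600.0), ("spk_b", 1800.0), ("spk_a", 1800.0)]
hours = hours_per_speaker(clips)
ratio = imbalance_ratio(hours)
```

A ratio far above 1 would suggest that a few voices dominate the corpus, which matters for multi-speaker training and for any claim about accent coverage.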

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The exact row count and file formats are undocumented, which limits suitability assessment; the keywords only suggest Parquet storage and between 100K and 1M rows.
Data may reflect geographic or demographic bias inherent to the speaker collection method.
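
Since the fields are undocumented, one option is to sample a few rows and record which columns and value types appear. The following is a minimal sketch, not the publisher's tooling. The `infer_schema` and `stream_rows` helpers, the `train` split name, and the sample column names are assumptions; the repository ID is left as a parameter because the source does not state it. `stream_rows` relies on the Hugging Face `datasets` package and is not called here.

```python
from collections import defaultdict
from itertools import islice
from typing import Any, Dict, Iterable, Mapping, Set


def infer_schema(rows: Iterable[Mapping[str, Any]],
                 sample_size: int = 100) -> Dict[str, Set[str]]:
    """Map each column name to the Python type names seen in a sample."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in islice(rows, sample_size):
        for column, value in row.items():
            seen[column].add(type(value).__name__)
    return dict(seen)


def stream_rows(repo_id: str, split: str = "train"):
    """Stream rows from the Hugging Face Hub without a full download.

    The split name is an assumption; decoding audio columns may need
    extra audio libraries to be installed.
    """
    from datasets import load_dataset  # optional dependency, imported lazily
    return load_dataset(repo_id, split=split, streaming=True)


# Hypothetical rows standing in for real data; column names are guesses.
sample = [
    {"audio": b"\x00\x01", "text": "xin chào", "speaker_id": 7},
    {"audio": b"\x02\x03", "text": "cảm ơn", "speaker_id": 12},
]
schema = infer_schema(sample)
```

With network access, the same function could be applied to `stream_rows(...)` to inspect the first hundred real rows before committing to a full download.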

Provenance

Source: Dolly AI Team (dolly-vn) via Hugging Face.
Collection Method: Professionally cleaned audio recordings from 152 speakers.
Time Range: Not specified.
Freshness: Last updated 2025-11-24 13:06:36; check the Hugging Face page for later revisions.
Geography: Different regions of Vietnam.
