Vaja-Thai: Combined Thai Speech Dataset for TTS Research, 554.6 Hours

Creator: dubbing-ai
Published: 2026-03-28T08:45:01
Keywords: Text To Speech, Audio Dataset, Speech Synthesis, Audio, Thai Language

Description

Vaja-Thai is a unified Thai speech dataset containing 289,916 audio samples totaling 554.6 hours for Text-to-Speech research. The dataset was created by dubbing-ai and last updated in April 2026. All audio is resampled to 24 kHz WAV format and combines multiple quality-filtered sources.
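
The two headline figures imply an average clip length, which is useful when sizing batches or filtering by duration. The sketch below is plain arithmetic on the numbers quoted above; the function name is illustrative and not part of the dataset.

```python
# Figures quoted in the description above.
TOTAL_HOURS = 554.6
NUM_SAMPLES = 289_916


def mean_clip_seconds(total_hours, num_samples):
    """Average clip length in seconds implied by a total duration and a clip count."""
    return total_hours * 3600 / num_samples


mean_seconds = mean_clip_seconds(TOTAL_HOURS, NUM_SAMPLES)
print(f"Average clip length: {mean_seconds:.2f} s")  # about 6.89 s
```

This is only a mean; the description does not state how clip durations are distributed.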

Use Cases

Training Thai speech synthesis models on the 554.6 hours of audio.
Fine-tuning TTS systems on the unified, quality-filtered dataset.
Benchmarking TTS model performance on a standardized Thai speech corpus.
Researching Thai prosody and pronunciation using the large volume of audio samples.

Strengths

Contains 289,916 audio samples, providing substantial volume for model training.
Totals 554.6 hours of Thai speech, offering extensive coverage for TTS tasks.
Audio is standardized to 24 kHz WAV format, ensuring consistent technical specifications.
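
The 24 kHz claim can be spot-checked after download by reading WAV headers. The following is a minimal sketch using Python's standard `wave` module; it assumes the files are PCM-encoded WAV, which the description does not confirm, and the helper names are illustrative.

```python
import wave

EXPECTED_RATE_HZ = 24_000  # sample rate stated in the dataset description


def wav_matches_rate(path, expected_rate=EXPECTED_RATE_HZ):
    """Return True if the WAV header reports the expected sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == expected_rate


def wav_duration_seconds(path):
    """Clip duration in seconds, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Running `wav_matches_rate` over a random subset of files is usually enough to catch a resampling step that was skipped for one of the sources.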

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The license for the combined dataset is unknown, which may restrict usage.
Data may reflect source bias inherent to the original contributing datasets.
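
Because the columns are undocumented, a first step after download is to list which fields exist and what value types they hold. The sketch below does this for rows already loaded as dictionaries. The field names in `sample_rows` are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_columns(rows):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical rows for illustration only; the real field names are undocumented.
sample_rows = [
    {"audio_path": "clip_0001.wav", "text": "สวัสดี", "duration": 3.2},
    {"audio_path": "clip_0002.wav", "text": "ขอบคุณ", "duration": None},
]
print(summarize_columns(sample_rows))
```

A field that reports more than one type, such as a duration that is sometimes missing, is a good candidate for closer inspection before training.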

Provenance

Source: dubbing-ai on Hugging Face, combining multiple sources including tsync2.
Collection Method: Unified and quality-filtered combination of multiple Thai speech sources.
Time Range: Not specified
Freshness: Last updated 2026-04-01 19:14:14; freshness should be verified.
Geography: Not specified

The license for the combined dataset is unknown; individual source licenses such as CC-BY-NC-SA-3.0 may apply.

Quality Score

Overall: D38
Description: 42
Source: 36
Reputation: 42
Access: 22

Community

Downloads: 111
Likes: 1
Views: 0

Dataset Info

Author: dubbing-ai
Created: Mar 28, 2026
Updated: Apr 1, 2026
Last synced: May 5, 2026
