Korean TTS Training Dataset with 120 Sentences Across Pronunciation and Prosody Categories

Name: Korean TTS Training Dataset with 120 Sentences Across Pronunciation and Prosody Categories
Creator: daje
Published: 2026-04-08T10:15:34
Keywords: Speech Synthesis, Korean Language, Audio, TTS Training, Audio Generation

by daje, updated 3mo ago

Available on 1 platform

Description

120 Korean speech sentences were generated using the Google Gemini gemini-2.5-pro-preview-tts model with the Zephyr voice. The dataset includes categories for pronunciation, prosody, emotion, and intonation. Audio files are in 24kHz, 16-bit, mono WAV format.

Use Cases

Fine-tuning TTS models based on the described pronunciation and prosody categories.
Benchmarking speech synthesis quality on Korean-specific phonetic phenomena like double consonants and vowel assimilation.
Training emotion or intonation control models based on the described emotional and intonational categories.

Strengths

Audio is generated with a specific, named model (Google Gemini gemini-2.5-pro-preview-tts, Voice: Zephyr).
Includes 70 sentences focused on pronunciation and prosody, broken into specific sub-categories like numbers and consonant assimilation.
Audio format specifications are explicitly provided: WAV (PCM), 24,000 Hz sample rate, 16-bit, mono.
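
The stated format (24,000 Hz, 16-bit, mono PCM WAV) can be checked after download. The following is a minimal sketch using only Python's standard `wave` module. It assumes the audio has been extracted as `.wav` files into a single local directory; the directory layout and the helper names `wav_format` and `find_mismatches` are illustrative and not part of the dataset.

```python
import wave
from pathlib import Path

# Format stated in the dataset description: 24 kHz, 16-bit (2 bytes), mono.
EXPECTED = {"sample_rate": 24000, "sample_width_bytes": 2, "channels": 1}


def wav_format(path):
    """Read sample rate, sample width, and channel count from a PCM WAV header."""
    with wave.open(str(path), "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
            "channels": wf.getnchannels(),
        }


def find_mismatches(audio_dir):
    """Return (number of .wav files, {filename: format}) for files that differ from EXPECTED.

    The directory layout is an assumption; adjust the glob pattern to the real one.
    """
    paths = sorted(Path(audio_dir).glob("*.wav"))
    mismatches = {}
    for p in paths:
        fmt = wav_format(p)
        if fmt != EXPECTED:
            mismatches[p.name] = fmt
    return len(paths), mismatches
```

Calling `find_mismatches("path/to/audio")` on the extracted files returns the file count, which can be compared against the 120 sentences the description reports, together with any files whose header differs from the stated format.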

Limitations

Row count is not reported in the dataset metadata; the 120-sentence figure comes from the description and should be verified after download.
Column-level documentation is absent; field semantics must be inferred after download.
The dataset's small scale of 120 sentences may limit training for complex models.

Provenance

Source: Hugging Face user daje
Collection Method: Generated using the Google Gemini gemini-2.5-pro-preview-tts model.
Time Range: not specified
Freshness: Last updated 2026-04-08 11:04:59; freshness should be verified.
Geography: not specified

Quality Score

Overall: D38
Description: 42
Source: 36
Reputation: 44
Access: 26

Community

458 downloads
1 like
0 views

Dataset Info

Author: daje
Created: Apr 8, 2026
Updated: Apr 8, 2026
Last synced: Apr 24, 2026
