This dataset contains 120 Korean speech sentences generated with the Google Gemini gemini-2.5-pro-preview-tts model using the Zephyr voice. The sentences are grouped into pronunciation, prosody, emotion, and intonation categories. Audio files are 24 kHz, 16-bit, mono WAV.
Use Cases
- Fine-tuning TTS models on the pronunciation and prosody categories.
- Benchmarking speech synthesis quality on Korean-specific phonetic phenomena like double consonants and vowel assimilation.
- Training emotion or intonation control models on the emotion and intonation categories.
Strengths
- Audio is generated with a specific, named model (Google Gemini gemini-2.5-pro-preview-tts, Voice: Zephyr).
- Includes 70 sentences focused on pronunciation and prosody, broken into specific sub-categories like numbers and consonant assimilation.
- Audio format specifications are explicitly provided: WAV (PCM), 24,000 Hz sample rate, 16-bit, mono.
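The stated audio format can be checked after download. The following is a minimal sketch using Python's standard `wave` module; the helper names `read_wav_format` and `format_mismatches` are illustrative, not part of the dataset, and the expected values are simply those listed in the description.

```python
import wave

# Expected values taken from the dataset description: mono, 16-bit, 24,000 Hz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 24000}


def read_wav_format(path):
    """Read channel count, sample width, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes == 16-bit
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def format_mismatches(info, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs from the description."""
    return {key: (info[key], want) for key, want in expected.items() if info[key] != want}
```

Calling `format_mismatches(read_wav_format(path))` on each file should return an empty dict if the description is accurate. At this format, one second of audio occupies 24,000 × 2 × 1 = 48,000 bytes of sample data.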
Limitations
- Row count is not reported in the dataset metadata; the description states 120 sentences, which should be confirmed after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- At 120 sentences from a single synthetic voice, the dataset is likely too small to train or fine-tune complex models on its own.
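Because the row count and column semantics are undocumented, a quick inspection after download can help. The sketch below assumes the rows have already been loaded as a list of dictionaries; `summarize_columns` is a hypothetical helper, and the column names in the usage example are placeholders, not the dataset's actual fields.

```python
from collections import defaultdict


def summarize_columns(rows, max_examples=2):
    """Map each column name to the value types seen and the first few example values."""
    summary = defaultdict(lambda: {"types": set(), "examples": []})
    for row in rows:
        for name, value in row.items():
            entry = summary[name]
            entry["types"].add(type(value).__name__)
            if len(entry["examples"]) < max_examples:
                entry["examples"].append(value)
    return dict(summary)


# Hypothetical rows for illustration only; the real columns must be read from the files.
rows = [
    {"text": "sentence one", "category": "pronunciation"},
    {"text": "sentence two", "category": "emotion"},
]

print("row count:", len(rows))
for name, info in summarize_columns(rows).items():
    print(name, sorted(info["types"]), info["examples"])
```

Comparing `len(rows)` against the stated 120 sentences is a simple way to confirm the row count.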
Provenance
- Source: Hugging Face user daje
- Collection Method: Generated using the Google Gemini gemini-2.5-pro-preview-tts model.
- Time Range: not specified
- Freshness: Last updated 2026-04-08 11:04:59; freshness should be verified.
- Geography: not specified