A collection of 10 single-speaker speech datasets, one for each of 10 languages: German, Greek, Spanish, Finnish, French, Hungarian, Japanese, Dutch, Russian, and Chinese. Each language-specific subset contains audio recordings paired with text transcriptions for speech synthesis tasks.
Use Cases
- Train neural text-to-speech (TTS) models using the audio and transcription pairs
- Evaluate multilingual speech synthesis architectures across the 10 provided language subsets
- Benchmark acoustic modeling performance on single-speaker data across diverse language families
Strengths
- Covers 10 distinct languages: German, Greek, Spanish, Finnish, French, Hungarian, Japanese, Dutch, Russian, and Chinese
- Uses a single speaker per language, which keeps acoustic and prosodic characteristics consistent within each subset
- Includes text transcriptions mapped to audio files for supervised speech synthesis training
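
As a rough illustration of how the audio and transcription pairs might be read for supervised training, here is a minimal Python sketch. This description does not specify the on-disk layout, so the pipe-delimited manifest format (`audio_path|transcription`), the file names, and the `Utterance` and `read_pairs` helpers are all hypothetical; adjust them to match the actual files in each language subset.

```python
import io
from typing import NamedTuple


class Utterance(NamedTuple):
    """One training example: an audio file path and its transcription."""
    audio_path: str
    text: str


def read_pairs(lines, delimiter="|"):
    """Parse 'audio_path|transcription' lines into Utterance records.

    The delimiter and two-column layout are assumptions, not a documented
    format of this dataset.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        audio_path, sep, text = line.partition(delimiter)
        if not sep:
            continue  # skip lines without a delimiter
        pairs.append(Utterance(audio_path.strip(), text.strip()))
    return pairs


# Hypothetical manifest contents, used here in place of a real file.
sample = io.StringIO(
    "de/utt_0001.wav|Guten Morgen.\n"
    "de/utt_0002.wav|Wie geht es dir?\n"
    "\n"
)
pairs = read_pairs(sample)
for utt in pairs:
    print(utt.audio_path, "->", utt.text)
```

With a real subset, the `io.StringIO` object would be replaced by an open file handle, and the resulting pairs could be passed to whichever audio loader the TTS training pipeline uses.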