The corpus contains 218.2 hours of transcribed Turkish speech across 186,171 utterances. It supports research in multilingual speech recognition for Turkic languages and is hosted in the IS2AI/TurkicASR GitHub repository.
Use Cases
- Train automatic speech recognition (ASR) models using the 186,171 transcribed utterances
- Develop multilingual speech recognition systems for Turkic languages by combining this Turkish speech corpus with other regional datasets
- Perform linguistic analysis of the 218.2 hours of transcribed speech to study patterns in Turkish, a Turkic language
Strengths
- 218.2 hours of transcribed speech data
- 186,171 individual speech utterances
- Open-source availability via the IS2AI/TurkicASR GitHub repository
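
The two headline figures above also imply a mean utterance length, which can help when planning batch sizes or audio padding for ASR training. The following is a minimal sketch that derives it from the stated totals only; the function name `mean_utterance_seconds` is a hypothetical helper introduced for illustration, and the result is an average, not a statement about the actual duration distribution in the corpus.

```python
# Totals as stated in this summary.
TOTAL_HOURS = 218.2
NUM_UTTERANCES = 186_171


def mean_utterance_seconds(total_hours: float, num_utterances: int) -> float:
    """Return the mean utterance duration in seconds (hypothetical helper)."""
    if num_utterances <= 0:
        raise ValueError("num_utterances must be positive")
    # Convert hours to seconds, then divide by the utterance count.
    return total_hours * 3600 / num_utterances


avg_seconds = mean_utterance_seconds(TOTAL_HOURS, NUM_UTTERANCES)
print(f"Mean utterance length: {avg_seconds:.2f} s")
```

Under these totals the mean works out to roughly 4.2 seconds per utterance, which suggests short, sentence-level recordings.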