10 hours of Turkish media speech audio clips designed for evaluating Automatic Speech Recognition (ASR) systems. This dataset is part of the MediaSpeech collection (OpenSLR resource SLR108), which also covers French, Arabic, and Spanish.
Use Cases
- Benchmark the word error rate (WER) of Turkish ASR models using the provided media speech audio
- Fine-tune speech-to-text systems on Turkish broadcast media characteristics
- Conduct cross-linguistic ASR performance comparisons by combining this data with the French, Arabic, and Spanish subsets of SLR108
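For the WER benchmarking use case, here is a minimal Python sketch of corpus-level word error rate scoring. It does not read the dataset files. You supply pairs of reference transcripts and model hypotheses as strings. The function names are hypothetical. The normalization is an assumed choice: Turkish-aware lowercasing plus whitespace tokenization, with punctuation left in place. Adapt it to whatever protocol you report against.

```python
from typing import List, Tuple


def normalize_tr(text: str) -> List[str]:
    # Assumed normalization: Turkish-aware lowercasing, then whitespace split.
    # Map capital dotless I and dotted İ explicitly, because str.lower()
    # would turn "I" into "i" (not "ı") and "İ" into "i" + combining dot.
    text = text.replace("I", "ı").replace("İ", "i").lower()
    return text.split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    # Levenshtein distance over words: substitutions + deletions + insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    # Corpus-level WER: total word errors divided by total reference words.
    errors = 0
    ref_words = 0
    for ref, hyp in pairs:
        r, h = normalize_tr(ref), normalize_tr(hyp)
        errors += word_edit_distance(r, h)
        ref_words += len(r)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    # One deleted word ("çok") out of four reference words -> 0.25
    print(corpus_wer([("Bugün hava çok güzel", "bugün hava güzel")]))
```

The sketch pools errors across all segments instead of averaging per-segment WER. Because the dataset consists of short clips, a per-clip average would let one- or two-word segments dominate the score.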
Strengths
- 10 hours of Turkish-language audio recordings
- Part of the SLR108 MediaSpeech collection, which covers four languages: Turkish, French, Arabic, and Spanish
- Distributed under the Creative Commons Attribution 4.0 International License
- Consists of short speech segments extracted from media sources