Uzbek-language audio clips paired with text transcriptions, sourced from the YouTube news channels Kunuz and Qalampir and covering multiple regional dialects. The transcriptions were generated with Gemini 2.5 Pro to support Automatic Speech Recognition (ASR) development.
Use Cases
- Train Automatic Speech Recognition (ASR) models on the paired audio clips and transcriptions
- Evaluate the accuracy of speech-to-text systems across different Uzbek regional dialects
- Fine-tune language models on news-specific vocabulary and syntax found in the transcriptions
- Analyze linguistic variations in Uzbek news reporting using the transcription text
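For the dialect evaluation use case, a common approach is to compute word error rate (WER) separately for each dialect group. The sketch below is an illustration under stated assumptions, not part of the dataset tooling. It assumes each sample is a dict with the hypothetical keys `dialect`, `transcription` (reference text) and `prediction` (ASR output), since this card does not document a schema. The dialect labels and sentences in the example are invented for illustration. Because the reference transcriptions were produced by Gemini 2.5 Pro, the resulting scores measure agreement with those transcripts rather than with human-verified text.

```python
from collections import defaultdict


def word_edit_counts(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level Levenshtein distance, number of reference words)."""
    # Lowercase and split on whitespace; real evaluations usually also normalize punctuation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the current ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1], len(ref)


def wer_by_dialect(samples: list[dict[str, str]]) -> dict[str, float]:
    """Corpus-level WER per dialect: total word edits / total reference words."""
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for s in samples:
        # "dialect", "transcription" and "prediction" are hypothetical field names.
        e, n = word_edit_counts(s["transcription"], s["prediction"])
        errors[s["dialect"]] += e
        words[s["dialect"]] += n
    # Skip groups with no reference words to avoid division by zero.
    return {d: errors[d] / words[d] for d in errors if words[d] > 0}


# Illustrative toy samples (not drawn from the dataset).
samples = [
    {"dialect": "tashkent", "transcription": "bugun havo issiq", "prediction": "bugun havo issiq"},
    {"dialect": "khorezm", "transcription": "bugun havo issiq", "prediction": "bugun havo sovuq"},
]
print(wer_by_dialect(samples))
```

Pooling edits and reference words per dialect, rather than averaging per-clip WER, keeps short clips from dominating a group's score.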
Strengths
- Sourced from the prominent Uzbek news channels Kunuz and Qalampir
- Includes transcriptions generated and refined using the Gemini 2.5 Pro model
- Features audio clips representing multiple Uzbek regional dialects
- Derived from publicly available YouTube news video content