This dataset contains 105 hours of manually checked Uzbek speech recordings from 958 unique speakers. Each audio file is transcribed, and the corpus is designed for speech recognition tasks in the Uzbek language.
Use Cases
- Train automatic speech recognition (ASR) models using the transcribed audio recordings
- Perform speaker identification tasks using recordings from the 958 unique speakers
- Conduct linguistic analysis of the Uzbek language using the manually verified transcriptions
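
When the recordings are used for ASR training, it is often useful to keep train and test speakers separate so that evaluation reflects unseen voices. The following is a minimal sketch of such a speaker-disjoint split. It assumes each utterance is available as a record with hypothetical `speaker_id`, `audio_path`, and `text` fields; the dataset's actual file layout and field names are not described here and may differ.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(records, test_fraction=0.1, seed=0):
    """Split utterance records so that no speaker appears in both sets.

    `records` is a list of dicts with a "speaker_id" key (a hypothetical
    field name, not confirmed by the dataset description).
    """
    # Group utterances by speaker.
    by_speaker = defaultdict(list)
    for rec in records:
        by_speaker[rec["speaker_id"]].append(rec)

    # Shuffle speakers deterministically, then hold out a fraction of them.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [r for s in speakers if s not in test_speakers for r in by_speaker[s]]
    test = [r for s in speakers if s in test_speakers for r in by_speaker[s]]
    return train, test


# Toy data standing in for the real corpus: 5 speakers, 4 clips each.
records = [
    {"speaker_id": f"spk{i % 5}", "audio_path": f"clip_{i}.wav", "text": "placeholder"}
    for i in range(20)
]
train, test = speaker_disjoint_split(records, test_fraction=0.2, seed=0)
```

Splitting by speaker rather than by utterance is a design choice: it avoids the same voice appearing on both sides of the split, at the cost of slightly uneven set sizes when speakers contribute different amounts of audio.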
Strengths
- 105 hours of transcribed audio recordings
- 958 unique speakers represented in the corpus
- Manually checked transcriptions to ensure high data quality