This dataset offers a filtered collection of Uzbek speech recordings processed with voice activity detection, noise removal, and reading-speed analysis. It omits the original Mozilla Common Voice files and keeps only a refined subset, validated with automatic speech-to-text (STT) models to ensure high-quality audio-text alignment.
Use Cases
- Train Uzbek speech-to-text models using the audio files and their corresponding validated text transcriptions.
- Analyze natural speech prosody using the subset of recordings filtered for standard reading speeds.
- Evaluate voice activity detection algorithms against recordings that have already been filtered to remove silent and noise-only clips.
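
As a rough illustration of the last use case: because every retained clip is expected to contain speech, the fraction of clips in which a detector finds speech gives a simple recall-style check (it says nothing about false alarms). The sketch below is only an assumption-laden example. `has_speech` is a toy energy-based detector standing in for whichever algorithm is under test, and the frame length, threshold, and minimum frame count are hypothetical values, not parameters used to build this dataset. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS energy of consecutive non-overlapping frames."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(x * x for x in frame) / frame_len)


def has_speech(samples, frame_len=400, threshold=0.02, min_frames=3):
    """Toy energy VAD (hypothetical stand-in for the detector under test).

    Returns True when at least `min_frames` frames have RMS above `threshold`.
    """
    active = sum(1 for rms in frame_rms(samples, frame_len) if rms > threshold)
    return active >= min_frames


def detection_rate(clips, vad=has_speech):
    """Fraction of clips in which `vad` reports speech."""
    if not clips:
        raise ValueError("no clips given")
    return sum(1 for clip in clips if vad(clip)) / len(clips)


# Synthetic stand-ins for decoded audio: pure silence and a 220 Hz tone
# sampled at 16 kHz. Real use would decode the dataset's audio files instead.
silence = [0.0] * 4000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(4000)]
rate = detection_rate([silence, tone])
```

On the filtered recordings, a detection rate well below 1.0 would suggest the detector is missing speech that the preprocessing pipeline accepted.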
Strengths
- Filters out audio files lacking voice activity or containing only noise after denoising.
- Removes 5-10% of recordings identified as reading speed outliers to maintain natural speech patterns.
- Validates audio-text alignment with an automatic speech-to-text (STT) model trained on a high-confidence subset.
- Derived from the Uzbek language portion of the Mozilla Common Voice project.
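
The reading-speed filter mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes speed is measured in non-whitespace characters per second and that outliers are the slowest and fastest tails of the distribution. The `Clip` type, its field names, and the `tail_fraction` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str          # hypothetical file name
    text: str          # transcription of the clip
    duration_s: float  # audio duration in seconds


def reading_speed(clip):
    """Non-whitespace characters per second of audio (assumed speed measure)."""
    if clip.duration_s <= 0:
        raise ValueError(f"non-positive duration for {clip.path}")
    n_chars = sum(1 for ch in clip.text if not ch.isspace())
    return n_chars / clip.duration_s


def drop_speed_outliers(clips, tail_fraction=0.05):
    """Drop the slowest and the fastest `tail_fraction` of clips.

    With tail_fraction=0.05, roughly 10% of the clips are removed in total.
    The result is ordered by reading speed, not by input order.
    """
    ranked = sorted(clips, key=reading_speed)
    k = int(len(ranked) * tail_fraction)
    return ranked[k:len(ranked) - k] if k else ranked


# Ten made-up clips with the same text and durations of 1 to 10 seconds.
clips = [Clip(f"clip_{d}.mp3", "salom dunyo", float(d)) for d in range(1, 11)]
kept = drop_speed_outliers(clips, tail_fraction=0.1)
```

Trimming by rank keeps the removed share fixed regardless of the speed distribution; a threshold on absolute speed would be an equally plausible alternative.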