This collection contains more than 20,000 hours of Russian speech audio paired with text transcriptions, drawn from sources such as YouTube, audiobooks, and radio. It includes over 2 million utterances, each categorized by source and acoustic conditions.
Use Cases
- Train acoustic models for Russian speech recognition using the audio files and corresponding text labels
- Develop noise-robust speech systems by training on the variety of recording conditions and source types
- Evaluate speech-to-text performance across different domains like audiobooks or radio broadcasts
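For the per-domain evaluation use case, a common metric is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is one possible way to compute it per domain in plain Python. The record layout (domain, reference, hypothesis), the function names, and the sample sentences are assumptions made for illustration; they are not part of the dataset.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer_by_domain(records):
    """Return {domain: WER}, pooling errors and reference words per domain."""
    errors = defaultdict(int)
    ref_lengths = defaultdict(int)
    for domain, reference, hypothesis in records:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors[domain] += word_edit_distance(ref_words, hyp_words)
        ref_lengths[domain] += len(ref_words)
    # Skip domains with no reference words to avoid dividing by zero.
    return {d: errors[d] / ref_lengths[d] for d in errors if ref_lengths[d] > 0}


# Hypothetical (domain, reference, hypothesis) triples, invented for illustration.
sample_records = [
    ("audiobooks", "он вошёл в комнату", "он вошёл в комнату"),
    ("radio", "сегодня в москве дождь", "сегодня москве дождь"),
    ("radio", "добрый вечер", "добрый день"),
]

wer_scores = wer_by_domain(sample_records)
for domain, score in sorted(wer_scores.items()):
    print(f"{domain}: WER = {score:.2f}")
```

Pooling errors and reference lengths per domain, rather than averaging per-utterance WER, keeps short utterances from dominating the score. Real evaluations usually also normalize punctuation and numerals before comparison, which this sketch omits.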
Strengths
- 20,000+ hours of audio data provided in WAV or MP3 formats
- Includes metadata mapping audio segments to text transcriptions and source categories
- Covers diverse acoustic environments including studio recordings, phone calls, and noisy public spaces
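As a rough illustration of how metadata that maps audio segments to transcriptions and source categories might be used, the sketch below reads a small manifest, totals audio hours per source, and selects the audio/transcript pairs for one source. The CSV layout, the column names (`audio_path`, `transcript`, `source`, `duration_sec`), and the rows are hypothetical; the actual metadata format of the collection may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical manifest: column names and rows are invented for illustration
# and do not reflect the dataset's real metadata schema.
MANIFEST_CSV = """audio_path,transcript,source,duration_sec
audio/0001.wav,добрый вечер,radio,1800
audio/0002.wav,он вошёл в комнату,audiobooks,5400
audio/0003.mp3,сегодня в москве дождь,radio,1800
"""


def hours_by_source(manifest_file):
    """Sum audio duration, in hours, for each source category."""
    totals = defaultdict(float)
    for row in csv.DictReader(manifest_file):
        totals[row["source"]] += float(row["duration_sec"]) / 3600.0
    return dict(totals)


def pairs_for_source(manifest_file, source):
    """Return (audio_path, transcript) pairs belonging to one source category."""
    return [
        (row["audio_path"], row["transcript"])
        for row in csv.DictReader(manifest_file)
        if row["source"] == source
    ]


hours = hours_by_source(io.StringIO(MANIFEST_CSV))
radio_pairs = pairs_for_source(io.StringIO(MANIFEST_CSV), "radio")

for source, total in sorted(hours.items()):
    print(f"{source}: {total:.1f} h")
print(f"radio utterances: {len(radio_pairs)}")
```

With a real manifest, `io.StringIO(MANIFEST_CSV)` would be replaced by an opened file, for example `open(path, newline="", encoding="utf-8")`.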