This collection contains 1,000 hours of Arabic speech audio sampled at 16 kHz, sourced from over 700 YouTube channels. It spans multiple regions, genres, and dialects to support the development of speech recognition technologies.
Use Cases
- Train Automatic Speech Recognition (ASR) models using the 1,000 hours of multi-dialect audio
- Develop dialect identification systems using speech samples that span multiple regions and dialects
- Fine-tune acoustic models for 16 kHz audio processing in diverse acoustic environments
Strengths
- 1,000 hours of speech audio sampled at 16 kHz
- Data sourced from over 700 distinct YouTube channels
- Covers multi-regional, multi-genre, and multi-dialect Arabic speech
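For planning storage and compute, the figures above translate into a rough sample count and raw size. The sketch below works through that arithmetic. The bit depth (16-bit PCM) and channel count (mono) are assumptions, not facts from the description, so the resulting size is only an estimate for uncompressed audio.

```python
HOURS = 1_000            # from the description
SAMPLE_RATE_HZ = 16_000  # from the description
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono


def total_samples(hours, sample_rate_hz):
    """Number of samples per channel for the given duration."""
    return hours * 3600 * sample_rate_hz


def raw_size_bytes(hours, sample_rate_hz, bytes_per_sample, channels):
    """Uncompressed size in bytes, ignoring container headers."""
    return total_samples(hours, sample_rate_hz) * bytes_per_sample * channels


samples = total_samples(HOURS, SAMPLE_RATE_HZ)
size_gb = raw_size_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS) / 1e9

print(f"{samples:,} samples")          # 57,600,000,000 samples
print(f"{size_gb:.1f} GB uncompressed")  # 115.2 GB uncompressed
```

Under these assumptions the collection comes to about 57.6 billion samples and roughly 115 GB of raw audio; compressed formats would be considerably smaller.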