This dataset contains 1,000 hours of Arabic speech audio, sampled at 16 kHz and collected from more than 700 YouTube channels. It spans multiple regions, genres, and dialects to support the development of speech recognition technologies.
Use Cases
- Train automatic speech recognition (ASR) models using the 1,000 hours of multi-dialectal audio.
- Develop dialect identification systems by leveraging the multi-regional nature of the speech samples.
- Perform acoustic modeling for Arabic speech sampled at 16 kHz.
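Before training or acoustic modeling, it can help to confirm that each file really is at the stated 16 kHz rate. The sketch below is one possible check using only Python's standard `wave` module. It assumes the audio is stored as WAV, which the description does not state; the helper names and the synthetic in-memory clip are hypothetical stand-ins, not part of the dataset.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate stated in the dataset description


def wav_sample_rate(source) -> int:
    """Return the sampling rate (Hz) of a WAV file given a path or file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def matches_expected_rate(source) -> bool:
    """True if the WAV file's sampling rate equals the expected 16 kHz."""
    return wav_sample_rate(source) == EXPECTED_RATE_HZ


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a tiny in-memory WAV of silence (stand-in data, not dataset audio).

    Mono and 16-bit are assumptions made only for this synthetic clip.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: mono
        wav_file.setsampwidth(2)   # assumption: 16-bit samples
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        clip = make_silent_wav(rate)
        print(rate, "Hz matches expected rate:", matches_expected_rate(clip))
```

With real data, `matches_expected_rate` could be called on each file path in place of the synthetic clip; files in other containers (for example FLAC or MP3) would need a different reader.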
Strengths
- 1,000 hours of speech audio data
- Audio sampled at a consistent 16 kHz rate
- Sourced from over 700 distinct YouTube channels
- Includes multi-regional and multi-dialectal Arabic speech variations
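To give a sense of scale, the figures above can be turned into a rough sample count and uncompressed storage estimate. This is a back-of-the-envelope sketch: the 1,000 hours and 16 kHz come from the description, while 16-bit mono PCM is an assumption, so the byte total is only indicative.

```python
HOURS = 1_000            # from the dataset description
SAMPLE_RATE_HZ = 16_000  # from the dataset description
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono

SECONDS_PER_HOUR = 3_600


def total_samples(hours: int, rate_hz: int, channels: int = 1) -> int:
    """Number of audio samples across all channels for the given duration."""
    return hours * SECONDS_PER_HOUR * rate_hz * channels


def uncompressed_bytes(hours: int, rate_hz: int,
                       bytes_per_sample: int, channels: int = 1) -> int:
    """Raw PCM size in bytes, ignoring container headers and compression."""
    return total_samples(hours, rate_hz, channels) * bytes_per_sample


if __name__ == "__main__":
    samples = total_samples(HOURS, SAMPLE_RATE_HZ, CHANNELS)
    size = uncompressed_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
    print(f"samples: {samples:,}")               # samples: 57,600,000,000
    print(f"raw size: {size / 1e9:.1f} GB")      # raw size: 115.2 GB
```

Under these assumptions the corpus holds 57.6 billion samples, or roughly 115 GB of raw PCM; a compressed format or a different bit depth would change the storage figure but not the sample count.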