MASC: Massive Arabic Speech Corpus

Name: MASC: Massive Arabic Speech Corpus
Creator: abdusah
Published: 2022-03-02T23:29:22
Keywords: language: ar; language creators: crowdsourced; license: CC BY-NC 4.0; annotations creators: crowdsourced; region: us

Description

1,000 hours of Arabic speech audio sampled at 16 kHz, sourced from over 700 YouTube channels. The collection spans multiple regions, genres, and dialects to support the development of speech recognition technologies.

Use Cases

Train Automatic Speech Recognition (ASR) models using the 1,000 hours of multi-dialectal audio
Develop dialect identification systems by leveraging the multi-regional and multi-dialect nature of the speech samples
Fine-tune acoustic models for 16 kHz audio processing in diverse acoustic environments
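
Since every use case above depends on the audio being sampled at 16 kHz, a quick check before training can catch mismatched files. The following is a minimal sketch, assuming the clips are available locally as PCM WAV files, which this listing does not confirm. The function names and the constant are hypothetical and not part of any MASC tooling.

```python
import wave

# Sampling rate stated in the dataset description (16 kHz).
EXPECTED_RATE_HZ = 16_000


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / rate


def matches_expected_rate(path, expected_hz=EXPECTED_RATE_HZ):
    """Return True if the file's sample rate equals the expected rate."""
    rate, _ = wav_info(path)
    return rate == expected_hz
```

Files that fail the check would need resampling before being mixed with 16 kHz data. Summing the durations returned by `wav_info` over a local copy is also one way to compare the amount of audio actually obtained against the stated 1,000 hours.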

Strengths

1,000 hours of speech audio sampled at 16 kHz
Data sourced from over 700 distinct YouTube channels
Covers multi-regional, multi-genre, and multi-dialect Arabic speech

Quality Score

Overall: D (34)
Description: 45
Source: 36
Reputation: 16
Access: 22

Community

Downloads: 25
Views: 0

Dataset Info

Author: abdusah
Created: Mar 2, 2022
Updated: Nov 16, 2023
Last synced: Apr 29, 2026