Common Voice 18 Arabic: Speech Corpus for Automatic Speech Recognition

Name: Common Voice 18 Arabic: Speech Corpus for Automatic Speech Recognition
Creator: MohamedRashad
Published: 2025-12-25T14:56:41
Keywords: Machine Learning, Arabic, Audio, Natural Language Processing, Speech Recognition

Description

An unofficial Arabic-only extraction of Mozilla Common Voice Corpus 18.0, prepared for Automatic Speech Recognition research. The dataset was created by MohamedRashad and last updated on 2025-12-27. It is derived from the original Common Voice 18 release, filtered to include only Arabic speech data while preserving the original dataset structure, splits, and metadata fields.

Use Cases

Training Arabic speech recognition models on validated and unvalidated speech data.
Benchmarking ASR system performance on a dedicated Arabic speech corpus.
Developing Arabic-specific acoustic models that make use of the preserved metadata fields.
Conducting research on speech data validation and filtering techniques for Arabic.
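
For the benchmarking use case, ASR output is commonly scored with word error rate (WER). The following is a minimal sketch, assuming whitespace-tokenized transcripts and no Arabic-specific normalization (diacritics, letter variants). The function name `word_error_rate` is illustrative and is not part of the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

In practice, the same text normalization is usually applied to both reference and hypothesis before scoring, and errors are summed over the whole test set rather than averaged per utterance.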

Strengths

Preserves the original Common Voice 18 dataset structure and metadata fields.
Contains both validated and unvalidated Arabic speech data segments.
Last updated on 2025-12-27, indicating recent maintenance.

Limitations

Row count, file formats, and license information are not documented, which makes it hard to assess suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
Data may reflect geographic or demographic bias inherent to the original Common Voice collection.
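
Because column-level documentation is absent, a reasonable first pass after download is to list the columns and tally the values of any demographic fields. The following is a minimal sketch, assuming the metadata is available as tab-separated text. The column name `gender` and the sample rows are assumptions for illustration and should be checked against the actual files.

```python
import csv
import io
from collections import Counter


def summarize_column(tsv_text: str, column: str) -> Counter:
    """Count the values of one column, labeling blanks as '(missing)'."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    available = reader.fieldnames or []
    if column not in available:
        raise KeyError(f"column {column!r} not found; available: {available}")
    counts = Counter()
    for row in reader:
        value = (row[column] or "").strip()
        counts[value or "(missing)"] += 1
    return counts


# Hypothetical sample rows; the real column names and values may differ.
SAMPLE_TSV = (
    "path\tsentence\tgender\n"
    "clip_1.mp3\tمرحبا\tmale\n"
    "clip_2.mp3\tشكرا\t\n"
    "clip_3.mp3\tنعم\tmale\n"
)

gender_counts = summarize_column(SAMPLE_TSV, "gender")
```

A large share of `(missing)` values or a heavily skewed tally is a signal that results may not generalize across speaker groups.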

Provenance

Source: Mozilla Common Voice Corpus 18.0
Collection Method: Filtered extraction to include only Arabic (ar) speech data.
Time Range: not specified
Freshness: 2025-12-27
Geography: not specified

License is unknown; restrictions should be verified before use.
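
The collection method listed under Provenance amounts to keeping only the rows whose locale is `ar`. The following is a minimal sketch of that kind of filter, assuming tab-separated metadata with a `locale` column. It is not the creator's actual extraction script, and the column name and sample rows are hypothetical.

```python
import csv
import io


def filter_by_locale(tsv_text: str, locale: str = "ar") -> list[dict]:
    """Return only the metadata rows whose 'locale' field equals `locale`."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [row for row in reader if row.get("locale") == locale]


# Hypothetical multilingual sample; real files may use different columns.
SAMPLE_ROWS = (
    "path\tsentence\tlocale\n"
    "clip_1.mp3\tمرحبا\tar\n"
    "clip_2.mp3\thello\ten\n"
    "clip_3.mp3\tشكرا\tar\n"
)

arabic_rows = filter_by_locale(SAMPLE_ROWS)
```

Filtering at the row level like this leaves every other field untouched, which is consistent with the card's statement that the original structure and metadata fields are preserved.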

Quality Score

Overall: D (38)
Description: 42
Source: 36
Reputation: 44
Access: 26

Community

Downloads: 338
Likes: 1
Views: 0

Dataset Info

Author: MohamedRashad
Created: Dec 25, 2025
Updated: Dec 27, 2025
Last synced: Apr 20, 2026
