Description

Common Voice Corpus 11.0 is a multilingual speech dataset of MP3 audio clips paired with their text transcriptions. It contains 24,210 recorded hours, 16,413 of which are validated, across 100 languages. Many recordings also carry demographic metadata such as age, sex, and accent.
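The headline figures imply a few derived quantities worth knowing before planning a training run. The sketch below only does arithmetic on the numbers stated in this description. The per-language mean assumes an even split across languages, which is unlikely, since the actual per-language distribution is not published here.

```python
# Figures taken from the description; everything else is derived from them.
RECORDED_HOURS = 24_210
VALIDATED_HOURS = 16_413
LANGUAGES = 100

validated_share = VALIDATED_HOURS / RECORDED_HOURS    # fraction of audio that is validated
unvalidated_hours = RECORDED_HOURS - VALIDATED_HOURS  # recorded but not validated
# Naive mean under an even split across languages; real coverage is likely skewed.
mean_validated_per_language = VALIDATED_HOURS / LANGUAGES

print(f"validated share: {validated_share:.1%}")
print(f"unvalidated hours: {unvalidated_hours:,}")
print(f"mean validated hours per language: {mean_validated_per_language:.1f}")
```

This prints a validated share of 67.8%, 7,797 unvalidated hours, and a mean of 164.1 validated hours per language.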

Use Cases

Training automatic speech recognition (ASR) models on the large volume of validated audio-text pairs.
Improving ASR accuracy for specific demographic groups using the included age, sex, and accent metadata.
Benchmarking multilingual speech recognition performance across the 100 supported languages.
Studying acoustic variation and model bias related to speaker demographics (age, sex, accent).
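For the benchmarking and bias use cases, a common approach is to compare word error rate (WER) across languages or demographic groups. The sketch below computes corpus-level WER per group from (group, reference, hypothesis) triples using word-level Levenshtein distance. The `Utterance` type and its grouping key are hypothetical: in practice the key might be a language code or an age bucket taken from the clip metadata. Real benchmarks usually also apply text normalization beyond the simple lowercasing and whitespace splitting shown here.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Utterance(NamedTuple):
    group: str       # hypothetical grouping key, e.g. language code or age bucket
    reference: str   # ground-truth transcription from the dataset
    hypothesis: str  # transcription produced by the ASR model under test


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match when equal
            )
        prev = curr
    return prev[-1]


def wer_by_group(utterances: Iterable[Utterance]) -> dict[str, float]:
    """Corpus-level WER per group: total word errors divided by total reference words."""
    errors: dict[str, int] = defaultdict(int)
    ref_words: dict[str, int] = defaultdict(int)
    for u in utterances:
        ref = u.reference.lower().split()
        hyp = u.hypothesis.lower().split()
        errors[u.group] += word_edit_distance(ref, hyp)
        ref_words[u.group] += len(ref)
    return {g: errors[g] / ref_words[g] for g in errors if ref_words[g] > 0}


sample = [
    Utterance("en", "the cat sat", "the cat sat"),
    Utterance("en", "hello world", "hello word"),
    Utterance("de", "guten morgen", "guten"),
]
print(wer_by_group(sample))  # {'en': 0.2, 'de': 0.5}
```

Summing errors and reference words per group before dividing, rather than averaging per-utterance WER, keeps very short utterances from dominating a group's score.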

Strengths

Large scale: 24,210 recorded hours of speech.
16,413 validated hours, indicating a quality-control step (in Common Voice, volunteers listen to submitted clips and vote on them).
Broad coverage of 100 languages.
Demographic metadata such as age, sex, and accent for many, though not all, recordings.
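Because demographic fields are present for many but not all recordings, it is worth measuring their coverage before relying on them for subgroup analysis. The sketch below assumes the clip index is a tab-separated file with one row per clip and one column per demographic field. The column names in the example (`age`, `gender`, `accent`) are assumptions, since this listing does not document its columns; replace them with whatever headers the downloaded files actually use.

```python
from __future__ import annotations

import csv
import io
from typing import Sequence, TextIO


def metadata_coverage(tsv: TextIO, fields: Sequence[str]) -> dict[str, float]:
    """Fraction of rows with a non-empty value in each of the given columns."""
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = set(fields) - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    filled = dict.fromkeys(fields, 0)
    total = 0
    for row in reader:
        total += 1
        for field in fields:
            # Short rows yield None for missing trailing fields; treat that as empty.
            if (row[field] or "").strip():
                filled[field] += 1
    return {field: filled[field] / total if total else 0.0 for field in fields}


# Hypothetical three-row index; column names are illustrative only.
sample = io.StringIO(
    "client_id\tage\tgender\taccent\n"
    "a1\ttwenties\tmale\t\n"
    "b2\t\t\t\n"
    "c3\tthirties\tfemale\tus\n"
)
print(metadata_coverage(sample, ["age", "gender", "accent"]))
```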

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The total number of unique speakers and the distribution of recordings per language are unknown.
Data may reflect geographic or linguistic bias inherent to the contributor base of a crowdsourced platform.
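Since speaker counts and the per-language distribution are not stated, they have to be computed after download. The sketch below assumes a tab-separated clip index with an anonymized speaker-ID column and a language column. The default names `client_id` and `locale` are assumed from upstream Common Voice releases and are unverified for this copy, so both can be overridden. Comparing clip counts with distinct-speaker counts also gives a rough sense of how concentrated contributions are within each language.

```python
from __future__ import annotations

import csv
import io
from collections import Counter, defaultdict
from typing import TextIO


def summarize_clips(
    tsv: TextIO, speaker_col: str = "client_id", lang_col: str = "locale"
) -> dict[str, dict[str, int]]:
    """Count clips and distinct speakers per language from a TSV clip index."""
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = {speaker_col, lang_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    clips: Counter[str] = Counter()
    speakers: dict[str, set[str]] = defaultdict(set)
    for row in reader:
        lang = row[lang_col]
        clips[lang] += 1
        speakers[lang].add(row[speaker_col])
    return {lang: {"clips": clips[lang], "speakers": len(speakers[lang])} for lang in clips}


# Hypothetical index: one speaker with two English clips, one with a German clip.
sample = io.StringIO(
    "client_id\tpath\tsentence\tlocale\n"
    "a1\tc1.mp3\tHello\ten\n"
    "a1\tc2.mp3\tWorld\ten\n"
    "b2\tc3.mp3\tHallo\tde\n"
)
print(summarize_clips(sample))
# {'en': {'clips': 2, 'speakers': 1}, 'de': {'clips': 1, 'speakers': 1}}
```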

Provenance

Source: Mozilla Common Voice project; this copy is hosted by the Hugging Face user 'echodict'.
Collection Method: Crowdsourced recordings from volunteer contributors.
Freshness: Last updated 2026-04-16 07:27:52; freshness should be verified.
Geography: Global (inferred from the 100 languages covered).

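License: not stated for this copy and must be verified before commercial use or redistribution. Upstream Common Voice releases are published under CC0 1.0, but that should be confirmed for this mirror.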

Tags: Tabular, Audio, Multilingual, Natural Language Processing, Demographics, Audio Corpus, Speech Recognition

Common Voice 11.0: Multilingual Speech Corpus with Demographic Metadata
