Name: Common Voice 20.0: Mongolian Speech Data with Transcriptions and Demographics
Creator: warmestman
Published: 2025-03-05T11:35:09
Keywords: Audio Data, Tabular, Audio, Demographics, Speech Recognition, Mongolian Language

Description

The Common Voice 20.0 Mongolian dataset is the Mongolian-language subset of Mozilla's Common Voice project. It includes audio clips in .mp3 format, transcriptions, train/test/dev splits, and metadata such as speaker demographics. It was uploaded to Hugging Face by user 'warmestman' on March 5, 2025.

Use Cases

Train speech recognition models based on the provided audio clips and transcriptions.
Conduct voice analysis based on the audio data and speaker demographic metadata.
Perform linguistic research on the Mongolian language based on the transcribed speech corpus.
Benchmark speech processing systems using the predefined train/test/dev splits.

Strengths

Includes structured train/test/dev splits, which supports machine learning workflows.
Contains additional metadata such as speaker demographics, which can enable bias analysis.
Part of the established Mozilla Common Voice project, suggesting a standardized collection method.
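As a rough illustration of the bias analysis mentioned above, the sketch below tallies the values of one demographic column in a tab-separated metadata file. It assumes the metadata ships as TSV with a header row and a column named `gender`, as Common Voice releases commonly do; neither the file layout nor the column names are confirmed for this upload, and the sample rows are made up for the demonstration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names are assumptions, not confirmed
# for this upload.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\n"
    "a1\tclip_1.mp3\tsentence one\ttwenties\t\n"
    "b2\tclip_2.mp3\tsentence two\t\tfemale\n"
    "c3\tclip_3.mp3\tsentence three\tthirties\t\n"
)


def count_field(handle, field, missing_label="(unspecified)"):
    """Count the values of one column, grouping blanks under missing_label."""
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # A missing column or an empty cell both count as unspecified.
        value = (row.get(field) or "").strip()
        counts[value or missing_label] += 1
    return counts


if __name__ == "__main__":
    print(count_field(io.StringIO(SAMPLE_TSV), "gender"))
```

For a real file, the same function can be called with `open(path, encoding="utf-8", newline="")`, where `path` points at whichever split file the download actually contains. A large share of unspecified values would itself be worth noting before drawing conclusions about speaker balance.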

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and total size are unknown, which may limit suitability assessment.
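Because column documentation and row counts are missing, a first step after download is to read them off the files directly. The following is a minimal sketch, assuming each split is described by a tab-separated file with a header row; the column names in the sample are illustrative only and may not match this upload.

```python
import csv
import io

# Illustrative sample; real column names must be read from the download.
SAMPLE_SPLIT = (
    "client_id\tpath\tsentence\n"
    "a1\tclip_1.mp3\tsentence one\n"
    "b2\tclip_2.mp3\tsentence two\n"
)


def summarize_tsv(handle):
    """Return the header columns and the number of data rows."""
    # QUOTE_NONE treats quotation marks in transcriptions as plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = sum(1 for _ in reader)
    # fieldnames is None for an empty file, so fall back to an empty list.
    return {"columns": list(reader.fieldnames or []), "rows": rows}


if __name__ == "__main__":
    print(summarize_tsv(io.StringIO(SAMPLE_SPLIT)))
```

Running this over each split file gives the column list and per-split row counts that the listing does not provide. Total audio size still has to be measured from the clip files themselves.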

Provenance

Source: Mozilla Common Voice project
Collection Method: Crowdsourced collection, likely via the Common Voice platform.
Time Range: Part of the Common Voice 20.0 release.
Freshness: Last updated 2025-03-05 16:27:27; freshness should be verified.
Geography: Mongolian language focus.

License is unknown; users must verify licensing terms before use.
