DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Uzbek Language Speech Recordings from Mozilla Common Voice | DataSalon


Speech & Audio

Name: Uzbek Language Speech Recordings from Mozilla Common Voice
Creator: yakhyo
Published: 2025-04-09T09:26:59
Keywords: Machine Learning, Audio Data, Uzbek Language, Audio, Speech Recognition

Available on 1 platform

Description

A refined subset of the Mozilla Common Voice corpus containing only Uzbek-language voice recordings. The dataset has been cleaned and normalized, and a text field has been added, to make it easier to use for training automatic speech recognition models. It was created by user 'yakhyo' and last updated on April 15, 2025.

Use Cases

Training Uzbek-language ASR models on the cleaned and normalized audio-text pairs.
Fine-tuning multilingual speech models for Uzbek using the filtered, language-specific samples.
Benchmarking Uzbek speech recognition performance on the preprocessed dataset.
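
For the benchmarking use case, word error rate (WER) is the usual ASR metric: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a minimal standard-library implementation for illustration only; the function name is an assumption and nothing here comes from the dataset's own tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Both transcripts should go through the same text normalization before scoring; otherwise casing and punctuation differences count as errors.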

Strengths

Focuses exclusively on the Uzbek language, providing a targeted resource.
Includes preprocessing steps such as text normalization, which may reduce data cleaning effort.
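
The listing does not say which normalization steps were applied. As a rough illustration of what transcript normalization for Uzbek Latin text can involve, the sketch below lowercases, unifies apostrophe variants to U+02BB (the modifier letter used in oʻ and gʻ), strips punctuation, and collapses whitespace. Every rule here is an assumption rather than the author's pipeline, and folding all apostrophe-like characters into one code point is a deliberate simplification.

```python
import re
import unicodedata

# Apostrophe-like characters commonly typed in Uzbek Latin text (assumed list).
_APOSTROPHES = "'`\u2018\u2019\u02bc\u02bb"
_APOSTROPHE_RE = re.compile(f"[{re.escape(_APOSTROPHES)}]")
# Anything that is not a word character, whitespace, or U+02BB.
_PUNCT_RE = re.compile(r"[^\w\s\u02bb]")
_SPACE_RE = re.compile(r"\s+")


def normalize_transcript(text: str) -> str:
    """Hypothetical normalizer: lowercase, unify apostrophes, drop punctuation."""
    text = unicodedata.normalize("NFC", text).lower()
    text = _APOSTROPHE_RE.sub("\u02bb", text)
    text = _PUNCT_RE.sub(" ", text)
    return _SPACE_RE.sub(" ", text).strip()
```

Comparing a few raw Common Voice sentences against the dataset's added text field would show which of these rules, if any, match the actual preprocessing.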

Limitations

Row count, file formats, and license information are unknown, limiting suitability assessment.
Column-level documentation is absent, so field semantics must be inferred after download.
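
Because the fields are undocumented, a quick profile of each column after download is one way to start inferring what they mean. The sketch below counts non-empty values and collects a few distinct examples per column. The sample column names and values are hypothetical stand-ins, since the real schema is unknown.

```python
import csv
import io
from collections import defaultdict


def profile_columns(rows, max_examples=3):
    """Return {column: {"non_empty": int, "examples": [...]}} for dict rows."""
    profile = defaultdict(lambda: {"non_empty": 0, "examples": []})
    for row in rows:
        for name, value in row.items():
            entry = profile[name]  # registers the column even if the value is empty
            if value in (None, ""):
                continue
            entry["non_empty"] += 1
            examples = entry["examples"]
            if len(examples) < max_examples and value not in examples:
                examples.append(value)
    return dict(profile)


# Hypothetical metadata; replace with the file actually shipped in the dataset.
sample = io.StringIO(
    "path\tsentence\tage\n"
    "clip_1.mp3\tsalom\t\n"
    "clip_2.mp3\tdunyo\ttwenties\n"
)
summary = profile_columns(csv.DictReader(sample, delimiter="\t"))
```

Columns with few non-empty values or only a handful of distinct examples are often optional speaker metadata, while the audio path and transcript columns should be populated in every row.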

Provenance

Source: Mozilla Common Voice project, specifically version 17.0.
Collection Method: Filtered and preprocessed from the larger multilingual corpus.
Time Range: Not specified
Freshness: Last updated 2025-04-15 13:16:47.
Geography: Not specified


Quality Score

D38

Description

Source

Reputation

Access

Community

47 downloads

3 likes

0 views

Dataset Info

Author: yakhyo
Created: Apr 9, 2025
Updated: Apr 15, 2025
Last synced: Apr 21, 2026

