DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

Common Voice ASR Clean: Filtered Speech Recognition Samples | DataSalon

Home Speech & AudioCommon Voice ASR Clean: Filtered Speech Recognition Samples

Speech & Audio

Common Voice ASR Clean: Filtered Speech Recognition Samples

Name: Common Voice ASR Clean: Filtered Speech Recognition Samples
Creator: OpenSpeechHub
Published: 2026-03-31T17:16:38
Keywords: Audio, Audio Processing, Speech Recognition, Automatic Speech Recognition, Filtered Dataset

by OpenSpeechHub·Updated 3mo ago

Available on 1 platform

Description

A filtered version of the Common Voice dataset for automatic speech recognition (ASR). Samples with fewer than three words, repetitive tokens, or chat token leaks have been removed. The dataset was created by OpenSpeechHub and was last updated on March 31, 2026.

Use Cases

Train automatic speech recognition models based on filtered audio-text pairs.
Benchmark ASR model performance on a dataset with removed short and repetitive samples.
Fine-tune language models for speech tasks using cleaned transcriptions.

Strengths

Applies specific filtering criteria: removes samples with fewer than three words, repetitive tokens, or chat token leaks.
Last updated on March 31, 2026, suggesting recent maintenance.

Limitations

Row count, file formats, and column-level documentation are unknown, limiting suitability assessment.
License information is unavailable, which is critical for determining permissible use.

Provenance

Source: OpenSpeechHub, likely derived from the Mozilla Common Voice project.
Collection Method: Filtered from a source ASR dataset by removing samples meeting specific criteria.
Time Range: null
Freshness: Last updated 2026-03-31 18:13:56
Geography: null

License is unknown; users must verify permissible use before downloading.

Audio Audio Processing Speech Recognition Automatic Speech Recognition Filtered Dataset

Related Datasets

Quality Score

D31

Description

Source

Reputation

Quality Score

D31

Description

Source

Reputation

Access

Community

291 downloads

1 likes

0 views

Dataset Info

Author: OpenSpeechHub
Created: Mar 31, 2026
Updated: Mar 31, 2026
Last synced: Jun 9, 2026

Access

Community

291 downloads

1 likes

0 views

Dataset Info

Author: OpenSpeechHub
Created: Mar 31, 2026
Updated: Mar 31, 2026
Last synced: Jun 9, 2026

Common Voice ASR Clean: Filtered Speech Recognition Samples

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info