DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.



Speech & Audio

Common Voice 25.0: Large-Scale Pashto Speech Data for ASR

Available on 1 platform

Description

A large-scale, CC0-licensed Pashto speech dataset for automatic speech recognition (ASR). It is part of version 25.0 of the Common Voice project and is hosted on Kaggle. The available metadata does not describe its collection method, size, or contributors.

Use Cases

Training Pashto speech recognition models on the large-scale speech data
Fine-tuning multilingual ASR systems with Pashto audio
Benchmarking ASR models on a low-resource language
Researching speech patterns and acoustic features of spoken Pashto
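For the benchmarking use case, a common metric is word error rate (WER): the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration, not a tool shipped with this dataset. It assumes transcripts are already normalized and simply splits on whitespace; a real Pashto evaluation would typically apply Unicode and punctuation normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes both strings are pre-normalized; words are whitespace-separated.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    # (one row of the classic Levenshtein table, computed over words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, word_error_rate("a b c d", "a x c") returns 0.5: one substitution (b to x) and one deletion (d) over four reference words.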

Strengths

Released under CC0, a public domain dedication that permits reuse and redistribution without restriction
Explicitly designed for Automatic Speech Recognition (ASR) tasks
Focuses on Pashto, a language that may be underrepresented in other speech corpora

Limitations

Row count and total dataset size are unknown, making it hard to judge suitability before download
Column-level documentation is absent; field semantics must be inferred after download
Last update date is unknown; freshness unverified
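Because the row count and column semantics are undocumented, they have to be checked after download. The sketch below assumes the Kaggle copy ships tab-separated metadata files in the style of standard Common Voice releases; this listing does not confirm the layout, and the function name summarize_tsv is hypothetical. It counts rows, lists the columns, and collects a few example values per column so field meanings can be inferred.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_tsv(lines: Iterable[str], max_examples: int = 3) -> dict:
    """Summarize tab-separated metadata: row count, column names,
    non-empty value counts per column, and a few example values."""
    # QUOTE_NONE: treat quote characters inside transcripts as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    columns = list(reader.fieldnames or [])
    rows = 0
    non_empty = Counter()
    examples = {col: [] for col in columns}
    for row in reader:
        rows += 1
        for col in columns:
            value = row.get(col)  # None if this row is shorter than the header
            if value:
                non_empty[col] += 1
                if len(examples[col]) < max_examples:
                    examples[col].append(value)
    return {
        "rows": rows,
        "columns": columns,
        "non_empty": dict(non_empty),
        "examples": examples,
    }
```

To use it on a downloaded file, open the file with encoding="utf-8" and newline="" and pass the handle to summarize_tsv; the file name (for example validated.tsv, as in standard Common Voice releases) is an assumption to adjust to the actual Kaggle download.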

Provenance

Source: Common Voice project, hosted on Kaggle.
Collection Method: Not detailed; likely crowd-sourced speech recordings, as is typical for the Common Voice project.
Time Range: Not specified
Freshness: Last update date unknown.
Geography: Not specified


Tags: Text, Audio, Machine Learning, Audio Data, CC0 License, Large Scale, Speech Recognition, Pashto Language, Automatic Speech Recognition


Quality Score: D19

Criteria: Description, Source, Reputation



Community: 0 views

Dataset Info: Last synced Apr 25, 2026

