Description
A domain-specific Pashto automatic speech recognition dataset covering agriculture, general topics, food services, health, and services. The dataset is structured by domain with audio files and corresponding transcript CSV files, created by Sabtain-Dev and last updated on June 5, 2026.
Use Cases
Train domain-specific Pashto speech recognition models on the agriculture, health, and services audio domains.
Benchmark ASR model performance on Pashto across the five topic domains (agriculture, general, food services, health, and services).
Create synthetic Pashto speech data for underrepresented domains like food services.
Study acoustic characteristics of Pashto speech in professional and general contexts.
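For the benchmarking use case, a minimal sketch of word error rate (WER), a metric commonly used for ASR evaluation, is shown here. The dataset description does not name a metric, so the choice of WER and the function name word_error_rate are assumptions, and whitespace tokenization is only a rough approximation for Pashto text.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. Real Pashto text may
    need normalization (punctuation, character variants) before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Scoring each domain separately, rather than pooling all utterances, would show whether a model is weaker on one topic than another.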
Strengths
Covers five distinct spoken domains: agriculture, general, food services, health, and services.
Provides a clear mapping structure where audio files correspond to rows in a transcript CSV.
Reportedly includes multiple audio formats, although the description does not list them.
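The audio-to-transcript mapping noted above can be sketched as follows. The CSV file name (transcripts.csv) and the column names (file_name, transcript) are hypothetical placeholders, since the dataset's actual schema is not documented; adjust them after inspecting the downloaded files.

```python
import csv
from pathlib import Path


def pair_audio_with_transcripts(domain_dir, csv_name="transcripts.csv",
                                file_col="file_name", text_col="transcript"):
    """Pair each transcript row with its audio file inside one domain folder.

    All default names are assumptions, not the dataset's documented layout.
    Returns (pairs, missing): pairs is a list of (audio_path, text) tuples,
    missing lists file names from the CSV with no audio file on disk.
    """
    domain_dir = Path(domain_dir)
    pairs, missing = [], []
    with open(domain_dir / csv_name, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            audio_path = domain_dir / row[file_col]
            if audio_path.is_file():
                pairs.append((audio_path, row[text_col]))
            else:
                missing.append(row[file_col])
    return pairs, missing
```

Checking the missing list before training is a cheap way to confirm that the CSV and the audio folder actually agree.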
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Row count and license information are unknown, which may limit suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
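Because row counts and column names are undocumented, a first inspection pass after download might look like the sketch below. It assumes only that the transcripts are UTF-8 CSV files with a header row somewhere under a local root folder; the function name summarize_transcript_csvs is illustrative.

```python
import csv
from pathlib import Path


def summarize_transcript_csvs(root):
    """Map each CSV under root to (header columns, number of data rows).

    Assumption: every CSV is UTF-8 encoded and its first row is a header.
    """
    root = Path(root)
    summary = {}
    for csv_path in sorted(root.rglob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])          # empty list for an empty file
            row_count = sum(1 for _ in reader)  # remaining rows are data
        summary[csv_path.relative_to(root).as_posix()] = (header, row_count)
    return summary
```

Printing this summary per domain folder gives the row counts and field names that the dataset card leaves unstated.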
Provenance
Source
Hugging Face
Collection Method
Likely collected and curated for ASR model training.
Time Range
Not specified
Freshness
Last updated 2026-06-05 15:08:42; freshness should be verified.
Geography
Not specified
License is unknown; users must verify permissions before use.