Description
A domain-specific, multilingual agricultural speech dataset created by DigiGreen, with a primary focus on Hindi, Telugu, and Odia. It features human-annotated transcriptions and is intended for benchmarking ASR model performance in real-world agricultural scenarios. The dataset page was last updated on 2026-04-15.
Use Cases
Benchmarking ASR model performance on real-world agricultural audio
Training domain-specific speech-to-text models on agricultural advisory content
Evaluating multilingual ASR systems on Hindi, Telugu, and Odia speech
Improving speech recognition accuracy in noisy, real-world field conditions
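ASR benchmarks of this kind are usually scored with word error rate (WER): the word-level edit distance between the human-annotated reference and the model output, divided by the number of reference words. The listing does not say which metric or tooling DigiGreen used, so the following is only a minimal sketch under that assumption. The function names and the sample transcripts are hypothetical, and whitespace tokenization is a simplification for Hindi, Telugu, and Odia text.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    errors = 0
    ref_length = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        ref_length += len(ref_words)
    if ref_length == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_length


# Made-up transcripts for illustration only; not taken from the dataset.
sample_pairs = [
    ("spray the crop in the morning", "spray the crop in morning"),
    ("use two bags of urea", "use two bags of urea"),
]
sample_wer = word_error_rate(sample_pairs)
```

Pooling errors over all reference words, rather than averaging per-utterance WER, keeps short clips from dominating the score. Text normalization (punctuation, numerals, script variants) would need to be agreed on before comparing models.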
Strengths
Focuses on three specific Indian languages: Hindi, Telugu, and Odia
Contains human-annotated transcriptions for accuracy
Designed for real-world agricultural advisory scenarios
Benchmarks 10 ASR models using 10,934 audio samples
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Row count and total dataset size are not stated, which may limit suitability assessment; the 10,934 audio samples cited for the benchmark may or may not be the full dataset
Freshness should be verified as the last update date is in the future (2026-04-15)
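Because column-level documentation is absent, a first step after download is to profile each field before relying on it. The sketch below assumes the metadata can be read as CSV rows; the actual file format is not stated on the listing, and the column names (`audio_path`, `transcript`, `language`) and values in the sample are invented for illustration.

```python
import csv
import io


def summarize_columns(rows, max_examples=3):
    """Count non-empty values and collect a few distinct examples per column."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(name, {"non_empty": 0, "examples": []})
            if value is None or str(value).strip() == "":
                continue  # empty cells are skipped, but the column is still listed
            info["non_empty"] += 1
            if len(info["examples"]) < max_examples and value not in info["examples"]:
                info["examples"].append(value)
    return summary


# Hypothetical metadata file; the real schema must be checked after download.
sample_csv = (
    "audio_path,transcript,language\n"
    "clip_001.wav,sample text,hi\n"
    "clip_002.wav,,te\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
report = summarize_columns(rows)
row_count = len(rows)
```

Comparing `row_count` and the per-column non-empty counts against the 10,934 samples cited for the benchmark is one way to check whether a download is complete.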
Provenance
Source
DigiGreen
Collection Method
Likely collected from real-world agricultural advisory interactions.
Freshness
Last updated 2026-04-15 10:00:22
Geography
Primarily India, based on the focus on Hindi, Telugu, and Odia languages.
License
Unknown; check the dataset page for usage restrictions.