Name: Hindi ASR Benchmark: Speech Recognition Performance Across Six Test Subsets
Creator: SkunkWorkLabs
Published: 2026-05-04T16:46:07
Keywords: Benchmark, Tabular, Hindi, Audio, ASR Evaluation, Speech Recognition

Description

A benchmark dataset from SkunkWorkLabs, last updated in May 2026, for evaluating Hindi automatic speech recognition (ASR) systems. It compares the SkunkWorks model against the commercial providers ElevenLabs, Deepgram, and Sarvam. The evaluation covers six distinct test subsets drawn from sources including AI4Bharat Kathbath, Mozilla Common Voice, and Google FLEURS.

Use Cases

Benchmarking Hindi ASR model performance against multiple commercial providers.
Evaluating model robustness in noisy conditions using the 'kathbath_noisy' subset.
Assessing model generalization across diverse data sources using the six evaluation subsets.
Conducting comparative analysis of open-source versus commercial ASR systems for Hindi.
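
The use cases above all come down to scoring transcripts per subset. The card does not state which metric the benchmark reports, so the sketch below assumes word error rate (WER), a common ASR metric, and assumes each row exposes a subset name, a reference transcript, and a model hypothesis. The keys 'subset', 'reference', and 'hypothesis' are hypothetical and should be replaced with the actual field names after download.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_subset(rows):
    """Aggregate WER per subset.

    rows: iterable of dicts with the hypothetical keys
    'subset', 'reference', and 'hypothesis' (assumed, not documented).
    """
    errors = defaultdict(int)
    ref_word_counts = defaultdict(int)
    for row in rows:
        ref = row["reference"].split()
        hyp = row["hypothesis"].split()
        errors[row["subset"]] += word_edit_distance(ref, hyp)
        ref_word_counts[row["subset"]] += len(ref)
    # Skip subsets whose references contain no words to avoid dividing by zero.
    return {name: errors[name] / ref_word_counts[name]
            for name in ref_word_counts if ref_word_counts[name] > 0}
```

Pooling errors and reference words per subset before dividing weights long utterances more heavily than averaging per-utterance WER would; either convention is defensible, but the choice should be kept the same across providers. Whitespace tokenization is also an assumption, and any text normalization applied by the benchmark itself would need to be matched.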

Strengths

Evaluates performance across six distinct and named test subsets, including Kathbath, Common Voice, and MUCS.
Provides a direct comparison between a specific model (SkunkWorks) and three major commercial ASR providers.
Includes a subset specifically designed for noisy microphone conditions ('kathbath_noisy').

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown for most subsets, which may limit suitability assessment for large-scale training.
Description metadata is limited; actual data quality and audio file formats require manual inspection.
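
Because field semantics have to be inferred after download, a quick first pass is to list each column and the kinds of values it holds. The sketch below assumes the tabular data is available as CSV text, which the card does not confirm; the function names and the sample size are illustrative only.

```python
import csv
import io
from collections import Counter


def guess_kind(value):
    """Classify a raw CSV cell as 'empty', 'number', or 'text'."""
    if value is None or value.strip() == "":
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def summarize_columns(csv_text, sample_size=100):
    """Count value kinds per column over the first sample_size rows.

    Assumes CSV input with a header row (an assumption, not documented).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {name: Counter() for name in (reader.fieldnames or [])}
    for index, row in enumerate(reader):
        if index >= sample_size:
            break
        for name, value in row.items():
            # Cells beyond the header width arrive under a None key; skip them.
            if name in summary:
                summary[name][guess_kind(value)] += 1
    return summary
```

A column that is mostly 'number' is a candidate for a score or duration field, while a mostly 'text' column is more likely a transcript, provider name, or file path; these guesses still need checking against the data by hand.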

Provenance

Source: SkunkWorkLabs, aggregated from multiple sources including AI4Bharat, Mozilla, and Google.
Collection Method: Likely compiled from existing public speech datasets for benchmark creation.
Time Range: Not specified.
Freshness: Last updated 2026-05-04 16:46:18; freshness should be verified.
Geography: Primarily Hindi-language data, likely focused on Indian contexts.

License information is unknown; terms of use for the aggregated sources must be verified.
