Name: Hindi STT Benchmarking Dataset with 10,000 Utterances from Six Sources
Creator: RinggAI
Published: 2026-04-30T10:55:25
Keywords: Benchmarking, Tabular, Hindi, Audio, Speech Recognition

Description

This dataset packages 10,000 Hindi utterances, drawn from six Vistaar-derived parts, as a benchmark for speech-to-text systems. It contains about 15.5 hours of 16 kHz mono WAV audio; each utterance comes with a reference transcript and the outputs of four ASR services. RinggAI published it and last updated it in April 2026.
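
The stated audio format (16 kHz, mono, WAV) can be checked per file before benchmarking. The following is a minimal sketch using only Python's standard `wave` module; the function names `check_wav` and `make_silence_wav` are hypothetical helpers, and the synthetic 16-bit clip stands in for a real file from the dataset, whose sample width the description does not state.

```python
import io
import wave


def check_wav(file_or_path, expected_rate=16000, expected_channels=1):
    """Read a WAV header and compare it with the documented format."""
    with wave.open(file_or_path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "conforms": rate == expected_rate and channels == expected_channels,
    }


def make_silence_wav(seconds=1, rate=16000, channels=1):
    """Build an in-memory WAV of silence (assumed 16-bit samples) for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit PCM is an assumption, not stated by the source
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds * channels)
    buffer.seek(0)
    return buffer


report = check_wav(make_silence_wav())
stereo_report = check_wav(make_silence_wav(channels=2))
```

Summing `duration_s` over all files gives a quick cross-check of the roughly 15.5 hours quoted above.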

Use Cases

Benchmarking ASR model performance on each of the six Hindi audio sources
Comparing the accuracy of commercial STT services (Ringg, ElevenLabs, Deepgram, Sarvam) using the transcripts provided for each
Analyzing the impact of audio quality on transcription using the Kathbath noisy part
Training or fine-tuning Hindi speech recognition models on the reference transcripts
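
The first three use cases come down to computing word error rate (WER) per service and per source. The following is a minimal sketch in plain Python, assuming whitespace tokenization; the column names (`source`, `reference`, `ringg`, `deepgram`) and the two sample rows are hypothetical, since the dataset's actual field names are undocumented and must be checked after download.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(rows, ref_key, hyp_key):
    """Total word errors divided by total reference words."""
    errors = words = 0
    for row in rows:
        ref = row[ref_key].split()
        hyp = row[hyp_key].split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    return errors / words if words else 0.0


def wer_by_source(rows, ref_key, hyp_key, source_key="source"):
    """Corpus WER computed separately for each source part."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[source_key]].append(row)
    return {src: corpus_wer(grp, ref_key, hyp_key) for src, grp in groups.items()}


# Hypothetical rows; real field names and values must be taken from the data.
rows = [
    {"source": "fleurs", "reference": "मैं घर जा रहा हूँ",
     "ringg": "मैं घर जा रहा हूँ", "deepgram": "मैं घर जा रहा"},
    {"source": "kathbath_noisy", "reference": "आज मौसम अच्छा है",
     "ringg": "आज मौसम अच्छा", "deepgram": "आज मौसम बहुत अच्छा है"},
]

overall = {svc: corpus_wer(rows, "reference", svc) for svc in ("ringg", "deepgram")}
per_source = wer_by_source(rows, "reference", "ringg")
```

Corpus-level WER (errors summed before dividing) is used here rather than a mean of per-utterance WERs so that short utterances do not dominate the score. Scores also depend heavily on text normalization, so raw and normalized transcripts should be compared like with like.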

Strengths

10,000 utterances provide a large evaluation sample
About 15.5 hours of audio, which averages roughly 5.6 seconds per utterance
Includes raw and normalized transcripts from four different ASR services for comparison
Derived from six distinct sources, suggesting diversity in speech content and recording conditions

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Description metadata is limited; actual data quality requires manual inspection after download
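
Because field semantics have to be inferred, a quick column summary over a handful of rows is a reasonable first step after download. The following is a minimal sketch in plain Python; `summarize_columns` is a hypothetical helper, and the sample rows and their column names are invented for illustration only.

```python
def summarize_columns(rows, sample_chars=40):
    """Collect observed types, non-null counts, and one example per column."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(
                name, {"types": set(), "non_null": 0, "example": None}
            )
            if value is None:
                continue
            info["types"].add(type(value).__name__)
            info["non_null"] += 1
            if info["example"] is None:
                # Truncate long values such as full transcripts.
                info["example"] = str(value)[:sample_chars]
    return summary


# Invented rows standing in for records loaded from the dataset.
sample_rows = [
    {"id": 1, "reference": "नमस्ते", "duration_s": 5.2},
    {"id": 2, "reference": None, "duration_s": 6.0},
]

summary = summarize_columns(sample_rows)
for name, info in summary.items():
    print(name, sorted(info["types"]), info["non_null"], info["example"])
```

Columns that show mixed types or unexpected nulls are the ones to inspect by hand before trusting any benchmark numbers.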

Provenance

Source: RinggAI via Hugging Face
Collection Method: Packaged from six Vistaar-derived parts: IndicTTS, FLEURS, CommonVoice, Kathbath, Kathbath noisy, and MUCS
Time Range: Not specified
Freshness: Last updated 2026-04-30 14:45:24; freshness should be verified
Geography: Not specified
