This benchmark for Hindi speech-to-text systems contains 10,000 utterances drawn from six Vistaar-derived parts. It totals about 15.5 hours of 16 kHz mono WAV audio, and each utterance comes with a reference transcript and transcripts from four ASR services. It was published by RinggAI and last updated in April 2026.
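A quick way to confirm the stated audio format after download is to read each file's header. The sketch below uses only Python's standard `wave` module and assumes the clips are plain PCM WAV files saved locally; the file paths and the helper names `describe_wav` and `check_clips` are hypothetical, not part of the dataset.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def check_clips(paths, expected_rate=16_000, expected_channels=1):
    """Sum clip durations (in hours) and list files that are not 16 kHz mono."""
    total_seconds = 0.0
    mismatched = []
    for p in paths:
        rate, channels, duration = describe_wav(p)
        total_seconds += duration
        if rate != expected_rate or channels != expected_channels:
            mismatched.append(Path(p).name)
    return total_seconds / 3600.0, mismatched
```

Run over all 10,000 clips, the returned total should land near 15.5 hours; a much smaller figure suggests an incomplete download. Note that `wave` only parses uncompressed PCM, so it raises an error if any clip turns out to use another encoding.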
Use Cases
- Benchmarking ASR model performance across the six Hindi audio sources
- Comparing the accuracy of four commercial STT services (Ringg, ElevenLabs, Deepgram, Sarvam) using the provided transcripts
- Analyzing how audio quality affects transcription accuracy, using the noisy Kathbath part as a contrast to the cleaner sources
- Training or fine-tuning Hindi speech recognition models using the provided reference transcripts
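The card does not name an evaluation metric, but word error rate (WER) is the usual choice for comparing services and sources. The sketch below computes corpus-level WER (total word edits divided by total reference words) from a word-level Levenshtein distance, optionally grouped by source part. It assumes each row is a dict; the field names `reference`, the hypothesis field, and the grouping field are hypothetical, because the card does not document its columns.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word edit distance, reference word count) via Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(rows, hyp_field, group_field=None):
    """Pooled WER per group: summed word errors / summed reference words."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for row in rows:
        key = row[group_field] if group_field else "all"
        e, n = word_errors(row["reference"], row[hyp_field])
        errors[key] += e
        words[key] += n
    return {k: errors[k] / words[k] for k in errors if words[k] > 0}
```

Calling `corpus_wer(rows, hyp_field, group_field)` once per service, grouped by part, gives a service-by-source table, and comparing the noisy Kathbath part against the clean one isolates the effect of audio quality. Pooling errors over the whole corpus, rather than averaging per-utterance WER, keeps very short utterances from dominating the score.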
Strengths
- 10,000 utterances provide a substantial sample size for evaluation
- About 15.5 hours of audio (an average of roughly 5.6 seconds per utterance) gives substantial coverage for testing
- Includes raw and normalized transcripts from four different ASR services for comparison
- Derived from six distinct sources, suggesting diversity in speech content and recording conditions
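The card does not describe how the normalized transcripts were produced. To compare raw outputs on equal footing, one option is to apply a single normalizer to the reference and to every service's raw output before scoring. The rules below (Unicode NFC, replacing punctuation including the Devanagari danda with spaces, casefolding, collapsing whitespace) are a hypothetical choice for illustration, not the dataset's normalization, and `normalize_hindi` is a made-up name.

```python
import unicodedata


def normalize_hindi(text):
    """Hypothetical normalizer: NFC, punctuation to spaces, casefold, collapse whitespace."""
    # NFC makes precomposed and decomposed forms (e.g. nukta letters) compare equal.
    text = unicodedata.normalize("NFC", text)
    # Replace every Unicode punctuation character (category P*), including the
    # danda U+0964 and double danda U+0965, with a space. Vowel signs, virama
    # and nukta are combining marks (category M*), so they are left intact.
    chars = [" " if unicodedata.category(ch).startswith("P") else ch for ch in text]
    # casefold only affects Latin-script words mixed into the transcript.
    return " ".join("".join(chars).casefold().split())
```

Because punctuation is turned into spaces rather than deleted, words separated only by a comma or danda stay separate tokens for WER scoring.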
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Description metadata is limited; actual data quality requires manual inspection after download
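Since field semantics must be inferred, a simple first step is to profile whatever rows you load: which value types each field holds, how often it is empty, and a few example values. The sketch below works on any iterable of dicts (for instance, iterating a split from the Hugging Face `datasets` library yields dicts); `profile_fields` is a hypothetical helper, and no particular column names are assumed.

```python
from collections import Counter


def profile_fields(records, max_examples=2):
    """Summarize each field: value types seen, empty/None count, and example values."""
    types = {}
    missing = Counter()
    examples = {}
    for rec in records:
        for key, value in rec.items():
            types.setdefault(key, Counter())[type(value).__name__] += 1
            if value is None or value == "":
                missing[key] += 1
            elif len(examples.setdefault(key, [])) < max_examples:
                examples[key].append(value)
    return {
        key: {
            "types": dict(types[key]),
            "missing": missing[key],
            "examples": examples.get(key, []),
        }
        for key in types
    }
```

A field that is absent from a row entirely is not counted as missing here, only fields present with `None` or an empty string; comparing the type counts against the number of rows reveals absent fields.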
Provenance
- Source
- RinggAI via Hugging Face
- Collection Method
- Packaged from six Vistaar-derived parts: IndicTTS, FLEURS, CommonVoice, Kathbath, Kathbath noisy, and MUCS
- Time Range
- Not specified
- Freshness
- Last updated 2026-04-30 14:45:24; freshness should be verified
- Geography
- Not specified