Name: Indic Dialect ASR Dataset with 2.8M+ Samples Across 30 Languages
Creator: grushaaaaa
Published: 2026-02-11T06:32:24
Keywords: language:grt, language:doi, language:awa, language:mai, size_categories:1M<n<10M, language:ne, language:brx, language:bho, license:cc-by-4.0, Parquet, language:sd, Multilingual, language:as, Audio, Audio Transcription, language:kru, task_categories:automatic-speech-recognition, language:mwr, language:or, language:kok, Multilingual Audio, language:mni, Automatic Speech Recognition, language:ks, language:sat

Description

A multilingual automatic speech recognition dataset covering 30 Indic dialects and languages. It contains over 2.8 million audio samples with corresponding transcriptions. The dataset was created by author grushaaaaa and last updated on Hugging Face in February 2026.

Use Cases

Train multilingual automatic speech recognition models on the audio and transcription features.
Benchmark ASR system performance per Indic language, grouping results by the language label.
Analyze dialectal variation in speech patterns across the multilingual audio samples.
Fine-tune pre-trained speech models for a specific language using the language-specific splits.
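
The per-language benchmarking use case can be sketched as a word error rate (WER) computation grouped by language label. This is a minimal sketch, not a method documented by the dataset: the keys "language", "reference", and "hypothesis" are hypothetical names chosen for illustration, since the card does not document the actual columns, and the token strings are placeholders rather than real transcriptions.

```python
from collections import defaultdict


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_language(rows):
    """Aggregate WER per language.

    rows: iterable of dicts with the ASSUMED keys
    'language', 'reference', and 'hypothesis'.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for row in rows:
        ref = row["reference"].split()
        hyp = row["hypothesis"].split()
        errors[row["language"]] += word_edit_distance(ref, hyp)
        words[row["language"]] += len(ref)
    # Skip languages with no reference words to avoid division by zero.
    return {lang: errors[lang] / words[lang] for lang in words if words[lang] > 0}


# Placeholder rows; language codes are taken from the card's keywords.
rows = [
    {"language": "bho", "reference": "a b c d", "hypothesis": "a b x d"},
    {"language": "mai", "reference": "a b", "hypothesis": "a b"},
]
scores = wer_by_language(rows)
```

Summing errors and reference words per language before dividing weights long utterances more heavily than averaging per-utterance WER would, which is the usual convention for corpus-level WER.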

Strengths

Contains over 2.8 million audio samples.
Covers 30 distinct Indic languages and dialects.
Audio is provided in a standard 16 kHz WAV format.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not reported beyond the "2.8M+" figure, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
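
Because field semantics have to be inferred after download, a quick profile of a small sample of rows can help. The following is a minimal sketch under stated assumptions: rows are already loaded as Python dicts by whatever Parquet reader is in use, and the field names in the example ("language", "source", "text") are hypothetical, not confirmed column names.

```python
from collections import Counter, defaultdict


def profile_rows(rows):
    """Summarize value types and empty values per field over sample rows."""
    types = defaultdict(Counter)
    empty = Counter()
    for row in rows:
        for key, value in row.items():
            types[key][type(value).__name__] += 1
            # Count None and empty strings as empty values.
            if value is None or (isinstance(value, str) and value == ""):
                empty[key] += 1
    return {
        key: {"types": dict(counts), "empty": empty[key]}
        for key, counts in types.items()
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"language": "as", "source": "x", "text": "a b"},
    {"language": "as", "source": None, "text": ""},
]
profile = profile_rows(sample)
```

A field whose profile shows mixed types or many empty values is a candidate for manual inspection before training.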

Provenance

Source: Hugging Face dataset by author grushaaaaa.
Collection Method: Aggregated from multiple source datasets, as indicated by the 'source' feature.
Time Range: Not specified.
Freshness: Last updated 2026-02-11 17:25:14; freshness should be verified.
Geography: Likely covers regions where Indic languages are spoken.
