Svarah: Indic Accented English Speech Benchmark | DataSalon

Speech & Audio

Name: Svarah: Indic Accented English Speech Benchmark
Creator: ai4bharat
Published: 2025-03-05T11:04:23
Keywords: Benchmark, Audio, Large Scale, Audio Transcription, Speech Recognition, Indic English

Available on 1 platform

Description

Published in March 2025, the Svarah dataset provides 9.6 hours of transcribed English audio from 117 speakers across India. It addresses the underrepresentation of Indian English speakers in existing benchmarks such as LibriSpeech and Switchboard. The dataset was created by ai4bharat.

Use Cases

Benchmarking automatic speech recognition (ASR) systems on Indic-accented English audio
Training accent-robust speech models for an Indian English speaker base of roughly 130 million
Studying phonetic variation in English speech across the 117 speakers
Evaluating model performance on accents that are underrepresented in existing benchmarks
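
For the benchmarking use case, ASR output is commonly scored with word error rate (WER): the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The following is a minimal sketch, not an official Svarah evaluation script. The function names and the text normalization (lowercasing and whitespace splitting) are assumptions made for illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the processed prefix of ref_words
    # and hyp_words[:j].
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def normalize(text):
    """Assumed normalization: lowercase, then split on whitespace."""
    return text.lower().split()


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs: total edits / total reference words."""
    total_edits = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        ref_words = normalize(reference)
        total_edits += edit_distance(ref_words, normalize(hypothesis))
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("no reference words to score against")
    return total_edits / total_ref_words
```

Pooling edits over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score. Reporting the same figure per speaker or per region would be one way to examine accent-specific gaps.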

Strengths

Contains 9.6 hours of transcribed audio
Includes data from 117 speakers
Specifically addresses a gap in representation for Indian English speakers

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download
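
Because column documentation and row count are not provided, a first step after download is usually to summarize the fields directly from the records. The sketch below assumes the data can be read as a sequence of dictionaries. The field names in the sample rows are hypothetical placeholders, not the actual Svarah schema.

```python
def summarize_fields(rows):
    """Return (row_count, {field: {"types": [...], "missing": n}}) for dict records."""
    rows = list(rows)
    fields = {}
    for row in rows:
        for key, value in row.items():
            info = fields.setdefault(key, {"types": set(), "non_null": 0})
            if value is not None:
                info["types"].add(type(value).__name__)
                info["non_null"] += 1
    summary = {
        name: {
            "types": sorted(info["types"]),
            # Rows where the field is absent or None count as missing.
            "missing": len(rows) - info["non_null"],
        }
        for name, info in fields.items()
    }
    return len(rows), summary


# Hypothetical sample rows; real field names must be read from the download.
sample_rows = [
    {"audio_path": "clip_0001.wav", "text": "good morning", "duration": 3.2},
    {"audio_path": "clip_0002.wav", "text": None, "duration": 4.7},
    {"audio_path": "clip_0003.wav", "text": "thank you"},
]

row_count, field_summary = summarize_fields(sample_rows)
for name, info in field_summary.items():
    print(name, info["types"], "missing:", info["missing"])
```

Fields with mixed types or many missing values are the ones most likely to need manual inspection before the data is used for evaluation.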

Provenance

Source: ai4bharat
Freshness: Last updated 2025-03-10 04:29:23
Geography: India

Quality Score

D39

Score components: Description, Source, Reputation

Community

1.1K downloads

24 likes

0 views

Dataset Info

Author: ai4bharat
Created: Mar 5, 2025
Updated: Mar 10, 2025
Last synced: Jun 8, 2026

