CoSHE-Eval: Hindi-English Code-Switching Speech Recognition Benchmark

Name: CoSHE-Eval: Hindi-English Code-Switching Speech Recognition Benchmark
Creator: soketlabs
Published: 2025-11-04T10:06:37
Keywords: Asr Benchmark, Code Switching, Benchmark, Audio, Hindi English, Speech Recognition

Description

CoSHE-Eval is an India-focused conversational speech dataset for evaluating automatic speech recognition (ASR) systems on Hindi-English code-mixed utterances. It was curated by soketlabs and last updated on the Hugging Face platform in January 2026. It focuses on natural bilingual contexts where Hindi in Devanagari script and English in Latin script co-occur within the same utterance.
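To make the script-mixing idea concrete, the sketch below tags each whitespace-separated token by script and flags utterances that contain both Devanagari and Latin tokens. It is an illustration only, not part of the dataset or its tooling: the function names `script_of` and `is_code_switched` are hypothetical, and the sample sentence is invented rather than taken from CoSHE-Eval.

```python
def script_of(token: str) -> str:
    """Tag a token as 'hi' (Devanagari), 'en' (ASCII Latin), or 'other'."""
    # The Devanagari Unicode block spans U+0900 to U+097F.
    has_devanagari = any("\u0900" <= ch <= "\u097F" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_devanagari and not has_latin:
        return "hi"
    if has_latin and not has_devanagari:
        return "en"
    # Digits, punctuation, or tokens mixing both scripts land here.
    return "other"


def is_code_switched(utterance: str) -> bool:
    """Return True if the utterance has at least one 'hi' and one 'en' token."""
    tags = {script_of(token) for token in utterance.split()}
    return {"hi", "en"} <= tags


# Invented example sentence, not drawn from the dataset.
sample = "मुझे कल meeting attend करनी है"
print([script_of(tok) for tok in sample.split()])
print(is_code_switched(sample))
```

Script is only a proxy for language here: romanized Hindi or English loanwords written in Devanagari would be tagged by how they are written, not by the language they come from.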

Use Cases

Benchmarking ASR model performance on Hindi-English code-switched conversational speech.
Training or fine-tuning speech recognition systems for the natural code-mixed speech prevalent in India.
Studying linguistic patterns and recognition challenges in bilingual speech where Devanagari and Latin scripts are mixed.
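For the benchmarking use case, word error rate (WER) is a common ASR metric, although this listing does not state which metric the benchmark itself uses. The sketch below is a minimal pure-Python WER based on word-level edit distance; the function name `word_error_rate` and the example transcripts are hypothetical, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented transcripts: the hypothesis writes "meeting" in Devanagari.
reference = "मुझे कल meeting attend करनी है"
hypothesis = "मुझे कल मीटिंग attend करनी है"
print(word_error_rate(reference, hypothesis))
```

In this example the model output is arguably correct but uses a different script for one word, and exact-match WER still counts it as a substitution. Scores on code-mixed data therefore depend heavily on whichever script and normalization conventions the evaluation adopts.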

Strengths

Focuses on a specific and linguistically relevant phenomenon: Hindi-English code-switching in Indian conversational contexts.
Dataset is hosted on Hugging Face, a major platform for AI datasets, suggesting potential for community use and integration.

Limitations

Description metadata is limited; actual data quality, size, and column structure require manual inspection after download.
Row count, file formats, and license information are unknown, which may limit suitability assessment.

Provenance

Source: soketlabs on Hugging Face
Collection Method: Curated evaluation dataset, likely gathered from bilingual conversational contexts.
Time Range: Not specified
Freshness: Last updated 2026-01-16 11:57:16; freshness should be verified.
Geography: India

Quality Score

Overall: D (40)
Description: 42
Source: 39
Reputation: 46
Access: 26

Community

Downloads: 81
Likes: 7
Views: 0

Dataset Info

Author: soketlabs
Created: Nov 4, 2025
Updated: Jan 16, 2026
Last synced: Jul 25, 2026
