ASR Code-Switch: 1,200 Code-Switching Utterances for Speech Recognition

Name: ASR Code-Switch: 1,200 Code-Switching Utterances for Speech Recognition
Creator: Perle-ai
Published: 2026-05-15T19:50:27
Keywords: Code Switching, Benchmark, Multilingual, Audio, Speech Recognition

Available on 1 platform

Description

1,200 code-switching utterances form a curated benchmark for evaluating commercial Automatic Speech Recognition systems. The dataset, created by Perle-ai, includes 300 samples each for four language pairs, such as Egyptian Arabic–English. It was last updated on May 21, 2026.

Use Cases

Benchmarking ASR system performance on intra-sentential language switching.
Training or fine-tuning speech models for multilingual environments, within the limits of a 1,200-utterance collection.
Analyzing error patterns in commercial ASR systems for specific language pairs such as Egyptian Arabic–English.
Researching linguistic phenomena in code-switched speech using the curated utterances.
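
The benchmarking use case above can be sketched as a per-language-pair word error rate (WER) calculation. This is a minimal illustration, not the dataset's official evaluation protocol: the keys 'pair', 'reference', and 'hypothesis' are hypothetical, since the dataset's column names are undocumented, and tokenization is plain whitespace splitting with no text normalization.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_pair(samples):
    """Return corpus-level WER for each language pair.

    Each sample is a dict with the hypothetical keys
    'pair', 'reference', and 'hypothesis'.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for s in samples:
        ref = s["reference"].split()
        hyp = s["hypothesis"].split()
        errors[s["pair"]] += edit_distance(ref, hyp)
        words[s["pair"]] += len(ref)
    # Skip pairs with no reference words to avoid division by zero.
    return {p: errors[p] / words[p] for p in words if words[p] > 0}
```

Pooling errors and reference words per pair before dividing gives a corpus-level WER, so long utterances weigh more than short ones. Averaging per-utterance WER instead would be an equally defensible choice.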

Strengths

Contains 1,200 specifically curated code-switching utterances.
Provides a structured benchmark with 300 samples per language pair.
Focuses on intra-sentential language switching, a challenging scenario for ASR.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported in the metadata; the description implies 1,200 utterances (4 language pairs × 300 samples), which should be confirmed after download.
Description metadata is limited; actual data quality requires manual inspection after download.
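
Because the fields are undocumented, a first pass after download might simply list the field names, the value types seen for each, and the number of records per language pair. The sketch below assumes the records have already been loaded as a list of dicts; the helper names and the idea of a language-pair field are illustrative assumptions, not part of the dataset.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Map each field name to the sorted Python type names seen for it."""
    types = defaultdict(set)
    for rec in records:
        for key, value in rec.items():
            types[key].add(type(value).__name__)
    return {key: sorted(names) for key, names in types.items()}


def count_by(records, field):
    """Count records per value of `field` (missing values count as None)."""
    return Counter(rec.get(field) for rec in records)
```

If the description is accurate, counting by whichever field holds the language pair should yield four groups of 300 records each.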

Provenance

Source: Perle-ai via Hugging Face.
Collection Method: Curated benchmark, likely gathered for research purposes as indicated by the linked arXiv paper.
Freshness: Last updated 2026-05-21 03:20:29; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

Overall: C (40)
Description: 42
Source: 39
Reputation: 48
Access: 26

Community

Downloads: 811
Likes: 3
Views: 0

Dataset Info

Author: Perle-ai
Created: May 15, 2026
Updated: May 21, 2026
Last synced: Jul 14, 2026
