Description
A curated benchmark of 1,200 code-switching utterances for evaluating commercial Automatic Speech Recognition (ASR) systems. Created by Perle-ai, the dataset contains 300 samples for each of four language pairs, one of which is Egyptian Arabic–English. It was last updated on May 21, 2026.
Use Cases
Benchmarking ASR system performance on intra-sentential language switching.
Training or fine-tuning speech models for multilingual environments on the code-switching samples.
Analyzing error patterns in commercial ASR systems for specific language pairs, such as Egyptian Arabic–English.
Researching linguistic phenomena in code-switched speech using the curated utterances.
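For the benchmarking use case, one common approach is to compute word error rate (WER) separately for each language pair. The sketch below is a minimal illustration, not the dataset's official scoring method: it assumes each sample is available as a (language pair, reference transcript, ASR hypothesis) tuple, and the function names and the toy sentences are hypothetical. The dataset's actual field names are not described here, and real code-switched text would likely need script-aware normalization before splitting on whitespace.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_pair(samples):
    """Aggregate WER per language pair.

    `samples` is an iterable of (pair, reference, hypothesis) tuples.
    This tuple layout is an assumption made for illustration.
    """
    errors = defaultdict(int)
    ref_lengths = defaultdict(int)
    for pair, reference, hypothesis in samples:
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        errors[pair] += word_edit_distance(ref_words, hyp_words)
        ref_lengths[pair] += len(ref_words)
    # Skip pairs with no reference words to avoid dividing by zero.
    return {pair: errors[pair] / ref_lengths[pair]
            for pair in errors if ref_lengths[pair] > 0}


# Made-up toy rows, not taken from the dataset.
toy_samples = [
    ("pair-a", "send the report bukra", "send the report bukra"),
    ("pair-a", "call me later", "call me"),
    ("pair-b", "a b c d", "a x c d"),
]
toy_scores = wer_by_pair(toy_samples)
```

Pooling errors and reference lengths per pair, rather than averaging per-utterance WER, keeps short utterances from dominating the score.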