Name: KasaSpeech: English-Twi Code-Switching Speech Recordings from Ghana
Creator: Kennethdot
Published: 2026-05-02T14:30:49
Keywords: Code Switching, Audio, Large Scale, African Languages, Speech Recognition, Twi, Low Resource Languages

Description

KasaSpeech is a large-scale speech dataset of natural code-switching between English and Twi, sourced primarily from Ghana. It contains 49,878 transcribed audio samples split into training, validation, and test sets. The dataset was created by Kennethdot and last updated on Hugging Face in May 2026.

Use Cases

Train automatic speech recognition (ASR) models on transcribed English-Twi code-switching speech.
Develop language identification models for audio segments that mix English and Twi.
Study the linguistic patterns and sociolinguistics of code-switching in natural speech from Ghanaian speakers.
Benchmark speech AI systems on low-resource language performance using the provided train/validation/test splits.
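The dataset page does not name an evaluation metric for the ASR and benchmarking use cases above. Word error rate (WER) is the usual choice for ASR, so the following is a minimal sketch assuming WER: a word-level Levenshtein alignment between a reference and a hypothesis transcript, divided by the number of reference words. The function names are hypothetical, and the sketch applies no text normalisation (casing, punctuation, Twi spelling variants), which would have to be decided separately for this data.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (a rolling single row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Per-utterance WER on whitespace-tokenised transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words to score")
    return edits / words
```

For a test-split score, `corpus_wer` pools edits and reference words across utterances, which avoids giving very short utterances the same weight as long ones.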

Strengths

Contains 49,878 speech samples in total, a substantial corpus for model training.
Includes a predefined split of 48,292 training, 581 validation, and 1,005 test samples.
Focuses on natural code-switching, which the description identifies as key to advancing speech AI for low-resource languages.
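A quick check of the split sizes listed above confirms that they add up to the reported total and shows how the data is divided: roughly 96.8% training, 1.2% validation, and 2.0% test. The small held-out splits are worth keeping in mind when reading benchmark numbers. This is a minimal sketch using only the counts stated on this page; `SPLITS` and `split_shares` are illustrative names.

```python
# Split sizes and total as stated on the dataset page.
SPLITS = {"train": 48_292, "validation": 581, "test": 1_005}
TOTAL = 49_878


def split_shares(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the combined sample count, in percent."""
    total = sum(splits.values())
    return {name: 100 * n / total for name, n in splits.items()}


# The three splits should account for every reported sample.
assert sum(SPLITS.values()) == TOTAL, "split sizes do not add up to the reported total"

for name, pct in split_shares(SPLITS).items():
    print(f"{name:<10} {SPLITS[name]:>6}  {pct:6.2f}%")
```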

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
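The row count of the primary data table is not reported in the metadata, which makes suitability harder to assess; the 49,878 samples stated in the description should be confirmed after download.
Descriptive metadata is sparse, so audio quality and speaker diversity must be checked by manual inspection.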
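Because column names are undocumented, any inspection code has to work on decoded records rather than named fields. The sketch below assumes that, once you have identified the audio column, each value looks like what a Hugging Face `Audio` feature decodes to: a mapping with an `"array"` of float samples in [-1, 1] and an integer `"sampling_rate"`. That shape, the clipping threshold, and the helper names are all assumptions. It reports per-clip duration, RMS level, peak, and the fraction of near-full-scale samples, then aggregates over a sample of clips.

```python
import math
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass
class ClipStats:
    duration_s: float
    rms: float
    peak: float
    clipped_fraction: float


def inspect_clip(audio: Mapping[str, Any], clip_threshold: float = 0.999) -> ClipStats:
    """Summarise one decoded clip (assumed {"array": samples, "sampling_rate": Hz})."""
    samples = audio["array"]
    rate = int(audio["sampling_rate"])
    n = len(samples)
    if n == 0 or rate <= 0:
        raise ValueError("clip must have samples and a positive sampling rate")
    magnitudes = [abs(float(s)) for s in samples]
    return ClipStats(
        duration_s=n / rate,
        rms=math.sqrt(sum(v * v for v in magnitudes) / n),
        peak=max(magnitudes),
        # Samples at or above the threshold are treated as likely clipping.
        clipped_fraction=sum(v >= clip_threshold for v in magnitudes) / n,
    )


def summarise(clips: Iterable[Mapping[str, Any]]) -> dict[str, Any]:
    """Aggregate duration and clipping statistics over a sample of clips."""
    stats = [inspect_clip(c) for c in clips]
    if not stats:
        raise ValueError("no clips to summarise")
    durations = [s.duration_s for s in stats]
    return {
        "num_clips": len(stats),
        "total_hours": sum(durations) / 3600,
        "mean_duration_s": sum(durations) / len(durations),
        "max_clipped_fraction": max(s.clipped_fraction for s in stats),
    }
```

Running this on a random sample from each split gives a first signal about recording quality; speaker diversity still needs metadata or listening, since it cannot be read off the waveform statistics.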

Provenance

Source: Hugging Face user Kennethdot.
Collection Method: Likely speech recordings from diverse speakers, as stated in the dataset description.
Freshness: Last updated 2026-05-21 10:41:20; check for newer revisions before use.
Geography: Primarily Ghana, as stated in the description.

The license is unknown; verify the terms of use before using the dataset.
