Name: Horn ASR Benchmark: Multilingual Speech Recognition for Horn of Africa Languages
Creator: LesanAI
Published: 2026-05-07T10:46:20
Keywords: Benchmark, Multilingual, Audio, Horn Of Africa, Automatic Speech Recognition

Description

A multilingual evaluation benchmark for automatic speech recognition covering four under-served languages of the Horn of Africa: Amharic, Oromo, Somali, and Tigrinya. It contains 4,000 utterances totaling 15.44 hours of audio, drawn from spontaneous interview-style speech with transcripts validated by native speakers. The dataset was created by LesanAI and last updated on May 7, 2026.

Use Cases

Benchmarking ASR model performance on the 1,000 evaluation utterances available per language
Evaluating multilingual speech recognition systems across Amharic, Oromo, Somali, and Tigrinya
Fine-tuning or adapting ASR models on spontaneous interview-style speech, bearing in mind that the set is described and sized as an evaluation benchmark
Studying linguistic features of Horn of Africa languages using native-speaker-validated transcripts
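
As a rough illustration of the benchmarking use case, the sketch below computes word error rate (WER) per language from reference transcripts and model outputs. It is a minimal sketch under stated assumptions, not the benchmark's official scoring procedure: the (language, reference, hypothesis) row layout, the function names, and the whitespace tokenization are all hypothetical, since the listing does not document column names or a normalization scheme.

```python
from collections import defaultdict


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1]


def wer_by_language(rows):
    """Corpus-level WER per language.

    rows: iterable of (language, reference, hypothesis) string triples.
    This layout is an assumption; the dataset's real field names are undocumented.
    """
    errors = defaultdict(int)
    ref_lengths = defaultdict(int)
    for language, reference, hypothesis in rows:
        ref_words = reference.split()  # whitespace tokenization is a simplification
        errors[language] += edit_distance(ref_words, hypothesis.split())
        ref_lengths[language] += len(ref_words)
    # Total edits divided by total reference words; skip languages with no reference words.
    return {lang: errors[lang] / ref_lengths[lang] for lang in errors if ref_lengths[lang]}


# Toy placeholder rows for illustration only, not taken from the benchmark.
rows = [
    ("lang_a", "one two three", "one too three"),
    ("lang_a", "four five", "four five"),
    ("lang_b", "six seven", "six"),
]
scores = wer_by_language(rows)
```

Pooling edits and reference words per language, rather than averaging per-utterance WER, keeps short utterances from dominating the score. Text normalization (punctuation, numerals, script variants) would need to be agreed on before comparing systems.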

Strengths

Contains 4,000 evaluation utterances across four languages (1,000 per language)
Provides 15.44 hours of audio data
Transcripts are post-edited and QC-validated by native-speaker annotators
Audio is sourced from spontaneous interview-style speech
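
The headline figures above imply an average utterance length, which can help when estimating inference time or batch sizes. The short calculation below is a sketch based only on the stated totals; the even split of hours across languages is an assumption, because per-language durations are not given.

```python
TOTAL_HOURS = 15.44        # stated total audio duration
TOTAL_UTTERANCES = 4000    # stated utterance count
NUM_LANGUAGES = 4          # Amharic, Oromo, Somali, Tigrinya

# Mean utterance length in seconds: roughly 13.9 s.
mean_seconds = TOTAL_HOURS * 3600 / TOTAL_UTTERANCES

# Utterances per language, matching the stated 1,000 per language.
utterances_per_language = TOTAL_UTTERANCES // NUM_LANGUAGES

# Hours per language IF audio were split evenly (an assumption, not a documented figure).
assumed_hours_per_language = TOTAL_HOURS / NUM_LANGUAGES

print(f"mean utterance length: {mean_seconds:.1f} s")
print(f"utterances per language: {utterances_per_language}")
print(f"assumed hours per language: {assumed_hours_per_language:.2f}")
```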

Limitations

Column-level documentation is absent; field semantics must be inferred after download
File-level row counts are not stated separately from the 4,000-utterance total, which may limit suitability assessment before download
Data may reflect geographic or source bias inherent to the interview collection method

Provenance

Source: LesanAI
Collection Method: Utterances drawn from spontaneous interview-style speech, with transcripts post-edited and QC-validated by native-speaker annotators.
Freshness: Last updated 2026-05-07 11:12:02 (time zone not stated); check the source for newer revisions before use
Geography: Horn of Africa (Amharic, Oromo, Somali, Tigrinya)
