Amharic Shewa Dialect Parallel Speech Corpus | DataSalon

Speech & Audio

Name: Amharic Shewa Dialect Parallel Speech Corpus
Creator: leyu-amharic
Published: 2026-02-21T22:47:53
Keywords: Text To Speech, Amharic Language, Speech Corpus, Text, Dialectology, Audio, Natural Language Processing, Speech Recognition

Available on 1 platform

Description

This parallel speech corpus pairs audio recordings with text transcripts in the Shewa dialect of Amharic. Created by leyu-amharic, it is designed for speech technology research and was last updated in February 2026.

Use Cases

Train ASR models to transcribe Shewa dialect audio recordings into text transcripts.
Develop TTS systems to synthesize speech from text, capturing dialect-specific phonetic and prosodic features.
Analyze dialect-specific phonetic variations and accent patterns by comparing audio features with transcript text.
Build speech synthesis models that preserve the accent patterns found in the Shewa dialect audio recordings.
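
An ASR model trained on paired audio and transcripts, as in the first use case above, is commonly evaluated with word error rate (WER) against the reference transcripts. The following is a minimal sketch, assuming transcripts can be tokenized on whitespace; the function name `word_error_rate` is hypothetical and is not part of this dataset or any listed tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization is adequate for the transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Calling `word_error_rate(reference_text, model_output)` for each recording and averaging over a held-out split gives a rough measure of transcription quality. Text normalization (punctuation, numerals) would need to be decided separately and is not covered here.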

Strengths

Curated parallel structure with audio recordings aligned to text transcripts.
Focus on the Shewa dialect captures specific phonetic and prosodic variations.

Limitations

Unknown sample size and recording duration limit assessment of model training suitability.
Potential geographic bias, as it covers only one Amharic dialect.
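
Because the listing does not report sample size or recording duration, a user would need to measure them after downloading. The sketch below assumes the audio has been saved locally as uncompressed WAV files under a single directory; the actual file format and layout of this corpus are not stated in the listing, and `total_duration_seconds` is a hypothetical helper name.

```python
import wave
from pathlib import Path


def total_duration_seconds(audio_dir: str) -> tuple[int, float]:
    """Return (number of WAV files, summed duration in seconds).

    Assumption: recordings are PCM WAV files somewhere under audio_dir.
    """
    count = 0
    total = 0.0
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = frame count / frames per second.
            total += wav.getnframes() / wav.getframerate()
        count += 1
    return count, total
```

Dividing the returned duration by 3600 gives hours of speech, which is the figure usually compared against the needs of an ASR or TTS training run.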

Provenance

Source: leyu-amharic on Hugging Face.
Collection Method: Curated collection of audio recordings paired with corresponding text transcripts.
Time Range: Not specified.
Freshness: Last updated in February 2026.
Geography: Region where the Shewa dialect of Amharic is spoken.

Quality Score

D39

Community

200 downloads

1 like

0 views

Dataset Info

Author: leyu-amharic
Created: Feb 21, 2026
Updated: Feb 22, 2026
Last synced: Jul 25, 2026
