Description
The dataset pairs Tamazight speech segments in the Tachelhit dialect with Modern Standard Arabic transcriptions. According to its Hugging Face page, it is actively growing and receives regular updates. Its author, SoufianeDahimi, last updated it on March 15, 2025.
Use Cases
Train automatic speech recognition models based on Tamazight audio paired with Arabic text.
Develop speech-to-text translation systems based on the Tamazight-to-Arabic transcription pairs.
Benchmark ASR model performance for the Tachelhit dialect of Tamazight.
Create language resources for Tamazight, a low-resource language, from the paired speech segments.
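Benchmarking an ASR or speech translation model against these pairs usually means comparing model output with the reference Arabic text. The sketch below computes word error rate (WER) and character error rate (CER) with a plain Levenshtein distance. It is a minimal illustration, not part of the dataset or any official tooling: it assumes the reference and model output are already available as strings, because the dataset's column names are undocumented, and it applies no Arabic text normalization.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters).

    Substitutions, insertions, and deletions each cost 1.
    """
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance, with whitespace runs collapsed."""
    ref = " ".join(reference.split())
    hyp = " ".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_edits / total_words


if __name__ == "__main__":
    # Illustrative strings, not taken from the dataset.
    reference = "ذهب الولد إلى المدرسة"
    hypothesis = "ذهب الولد الى المدرسة"
    print(wer(reference, hypothesis))
```

In this example the only difference is the hamza spelling of "إلى" versus "الى", which counts as one substituted word out of four, so the WER is 0.25. Whether such orthographic variants should count as errors is a benchmark design choice; normalizing them before scoring would lower the reported error rate.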
Strengths
Focuses on the Tachelhit dialect of Tamazight, a specific and likely underrepresented language variant.
Designed for a concrete task: automatic speech recognition for translation into Modern Standard Arabic.
Last updated on 2025-03-15, indicating recent maintenance.
Limitations
Row count, file formats, and column-level documentation are unknown, which may limit suitability assessment.
License information is unknown, which could restrict commercial or research use.
Data may reflect dialectal or collection bias inherent to the specific Tachelhit focus.
Provenance
Source
Hugging Face
Collection Method
Likely contains manually or semi-automatically transcribed speech segments.
Time Range
Not specified.
Freshness
Last updated 2025-03-15 18:32:40 (time zone not stated). Because the dataset is described as actively growing, check its Hugging Face page for newer revisions.
Geography
Likely contains data relevant to Tamazight (Berber) language speakers, particularly those of the Tachelhit dialect.
License
Unknown; users must verify the terms before use.