Moore Speech Corpora provides aligned audio and text data for the Mooré language (ISO 639-3: mos), curated for low-resource speech processing. The dataset is cleaned and denoised to support text-to-speech and automatic speech recognition research. It was created by goaicorp and last updated in July 2025.
Use Cases
- Train a text-to-speech model using aligned audio and text data for the Mooré language.
- Develop an automatic speech recognition system for Mooré with cleaned and denoised audio samples.
- Fine-tune acoustic models on low-resource Mooré (ISO 639-3: mos) speech data.
Strengths
- The data has been cleaned and denoised specifically for speech processing tasks.
- Focuses on the under-represented Mooré language, addressing a low-resource gap.
Limitations
- Scale metrics such as row count, total audio hours, and vocabulary size are not reported.
- The data is drawn only from publicly available sources, which may limit its diversity and volume.
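Because the card reports no scale metrics, users may want to measure the amount of audio themselves after downloading it. The sketch below sums the duration of every `.wav` file under a local directory using only the Python standard library. It assumes the audio has been saved locally as uncompressed PCM WAV files; the card does not document the actual file format or directory layout, and the function name `total_audio_hours` is hypothetical.

```python
import wave
from pathlib import Path


def total_audio_hours(root: str) -> float:
    """Return the combined duration, in hours, of all .wav files under `root`.

    Assumption: files are uncompressed PCM WAV, the only format the
    standard-library `wave` module can read. The card does not state
    the real format, so adapt this if the audio is stored differently.
    """
    total_seconds = 0.0
    # Recursively visit every .wav file below the root directory.
    for path in Path(root).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0
```

For example, `print(f"{total_audio_hours('mos_audio'):.2f} hours")` would report the total for a hypothetical local folder named `mos_audio`. A similar loop can count files to estimate the number of rows.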
Provenance
- Source
  - Publicly available sources, aggregated by goaicorp.
- Collection Method
  - Gathered and unified from public sources, then cleaned and denoised.
- Time Range
  - Not specified.
- Freshness
  - Last updated on 2025-07-16.
- Geography
  - Not specified.