DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio

Cleaned Audio-Text Corpus for Mooré Speech Processing

Name: Cleaned Audio-Text Corpus for Mooré Speech Processing
Creator: goaicorp
Published: 2025-07-16T17:23:24
Keywords: Size Categories: 1K<n<10K, Task Categories: text-to-speech, Library: polars, Library: dask, OPTIMIZED-PARQUET, Modality: text, Library: mlcroissant, Library: datasets, Parquet, License: CC BY-NC 4.0, Region: US, Task Categories: automatic-speech-recognition, Language: mos


Available on 1 platform

Description

Moore Speech Corpora provides aligned audio and text data for the Mooré language (ISO 639-3: mos), curated for low-resource speech processing. The dataset is cleaned and denoised to support text-to-speech and automatic speech recognition research. It was created by goaicorp and last updated in July 2025.

Use Cases

Train a text-to-speech model using aligned audio and text data for the Mooré language.
Develop an automatic speech recognition system for Mooré with cleaned and denoised audio samples.
Fine-tune acoustic models on low-resource language data specifically for the ISO 639-3: mos language code.

Strengths

Data is specifically cleaned and denoised for speech processing tasks.
Focuses on the under-represented Mooré language, addressing a low-resource gap.

Limitations

Specific scale metrics like row count, audio hours, or vocabulary size are unavailable.
Dataset scope is limited to publicly available sources, which may restrict diversity and volume.
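
The scale metrics listed as unavailable can be computed locally once the data has been downloaded. The following is a minimal sketch, not the publisher's tooling: it assumes each record carries a transcript and a clip duration in seconds, and the field names `text` and `duration_s` are hypothetical because the actual schema is not documented on this page. The sample transcripts are placeholder tokens, not real Mooré text.

```python
from collections import Counter


def corpus_stats(records):
    """Summarise scale metrics for an iterable of aligned audio-text records.

    Each record is assumed to be a dict with a `text` transcript and a
    `duration_s` clip length in seconds (hypothetical field names).
    """
    rows = 0
    total_seconds = 0.0
    vocab = Counter()
    for rec in records:
        rows += 1
        total_seconds += float(rec["duration_s"])
        # Whitespace tokenisation is only a rough proxy for vocabulary size;
        # casefold() merges case variants and leaves accented characters intact.
        vocab.update(rec["text"].casefold().split())
    return {
        "rows": rows,
        "audio_hours": total_seconds / 3600.0,
        "vocab_size": len(vocab),
    }


if __name__ == "__main__":
    # Placeholder records standing in for rows read from the Parquet files.
    sample = [
        {"text": "A b c", "duration_s": 1800},
        {"text": "a b d", "duration_s": 1800},
    ]
    print(corpus_stats(sample))
```

Because the function accepts any iterable of dicts, rows read with one of the listed libraries (polars, dask, or datasets) could be passed in after renaming columns to match the assumed fields.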

Provenance

Source: Publicly available sources, aggregated by goaicorp.
Collection Method: Gathered and unified from public sources, then cleaned and denoised.
Time Range: Not specified
Freshness: Last updated on 2025-07-16.
Geography: Not specified



Quality Score

D39

Description

Source

Reputation


Access

Community

115 downloads

3 likes

0 views

Dataset Info

Author: goaicorp
Created: Jul 16, 2025
Updated: Apr 3, 2026
Last synced: Apr 9, 2026

