Description
A cleaned version of a Wolof text-to-speech dataset by GalsenAI. The dataset features a female voice that has been denoised and enhanced using the Resemble Enhance library, with annotations cleaned of special characters, emojis, and non-Latin scripts. The author notes that some lines and audio clips judged to be of insufficient quality have been removed, and that annotation corrections are ongoing.
Use Cases
Train text-to-speech models for Wolof using the female voice recordings.
Benchmark audio denoising and enhancement algorithms using the processed audio files.
Study linguistic features of Wolof speech from the cleaned textual annotations.
Develop or fine-tune multilingual TTS systems that include West African languages.
Strengths
Audio has been specifically denoised and enhanced using the Resemble Enhance library.
Annotations have been cleaned by removing special characters, emojis, and Arabic and Cyrillic (Russian) characters.
Some annotation errors have been corrected, indicating a curation effort.
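As a rough illustration of the kind of annotation cleaning described above, the following Python sketch keeps Latin-script letters (including Wolof letters such as ñ, ŋ, and ë), ASCII digits, whitespace, and a small set of punctuation marks, and drops everything else. The function name `clean_annotation` and the `KEEP_PUNCT` set are assumptions made for this example; the author's actual cleaning rules are not documented.

```python
import unicodedata

# Assumed punctuation whitelist; the dataset author's real list is not documented.
KEEP_PUNCT = set(".,?!'-")


def clean_annotation(text: str) -> str:
    """Drop emojis, symbols, and non-Latin letters from one annotation line."""
    # Compose decomposed accents so "e" + combining diaeresis becomes "ë".
    text = unicodedata.normalize("NFC", text)
    kept = []
    for ch in text:
        if ch.isalpha():
            # Keep only letters whose Unicode name marks them as Latin script.
            if unicodedata.name(ch, "").startswith("LATIN"):
                kept.append(ch)
        elif ch.isdigit() and ch.isascii():
            kept.append(ch)
        elif ch.isspace():
            kept.append(" ")
        elif ch in KEEP_PUNCT:
            kept.append(ch)
        # Anything else (emojis, symbols, Arabic or Cyrillic marks) is dropped.
    # Collapse the runs of spaces left behind by removed characters.
    return " ".join("".join(kept).split())
```

For example, `clean_annotation("Nanga def? 😀 привет #ñëw")` returns `"Nanga def? ñëw"`. Filtering by Unicode character name is only one possible approach; a regular expression over explicit code point ranges would work as well.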
Limitations
The author states there are potentially many more annotation errors to correct.
Some lines and audios were removed for quality reasons, but the criteria are not specified.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset size are not reported, which makes it hard to assess suitability before download.
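Because field semantics and row count are undocumented, one practical first step after download is to summarize the records directly. The sketch below is a minimal example under stated assumptions: `summarize_rows` is a hypothetical helper, and the sample field names (`text`, `duration`) are placeholders, not the dataset's real columns.

```python
from typing import Any, Dict, Iterable


def summarize_rows(rows: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count rows and record, per field, the value types seen and missing values."""
    count = 0
    fields: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        count += 1
        for name, value in row.items():
            info = fields.setdefault(name, {"types": set(), "missing": 0})
            if value is None:
                info["missing"] += 1
            else:
                info["types"].add(type(value).__name__)
    return {
        "row_count": count,
        "fields": {
            name: {"types": sorted(info["types"]), "missing": info["missing"]}
            for name, info in fields.items()
        },
    }


# Placeholder records; real rows would come from the downloaded dataset.
sample = [
    {"text": "Nanga def", "duration": 1.5},
    {"text": None, "duration": 2},
]
summary = summarize_rows(sample)
```

Any iterable of dictionaries can be passed in, so the same helper should work on rows read from CSV, JSON Lines, or a dataset-loading library, whichever format the download turns out to use.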
Provenance
Source
GalsenAI, derived from their original Wolof TTS dataset.
Collection Method
The female voice was extracted, denoised, and enhanced; annotations were cleaned.
Freshness
Last updated 2025-11-29 04:04:48 (time zone not stated); check the dataset page for more recent revisions.
Geography
Likely Senegal or Wolof-speaking regions, but not explicitly stated.
License
Unknown; users must verify terms on the Hugging Face dataset page before use.