Description
CML-TTS is a multilingual Text-to-Speech dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias. It comprises audiobooks sourced from public domain books on Project Gutenberg, read by volunteers from the LibriVox project, and includes recordings in languages such as Dutch, German, French, Italian, and Polish. The dataset was last updated on the Hugging Face platform on 2023-11-24.
Use Cases
Training multilingual text-to-speech models on volunteer-read audiobook recordings.
Developing speech synthesis systems for languages such as Dutch, German, French, Italian, and Polish.
Researching prosody and speaker characteristics based on volunteer-read audiobook data.
Building speech datasets for low-resource languages using public domain literary content.
Strengths
Data is sourced from established public domain projects (Project Gutenberg and LibriVox).
Covers multiple languages, including Dutch, German, French, Italian, and Polish.
Developed by a named academic institution (CEIA at the Federal University of Goias).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and dataset size are not stated, which makes it hard to assess suitability before download.
Last updated 2023-11-24 14:48:29 on Hugging Face; check for newer revisions before use.
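Because field semantics have to be inferred after download, a small profiling pass over the first few records can help. The following is a minimal sketch, not part of the dataset's tooling: the function name `summarize_fields` and the sample field names (`text`, `duration`, `speaker_id`) are hypothetical and do not come from the dataset documentation.

```python
from typing import Any, Dict, Iterable


def summarize_fields(rows: Iterable[Dict[str, Any]], max_examples: int = 2) -> Dict[str, Dict[str, Any]]:
    """Report observed value types, missing counts, and a few example values per field."""
    seen: Dict[str, Dict[str, Any]] = {}
    total = 0
    for row in rows:
        total += 1
        for name, value in row.items():
            info = seen.setdefault(name, {"types": set(), "present": 0, "examples": []})
            if value is None:
                continue  # counted as missing below
            info["types"].add(type(value).__name__)
            info["present"] += 1
            if len(info["examples"]) < max_examples:
                info["examples"].append(value)
    # A field is "missing" in a row when it is absent or set to None.
    return {
        name: {
            "types": sorted(info["types"]),
            "missing": total - info["present"],
            "examples": info["examples"],
        }
        for name, info in seen.items()
    }


if __name__ == "__main__":
    # Hypothetical records; the real column names must be checked after download.
    sample = [
        {"text": "Er was eens...", "duration": 3.2, "speaker_id": 12},
        {"text": None, "duration": 4, "speaker_id": 12},
    ]
    for field, info in summarize_fields(sample).items():
        print(field, info)
```

The function accepts any iterable of dictionaries, so it can be applied to the first few records of a downloaded split without loading the whole dataset into memory.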
Provenance
Source
Center of Excellence in Artificial Intelligence (CEIA), Federal University of Goias (UFG).
Collection Method
Audiobooks sourced from Project Gutenberg public domain books, read by LibriVox volunteers.
Freshness
Last updated 2023-11-24 14:48:29.
License is unknown; users must verify terms before use.