Description

CML-TTS is a multilingual Text-to-Speech dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias. It comprises audiobooks sourced from public domain books on Project Gutenberg, read by volunteers from the LibriVox project, and includes recordings in languages such as Dutch, German, French, Italian, and Polish. The dataset was last updated on the Hugging Face platform on 2023-11-24.

Use Cases

Training multilingual text-to-speech models on volunteer-read audiobook recordings.
Developing speech synthesis systems for the covered languages, including Dutch, German, French, Italian, and Polish.
Researching prosody and speaker characteristics based on volunteer-read audiobook data.
Building speech datasets for low-resource languages using public domain literary content.

Strengths

Data is sourced from established public domain projects (Project Gutenberg and LibriVox).
Covers multiple languages, including Dutch, German, French, Italian, and Polish.
Developed by a named academic institution (CEIA at the Federal University of Goias).

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and dataset size are not documented, which makes suitability hard to assess before download.
Last updated 2023-11-24 14:48:29; check for a newer release before relying on this version.
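
Because the column documentation is missing, one practical first step after download is to summarize the fields of a few records. The sketch below is a minimal illustration, not part of the dataset's tooling: `summarize_fields` is a hypothetical helper, and the sample records and their field names (`text`, `duration`, `speaker_id`) are invented placeholders, not the actual CML-TTS schema.

```python
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """For each field, collect the value types seen, one example, and a missing count."""
    rows = list(records)
    summary: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        for name, value in row.items():
            entry = summary.setdefault(
                name, {"types": set(), "example": None, "present": 0}
            )
            entry["types"].add(type(value).__name__)
            entry["present"] += 1
            # Keep the first non-null value as a representative example.
            if entry["example"] is None and value is not None:
                entry["example"] = value
    for entry in summary.values():
        # A field is "missing" in every row that does not contain it at all.
        entry["missing"] = len(rows) - entry["present"]
    return summary


# Hypothetical records standing in for the first rows of a downloaded split.
sample = [
    {"text": "placeholder sentence", "duration": 3.2, "speaker_id": 12},
    {"text": "another placeholder", "duration": 4.0},
]

report = summarize_fields(sample)
for name, entry in sorted(report.items()):
    print(name, sorted(entry["types"]), "missing:", entry["missing"])
```

Running the same helper on the first few real rows gives a quick view of which fields exist, what types they hold, and how often they are absent, which is a reasonable starting point for inferring field semantics.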

Provenance

Source: Center of Excellence in Artificial Intelligence (CEIA), Federal University of Goias (UFG).
Collection Method: Audiobooks sourced from Project Gutenberg public domain books, read by LibriVox volunteers.
Freshness: Last updated 2023-11-24 14:48:29.

License: Unknown; users must verify terms before use.

Tags: Text, Audio, Multilingual, Text To Speech, Speech Synthesis, Audiobooks, Synthetic

CML-TTS: Multilingual Text-to-Speech Audiobooks from LibriVox
