Description

LibriTTS is a multi-speaker English speech corpus of approximately 585 hours of audio sampled at 24 kHz, derived from LibriVox audiobooks and Project Gutenberg texts. Heiga Zen and members of the Google Speech and Google Brain teams prepared the dataset specifically for TTS research. The dataset card was last updated in February 2024.

Use Cases

Train TTS models to generate speech waveforms from text transcripts using the aligned audio-text pairs.
Develop multi-speaker synthesis systems by leveraging the dataset's numerous speaker identities and corresponding audio samples.
Benchmark prosody and expressiveness in synthesized speech against the natural, read-aloud audio recordings.
Pre-train or fine-tune acoustic models on the 24kHz high-fidelity audio samples for improved sound quality.
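
Before training on the audio, it can help to confirm that each file really reports the 24 kHz rate the description states. The following is a minimal sketch using only Python's standard `wave` module; it assumes the audio has been decoded to uncompressed WAV, and the helper names (`wav_info`, `is_expected_rate`, `make_silent_wav`) are hypothetical, introduced here for illustration only.

```python
import io
import wave

EXPECTED_RATE_HZ = 24_000  # sampling rate stated in the description


def wav_info(source):
    """Return (sample_rate_hz, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def is_expected_rate(source, expected_hz=EXPECTED_RATE_HZ):
    """True when the WAV header reports the expected sampling rate."""
    rate, _ = wav_info(source)
    return rate == expected_hz


def make_silent_wav(rate_hz, seconds):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buffer.seek(0)               # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    demo = make_silent_wav(EXPECTED_RATE_HZ, 0.5)
    print(wav_info(demo))
```

In practice `wav_info` would be called on file paths from the corpus; files that fail `is_expected_rate` would need resampling before being mixed with the rest.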

Strengths

Approximately 585 hours of audio data provides substantial material for model training.
High-quality 24kHz sampling rate offers good fidelity for speech synthesis tasks.
Derived from the established LibriSpeech corpus, suggesting a foundation of reliable source material.
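
To put the 585-hour and 24 kHz figures in concrete terms, the sketch below works out the total sample count and a rough uncompressed size. The hour count and sampling rate come from the description; the 16-bit depth and single channel are assumptions made for the arithmetic, not facts stated on the card.

```python
HOURS = 585               # approximate corpus size from the description
SAMPLE_RATE_HZ = 24_000   # sampling rate from the description
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM
CHANNELS = 1              # assumption: mono


def corpus_samples(hours=HOURS, rate_hz=SAMPLE_RATE_HZ):
    """Total number of audio samples per channel."""
    return int(hours * 3600 * rate_hz)


def uncompressed_bytes(hours=HOURS, rate_hz=SAMPLE_RATE_HZ,
                       bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Raw PCM size in bytes, ignoring container headers."""
    return corpus_samples(hours, rate_hz) * bytes_per_sample * channels


if __name__ == "__main__":
    print(corpus_samples())             # 50544000000 samples
    print(uncompressed_bytes() / 1e9)   # 101.088 (decimal gigabytes)
```

Under these assumptions the raw audio comes to roughly 101 GB, which is a useful planning number for storage and data-loading throughput.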

Limitations

Limited to read English speech, which may not capture conversational or spontaneous speaking styles.
Potential bias towards literary content and specific speaker demographics present in LibriVox recordings.
Specific details on speaker count, demographic balance, and audio length distribution are not provided in the dataset card.
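
Speaker count and duration statistics can be computed directly from per-utterance metadata. The sketch below assumes each record is a dictionary with the hypothetical keys `speaker_id` and `duration_s`; the actual field names in the dataset may differ and should be checked against its schema.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(records):
    """Summarize speaker count and utterance durations.

    `records` is an iterable of dicts with the hypothetical keys
    'speaker_id' and 'duration_s' (duration in seconds).
    """
    per_speaker = defaultdict(float)  # total seconds per speaker
    durations = []
    for rec in records:
        per_speaker[rec["speaker_id"]] += rec["duration_s"]
        durations.append(rec["duration_s"])

    if not durations:
        return {"speakers": 0, "utterances": 0, "total_hours": 0.0}

    total = sum(durations)
    return {
        "speakers": len(per_speaker),
        "utterances": len(durations),
        "total_hours": total / 3600,
        "mean_s": mean(durations),
        "median_s": median(durations),
        # Share of all audio contributed by the most-recorded speaker;
        # a high value signals an imbalanced speaker distribution.
        "max_speaker_share": max(per_speaker.values()) / total,
    }


if __name__ == "__main__":
    sample = [
        {"speaker_id": "a", "duration_s": 2.0},
        {"speaker_id": "a", "duration_s": 4.0},
        {"speaker_id": "b", "duration_s": 6.0},
    ]
    print(summarize(sample))
```

Demographic balance cannot be recovered this way; it would require speaker-level annotations that this sketch does not assume exist.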

Provenance

Source: Derived from the LibriSpeech corpus, which uses audio from LibriVox and text from Project Gutenberg.
Collection Method: Audio files (mp3) and text files were adapted and prepared for TTS research.
Freshness: Dataset card updated 2024-02-09; underlying audio source material is older.

The license is listed as CC BY 4.0 on the platform, but the specific terms should be verified on the original dataset page. The dataset is designed for TTS, not automatic speech recognition.

Tags: text-to-speech, speech synthesis, multi-speaker, natural language processing, audio corpus. Modalities: text, audio. Format: parquet. Language: en. Size category: 100K < n < 1M. Libraries: datasets, dask, polars, mlcroissant. License: CC BY 4.0. Region: US. arXiv: 1904.02882.

LibriTTS English Speech Corpus for Text-to-Speech Research
