Name: LibriHeavy TTS 3: A 50,000-Hour Speech Corpus for Text-to-Speech Training
Creator: brthor
Published: 2026-04-25T18:26:23
Keywords: Speech Synthesis, Speech Corpus, Text, Audio Text, Audio, Natural Language Processing, Libriheavy

Description

LibriHeavy TTS 3 is an improved version of the LibriHeavy dataset, aimed at raising data quality for text-to-speech training. It builds on a 50,000-hour labeled ASR corpus derived from LibriLight, with audio encoded in Opus at 68 kbps. The dataset, authored by brthor and last updated in April 2026, focuses on better audio quality and better text supervision.

Use Cases

Training text-to-speech models on audio paired with improved text supervision.
Fine-tuning speech synthesis systems on a large-scale, 50,000-hour labeled corpus.
Improving audio quality and naturalness in TTS outputs, using audio encoded with the Opus codec.
Researching speech representation learning with the audio-text alignment and the punctuation, casing, and context labels provided.

Strengths

Contains 50,000 hours of audio data, providing a substantial resource for model training.
Audio files are encoded with Opus at 68 kbps, a codec designed to retain quality while reducing file size.
Includes punctuation, casing, and context labels for the text, which can improve supervision quality.
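
As a rough illustration of what the stated duration and bitrate imply for storage, the following sketch estimates the audio payload size. It assumes the average bitrate across the corpus equals the stated 68 kbps and ignores container overhead; Opus commonly runs at a variable bitrate, so the real figure may differ. The function name is illustrative and not part of the dataset.

```python
# Back-of-the-envelope storage estimate for the audio payload.
HOURS = 50_000        # corpus duration stated in the card
BITRATE_KBPS = 68     # Opus bitrate stated in the card (assumed to be the average)


def estimated_size_bytes(hours, bitrate_kbps):
    """Approximate payload size in bytes, ignoring container overhead."""
    seconds = hours * 3600
    bits = seconds * bitrate_kbps * 1000
    return bits / 8


size = estimated_size_bytes(HOURS, BITRATE_KBPS)
print(f"{size / 1e12:.2f} TB")    # decimal terabytes
print(f"{size / 2**40:.2f} TiB")  # binary tebibytes
```

Under these assumptions the audio alone comes to roughly 1.5 TB, which is worth budgeting for before attempting a full download.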

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The dataset is marked as a work in progress (WIP) and has not yet been released, so it may be incomplete, unstable, or unavailable for download.
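
Because the columns are undocumented, a first pass after download might simply tabulate which fields appear and what types they hold. The sketch below does this for a handful of records represented as Python dictionaries. The field names in the sample (id, text, duration) are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical records; the real column names are not documented in the card.
sample = [
    {"id": "utt-0001", "text": "Hello, world.", "duration": 2.5},
    {"id": "utt-0002", "text": "Second example.", "duration": None},
]

for name, types in summarize_fields(sample).items():
    print(name, types)
```

A field that reports more than one type, such as a value that is sometimes missing, is a useful early signal of where manual inspection is needed.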

Provenance

Source: Built on top of mythicinfinity/libriheavy, which is a labeled version of LibriLight derived from LibriVox recordings.
Collection Method: Likely involves processing and labeling audio from the LibriVox public domain audiobook project.
Time Range: Not specified
Freshness: Last updated 2026-04-27 16:43:06; verify against the source before use.
Geography: Not specified

