Description

LJ Speech contains 13,100 short audio clips of a single speaker reading from seven non-fiction books, totaling approximately 24 hours of English speech. Released by Keith Ito, the dataset provides expert-generated transcriptions for every recording to support speech synthesis and recognition tasks.
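
The two headline figures above imply a typical clip length, which is useful when sizing batches or padding limits. The following is a rough back-of-the-envelope calculation, assuming the approximate 24-hour total is spread over all 13,100 clips; the variable names are illustrative only.

```python
# Approximate figures taken from the description above.
TOTAL_HOURS = 24
NUM_CLIPS = 13_100

# Mean clip length in seconds (an average, not a per-clip guarantee).
mean_clip_seconds = TOTAL_HOURS * 3600 / NUM_CLIPS

print(f"{mean_clip_seconds:.1f} s per clip on average")
```

Individual clips will be shorter or longer than this mean.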

Use Cases

Training Text-to-Speech (TTS) models using the transcription field and audio clips
Fine-tuning Automatic Speech Recognition (ASR) systems on the 13,100 audio-text pairs
Acoustic modeling and prosody analysis of a consistent single speaker
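
For the TTS and ASR use cases, each audio clip has to be paired with its transcription. The sketch below assumes the layout of the original LJ Speech download, where a pipe-delimited metadata.csv lists a clip ID followed by its transcription and the audio lives in a wavs directory. Other mirrors may expose the same information under different field names. The function name read_pairs and the sample rows are hypothetical.

```python
import csv
import io
from pathlib import Path


def read_pairs(metadata_file, wav_dir):
    """Return a list of (wav_path, transcription) tuples.

    Assumes each row looks like: clip_id|transcription|...
    """
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(metadata_file, delimiter="|", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        clip_id, text = row[0], row[1]
        pairs.append((Path(wav_dir) / f"{clip_id}.wav", text))
    return pairs


# Hypothetical rows standing in for the real metadata file.
sample = io.StringIO(
    "clip-0001|First example sentence.|First example sentence.\n"
    "clip-0002|Second example sentence.|Second example sentence.\n"
)
pairs = read_pairs(sample, "wavs")
```

With a real download, the same function would be called on an open handle to the metadata file instead of the in-memory sample.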

Strengths

13,100 expert-annotated records
24 hours of high-quality single-speaker audio
Public domain license for unrestricted commercial and research use

Limitations

Single-speaker data: models trained only on this dataset are unlikely to generalize to other voices or accents
Vocabulary is limited to the domain of the seven non-fiction books
Audio is not pre-converted to float32 arrays, so files must be decoded before use in most ML pipelines

Provenance

Source: Keith Ito
Collection Method: Recorded readings of seven non-fiction books with expert-generated transcriptions
Freshness: Last updated August 2024; the source audio and text from non-fiction books are static.
Geography: United States

Audio is stored in .wav format and is not pre-converted to float32 arrays; users can decode each file into an array with a library such as soundfile before training. The dataset is released under the Unlicense, a public-domain dedication.

Source Datasets: original
Size Categories: 10K<n<100K
Task Categories: text-to-speech, text-to-audio, automatic-speech-recognition
Language: en
Language Creators: found
License: unlicense
Region: us
Multilinguality: monolingual
Annotations Creators: expert-generated

LJ Speech: 13,100 Single-Speaker Audio Clips with Transcriptions
