This dataset contains 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books, each paired with a transcription. It totals approximately 24 hours of audio, with individual clips ranging from 1 to 10 seconds in duration.
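
As a quick sanity check on these figures, the totals imply an average clip length that should sit inside the stated 1 to 10 second range. The snippet below is only a back-of-the-envelope sketch: the constant and function names are illustrative, and both input figures are the approximate values quoted above, not measurements from the audio files.

```python
# Approximate figures quoted in the dataset description.
NUM_CLIPS = 13_100
TOTAL_HOURS = 24


def mean_clip_seconds(num_clips: int, total_hours: float) -> float:
    """Return the average clip length in seconds implied by the totals."""
    return total_hours * 3600 / num_clips


mean_seconds = mean_clip_seconds(NUM_CLIPS, TOTAL_HOURS)
print(f"Implied mean clip duration: {mean_seconds:.1f} s")  # prints 6.6 s
```

An implied mean of roughly 6.6 seconds is consistent with clips that range from 1 to 10 seconds.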
Use Cases
- Train neural text-to-speech (TTS) models using the paired audio clips and transcriptions
- Benchmark automated speech recognition (ASR) systems on single-speaker clarity using the 1-10 second segments
- Analyze prosody and intonation patterns across 24 hours of non-fiction book readings
Strengths
- 13,100 individual audio clips with a total duration of approximately 24 hours
- Audio segments strictly constrained to lengths between 1 and 10 seconds
- Verbatim transcriptions provided for every audio segment in the collection
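
The duration and transcription properties listed above could be verified after loading the data along the following lines. This is a minimal sketch under stated assumptions: the `Clip` record, its field names, and the `find_violations` helper are hypothetical and do not reflect the dataset's actual file layout, which is not described here.

```python
from dataclasses import dataclass

# Bounds taken from the description: clips run from 1 to 10 seconds.
MIN_SECONDS = 1.0
MAX_SECONDS = 10.0


@dataclass
class Clip:
    """Hypothetical in-memory record for one clip (not the dataset's real schema)."""
    clip_id: str
    duration_seconds: float
    transcription: str


def find_violations(clips):
    """Return (clip_id, reason) pairs for clips that break the stated properties."""
    problems = []
    for clip in clips:
        if not (MIN_SECONDS <= clip.duration_seconds <= MAX_SECONDS):
            problems.append((clip.clip_id, "duration out of range"))
        if not clip.transcription.strip():
            problems.append((clip.clip_id, "missing transcription"))
    return problems


# Made-up sample records, for illustration only.
sample = [
    Clip("a", 4.2, "Hello there."),
    Clip("b", 12.0, "Too long."),
    Clip("c", 3.0, "   "),
]
violations = find_violations(sample)
print(violations)
```

On the real dataset, an empty result from a check like this would support the claims about clip length and transcription coverage.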