This dataset supplies semantic and acoustic tokens for the LibriLight and LibriTTS English speech corpora, formatted for training SPEAR TTS-like models. It features 24kHz EnCodec acoustic tokens at 6kbps and semantic tokens generated through a Whisper tiny VQ bottleneck trained on LibriLight subsets.
Use Cases
- Train a text-to-speech model by mapping input text to the provided semantic tokens.
- Synthesize high-fidelity audio by decoding the 24kHz EnCodec acoustic tokens.
- Benchmark SPEAR TTS-like models using the pre-tokenized LibriLight small, medium, and large subsets.
Strengths
- Includes 24kHz EnCodec acoustic tokens with 8 quantizers at a 6kbps bitrate.
- Features semantic tokens generated using a Whisper tiny VQ bottleneck trained on LibriLight subsets.
- Contains pre-processed data for the small, medium, and large subsets of the LibriLight corpus.
- Provides specialized tokenized representations for the English-only LibriTTS dataset.
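The 8-quantizer, 6kbps figure listed under Strengths can be sanity-checked with a little arithmetic. The sketch below assumes the publicly documented EnCodec 24kHz configuration (75 frames per second, 1024-entry codebooks); neither value is stated in this description, and the function names are illustrative only. It also gives a rough estimate of how many acoustic tokens a clip of a given length produces.

```python
import math

# Assumptions (not stated in this description): the EnCodec 24 kHz model
# emits 75 frames per second and each quantizer has a 1024-entry codebook.
FRAME_RATE_HZ = 75
CODEBOOK_SIZE = 1024
NUM_QUANTIZERS = 8  # from the Strengths list


def bitrate_bps(num_quantizers=NUM_QUANTIZERS,
                codebook_size=CODEBOOK_SIZE,
                frame_rate_hz=FRAME_RATE_HZ):
    """Bits per second = quantizers x bits per token x frames per second."""
    bits_per_token = math.log2(codebook_size)
    return num_quantizers * bits_per_token * frame_rate_hz


def acoustic_token_count(duration_s,
                         num_quantizers=NUM_QUANTIZERS,
                         frame_rate_hz=FRAME_RATE_HZ):
    """Approximate number of acoustic tokens for a clip of duration_s seconds."""
    frames = math.ceil(duration_s * frame_rate_hz)
    return frames * num_quantizers


if __name__ == "__main__":
    print(bitrate_bps())               # bits per second
    print(acoustic_token_count(10.0))  # tokens in a 10-second clip
```

Under these assumptions the product is 8 x 10 x 75 = 6000 bits per second, which matches the stated 6kbps. The token count is an estimate, since the exact frame count depends on how the encoder pads the final frame.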