Available on 1 platform
somu9's mls_eng_tokens dataset provides pre-extracted audio codec tokens from the English portion of the Multilingual LibriSpeech corpus, tokenized with MOSS-Audio-Tokenizer. The dataset includes train, dev, and test splits and was last updated on 2026-05-17. The audio is processed at a sample rate of 48,000 Hz, and tokens are produced at a frame rate of 12.5 Hz.
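The two rates above imply a fixed ratio between audio samples and token frames. The following is a minimal sketch of that arithmetic, not part of the dataset's own tooling: the names `SAMPLE_RATE_HZ`, `FRAME_RATE_HZ`, and `estimated_frames` are hypothetical, and rounding the frame count up is an assumption, since the listing does not say how the tokenizer handles a partial final frame.

```python
import math

# Rates as stated in the dataset description.
SAMPLE_RATE_HZ = 48_000
FRAME_RATE_HZ = 12.5

# Audio samples covered by one token frame: 48,000 / 12.5 = 3,840.
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ

# Duration of one token frame in milliseconds: 1000 / 12.5 = 80.
frame_duration_ms = 1000 / FRAME_RATE_HZ


def estimated_frames(duration_seconds: float) -> int:
    """Estimate the number of token frames for a clip of the given length.

    Assumption: a partial final frame is padded and counted (hence ceil).
    The real tokenizer may truncate instead.
    """
    return math.ceil(duration_seconds * FRAME_RATE_HZ)


if __name__ == "__main__":
    print(samples_per_frame)       # samples per token frame
    print(frame_duration_ms)       # milliseconds per token frame
    print(estimated_frames(10.0))  # estimated frames in a 10-second clip
```

Under these figures each token frame spans 3,840 samples, or 80 ms of audio, so a 10-second utterance corresponds to roughly 125 frames.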
License is unknown; users must verify permissions before use.