This dataset contains between 100,000 and 1,000,000 Spanish audio segments with transcriptions, derived from LibriVox audiobooks. Created by Cnam-LMSSC and updated in March 2026, it extends the Multilingual LibriSpeech (MLS) corpus with machine-generated phonetic transcriptions.
The dataset is licensed under CC BY 4.0 and distributed in Parquet format, which can be read with the Polars, Dask, and Hugging Face Datasets libraries.