MTUCI's lab260 team released this Russian speech corpus in early 2026. It contains between 100,000 and 1,000,000 records drawn from audiobook recordings, which were filtered and annotated with the BALALAIKA pipeline to support advanced generative speech tasks.
The dataset is distributed in Parquet format and can be read with the Polars and Dask libraries for large-scale processing.