Biggest Ru Book Balalaika

Name: Biggest Ru Book Balalaika
Creator: MTUCI
Published: 2026-01-22T15:22:14
Keywords: task_categories:text-to-speech, task_categories:automatic-speech-recognition, language:ru, license:apache-2.0, modality:text, modality:tabular, size_categories:100K<n<1M, arxiv:2507.13563, library:datasets, library:pandas, library:polars, library:mlcroissant, parquet, region:us

by MTUCI, updated 5mo ago

Available on 1 platform

Description

528.2 hours of filtered Russian speech drawn from audiobooks. The corpus was processed with the BALALAIKA pipeline by the MTUCI lab260 team and is intended for generative speech tasks.

Use Cases

Train text-to-speech (TTS) models using the Russian audio recordings and their corresponding text annotations
Develop speech synthesis systems optimized for the audiobook genre using the 528.2 hours of filtered speech
Fine-tune generative speech models using the high-quality Russian speech corpus

Strengths

528.2 hours of audio retained after filtering an initial pool of more than 1,000 hours
Annotated using the BALALAIKA pipeline developed by the MTUCI lab260 team
Sourced exclusively from the Russian audiobook genre
Released under the apache-2.0 license
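
The filtering figures above imply a rough upper bound on how much of the raw audio survived. A minimal sketch of that arithmetic, assuming the initial pool was exactly 1,000 hours (the source only says "1000+", so the true retention rate is somewhat lower). The function name retained_fraction is illustrative and not part of the BALALAIKA pipeline.

```python
def retained_fraction(kept_hours: float, initial_hours: float) -> float:
    """Return the share of audio that survived filtering."""
    if initial_hours <= 0:
        raise ValueError("initial_hours must be positive")
    return kept_hours / initial_hours


KEPT_HOURS = 528.2                  # stated on the dataset card
INITIAL_HOURS_LOWER_BOUND = 1000.0  # assumption: the card says only "1000+"

# Dividing by the lower bound of the initial pool gives an upper bound
# on the retention rate.
upper_bound = retained_fraction(KEPT_HOURS, INITIAL_HOURS_LOWER_BOUND)
print(f"Retained at most {upper_bound:.1%} of the raw audio")
# prints: Retained at most 52.8% of the raw audio
```

In other words, the filtering discarded at least about 47% of the source recordings.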

Quality Score

Overall: C (43)
Description: 48
Source: 44
Reputation: 43
Access: 22

Community

30 downloads

3 likes

0 views

Dataset Info

Author: MTUCI
Created: Jan 22, 2026
Updated: Jan 29, 2026
