Name: Kazakh Songs ASR: 1,000-10,000 Manually Aligned Audio-Text Pairs
Creator: yeshpanovrustem
Published: 2026-02-21T05:03:06
Keywords: size category 1K<n<10K, license: other, library: polars, library: dask, modality: audio, modality: text, library: mlcroissant, library: datasets, Parquet, region: us, task category: automatic speech recognition, arXiv: 2603.00961, language: kk

Description

This dataset contains between 1,000 and 10,000 manually aligned audio-text pairs drawn from commercially released Kazakh songs. It was released by yeshpanovrustem in 2026 and provides line-level vocal segments for investigating how useful sung speech is for low-resource automatic speech recognition (ASR).

Use Cases

Training ASR models using line-level vocal segments to map melodic audio to text transcriptions
Benchmarking the robustness of Kazakh speech-to-text systems against rhythmic and melodic vocal inputs
Researching transfer learning capabilities from sung speech to spoken language models in low-resource contexts
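
The benchmarking use case needs a scoring metric, and the source does not name one. The following is a minimal sketch that assumes word error rate (WER), a common ASR metric, computed over whitespace-separated words. The function name word_error_rate is illustrative and is not part of the dataset or of any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Scoring each transcribed line against its aligned lyric line and averaging over the set gives one rough robustness figure. Any text normalization, such as casing or punctuation stripping, would have to be applied to both strings first and is not shown here.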

Strengths

Manually verified line-level alignments between audio and text
Contains between 1,000 and 10,000 records for a low-resource language
Sourced from high-quality commercial song recordings

Limitations

Small sample size of under 10,000 records compared to standard ASR corpora
Acoustic bias toward sung speech and musical accompaniment which may not generalize to spoken Kazakh
Lack of demographic diversity data for the vocalists

Provenance

Source: yeshpanovrustem (arXiv: 2603.00961)
Collection Method: Manual alignment of audio-text pairs from commercially released Kazakh songs
Freshness: Last updated March 2026.
Geography: Kazakhstan

The dataset is provided in Parquet format and is compatible with Polars, Dask, and the Hugging Face datasets library. Users should be aware of potential license restrictions related to the use of commercial music recordings.
