Ukrainian speech dataset for text-to-speech (TTS) and automatic speech recognition (ASR) tasks, derived from the Yehor/audiobooks-xxl source. The audio was filtered to remove music and noise, resampled to 24 kHz mono, and transcribed with the nvidia/canary-1b-v2 model. The dataset was created by Mikhailo and last updated on April 29, 2026.
Use Cases
- Training Ukrainian text-to-speech models on high-quality audiobook audio.
- Developing Ukrainian automatic speech recognition systems using the transcribed speech samples.
- Benchmarking audio processing pipelines against the described music filtering and resampling steps.
- Creating synthetic speech datasets from the 24 kHz mono audio.
Strengths
- Audio samples have been filtered to remove background music and noise.
- Audio has been converted to 24 kHz mono, a format commonly used in speech tasks, particularly TTS.
- Transcriptions were generated automatically with the nvidia/canary-1b-v2 model.
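The 24 kHz mono claim above can be spot-checked after download. The following is a minimal sketch, assuming the audio is available as WAV files on disk; the storage format is not stated in the source, and the helper name `matches_target_format` is hypothetical.

```python
import wave

TARGET_RATE = 24_000   # 24 kHz, as stated in the dataset description
TARGET_CHANNELS = 1    # mono


def matches_target_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 24 kHz mono.

    Assumption: the file is an uncompressed WAV readable by the
    standard-library `wave` module. Other containers need another reader.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == TARGET_RATE
            and wav.getnchannels() == TARGET_CHANNELS
        )
```

Running this over a small sample of files is usually enough to confirm that the stated format holds before committing to a training setup.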
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not stated, which makes it harder to assess suitability before download.
- Last updated 2026-04-29 09:29:22; freshness should be verified.
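Because column documentation and the row count are missing, a first step after download is to inspect the records directly. The sketch below assumes the rows have already been loaded as a list of dictionaries; the column names in the sample are hypothetical, and `summarize_columns` is an illustrative helper, not part of any library.

```python
from collections import defaultdict


def summarize_columns(rows):
    """Map each column name to the sorted Python type names observed in it."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical records: the real column names are not documented.
sample = [
    {"audio": b"\x00\x01", "text": "привіт", "duration": 1.5},
    {"audio": b"\x00\x02", "text": None, "duration": 2},
]

print(len(sample))                 # row count of the loaded sample
print(summarize_columns(sample))   # observed types per column
```

Columns that show more than one type, or that include `NoneType`, are the ones most likely to need cleaning before training.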
Provenance
- Source: https://huggingface.co/datasets/Yehor/audiobooks-xxl
- Collection Method: Processing pipeline includes MusicDetection filtering, audio resampling/conversion, and transcription.
- Freshness: 2026-04-29 09:29:22
- Geography: Ukraine (implied by language)
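The resampling/conversion stage of the collection method can be illustrated with a small sketch. The source does not say which tool performed the conversion, so this is not the author's method: it uses plain channel averaging and linear interpolation as a stand-in, with no anti-aliasing filter. A production pipeline would normally rely on a dedicated resampler. The music filtering and transcription stages are omitted because their interfaces are not described. The function names are hypothetical.

```python
def downmix_to_mono(frames):
    """Average the channels of each frame.

    `frames` is a list of tuples, one sample per channel.
    """
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=24_000):
    """Resample a mono signal by linear interpolation.

    Assumption: a simple stand-in technique with no low-pass filtering,
    so downsampling may alias. It only shows the shape of the step.
    """
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # position in the source signal
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(float(samples[-1]))  # clamp at the last sample
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


# Usage: stereo 48 kHz input converted to 24 kHz mono.
stereo = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (0.5, 0.5)]
mono_24k = resample_linear(downmix_to_mono(stereo), src_rate=48_000)
```

Halving the sample rate in this way yields half as many samples as the input, which is a quick sanity check on any replacement resampler as well.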