Name: WolneLektury-TTS-Polish: 997 Hours of Polish Audiobook Speech
Creator: datadriven-company
Published: 2026-02-03T07:09:52
Keywords: Speech Synthesis, Audiobooks, Audio, Large Scale, Polish Language, Automatic Speech Recognition

Description

383,710 audio samples totaling 997 hours of Polish speech, derived from the public-domain Wolne Lektury digital library. The recordings come from 1,207 unique professional narrators; 294,756 samples are from male narrators and 88,945 from female narrators (these two figures sum to 383,701, nine fewer than the stated total). The dataset was created by datadriven-company and last updated on Hugging Face in February 2026.
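The reported totals imply a few derived figures that are worth checking before use. The sketch below only recomputes them from the numbers stated above. Because 997 hours is presumably a rounded figure, the mean duration it reports is approximate.

```python
# Figures reported in the dataset description.
TOTAL_SAMPLES = 383_710
TOTAL_HOURS = 997
MALE_SAMPLES = 294_756
FEMALE_SAMPLES = 88_945


def summarize() -> dict:
    """Derive simple per-sample statistics from the reported totals."""
    total_seconds = TOTAL_HOURS * 3600
    labeled = MALE_SAMPLES + FEMALE_SAMPLES
    return {
        "mean_seconds_per_sample": total_seconds / TOTAL_SAMPLES,
        "male_share": MALE_SAMPLES / TOTAL_SAMPLES,
        "female_share": FEMALE_SAMPLES / TOTAL_SAMPLES,
        # Samples not covered by either reported gender count.
        "unattributed_samples": TOTAL_SAMPLES - labeled,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Run as a script, it reports a mean of about 9.35 seconds per sample, a male share of about 0.768, and 9 samples not accounted for by the male and female counts.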

Use Cases

Training text-to-speech models on high-quality, professionally narrated Polish audio.
Developing automatic speech recognition systems for Polish using a large corpus of transcribed speech.
Analyzing speaker and voice characteristics across the 1,207 unique narrators.
Studying prosody and intonation patterns in formal, literary Polish speech.
Fine-tuning speech models for gender-specific voice synthesis using the male and female sample split.
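For the training and speaker-analysis use cases above, evaluation sets are usually kept speaker-disjoint so that a model is not scored on voices it already heard during training. A minimal sketch, assuming each record exposes a narrator identifier under a hypothetical `speaker_id` key (the page does not document the actual column names):

```python
import hashlib
from typing import Iterable, List, Tuple


def is_test_speaker(speaker_id: str, test_fraction: float = 0.05) -> bool:
    """Deterministically assign a speaker to the test split by hashing its ID."""
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32); dividing maps it to [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < test_fraction


def speaker_disjoint_split(
    records: Iterable[dict], test_fraction: float = 0.05
) -> Tuple[List[dict], List[dict]]:
    """Split records so that no speaker appears in both train and test.

    Assumes each record is a dict with a hypothetical "speaker_id" key.
    """
    train: List[dict] = []
    test: List[dict] = []
    for record in records:
        if is_test_speaker(str(record["speaker_id"]), test_fraction):
            test.append(record)
        else:
            train.append(record)
    return train, test


if __name__ == "__main__":
    # Toy records standing in for real dataset rows.
    rows = [{"speaker_id": f"narrator_{i % 20}", "clip": i} for i in range(200)]
    train, test = speaker_disjoint_split(rows, test_fraction=0.2)
    print(len(train), len(test))
```

Hashing with SHA-256 rather than Python's built-in `hash()` keeps the assignment stable across runs and machines. `test_fraction` controls the approximate share of speakers, not samples; with narrators contributing very different amounts of audio, the sample-level share can drift noticeably from it.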

Strengths

Large scale with 383,710 samples and 997 hours of audio.
High-quality source material from professional voice actors in public-domain audiobooks.
Diverse speaker set with 1,207 unique narrators.
Reported gender breakdown of 294,756 male and 88,945 female samples, although the split is skewed toward male voices (roughly 77% male, 23% female).
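Given the male-heavy split, gender-balanced or gender-specific fine-tuning often reweights samples. One common option is inverse-frequency weighting, sketched below from the reported counts only; it is an illustrative technique, not something the dataset page prescribes.

```python
# Reported per-gender sample counts from the dataset description.
GENDER_COUNTS = {"male": 294_756, "female": 88_945}


def inverse_frequency_weights(counts: dict) -> dict:
    """Return a per-sample weight for each class so every class has equal total weight.

    weight_c = N / (K * n_c), where N is the total labeled count, K the number of
    classes and n_c the count for class c; then n_c * weight_c == N / K for all c.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, weight in inverse_frequency_weights(GENDER_COUNTS).items():
        print(f"{label}: {weight:.3f}")
```

This yields weights of about 0.65 for male and 2.16 for female samples. Assigned per sample, they can drive a weighted sampler such as PyTorch's `torch.utils.data.WeightedRandomSampler`. Reweighting rebalances exposure but does not add new female voices, and the nine samples outside the two counts would need separate handling.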

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality and file formats require manual inspection after download.
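Because column semantics are undocumented, a quick first pass is to tabulate what each field actually contains. If the data is loaded with the Hugging Face `datasets` library, `Dataset.features` shows the declared schema; the library-agnostic sketch below instead inspects a sample of rows as plain dicts (for example, the first few hundred rows iterated from a streaming load). The field names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def infer_schema(records: Iterable[dict]) -> Dict[str, dict]:
    """Summarize undocumented fields from a sample of records.

    For each field name, report the Python type names observed and how many
    records contain the field (fields present in only some rows are optional).
    """
    types: Dict[str, set] = defaultdict(set)
    present: Dict[str, int] = defaultdict(int)
    for record in records:
        for field, value in record.items():
            types[field].add(type(value).__name__)
            present[field] += 1
    return {
        field: {"types": sorted(types[field]), "present_in": present[field]}
        for field in types
    }


if __name__ == "__main__":
    # Hypothetical rows; real column names must be read from the download.
    sample = [
        {"text": "Litwo! Ojczyzno moja!", "duration": 3.2},
        {"text": "Ty jesteś jak zdrowie.", "duration": 4, "gender": "male"},
    ]
    for field, info in infer_schema(sample).items():
        print(field, info)
```

Mixed types for one field (such as `float` and `int` for a duration) or fields present in only some rows are the first things to check before building a training pipeline.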

Provenance

Source: Wolne Lektury (Free Readings), a Polish digital library.
Collection Method: Derived from public-domain audiobooks featuring professional voice actors.
Time Range: Not specified
Freshness: Last updated 2026-02-04 21:42:19; freshness should be verified.
Geography: Not specified
