This dataset contains 383,710 audio samples totaling 997 hours of Polish speech, derived from the public domain Wolne Lektury digital library. It features recordings from 1,207 unique professional narrators, with 294,756 male and 88,945 female samples. It was created by datadriven-company and last updated on Hugging Face in February 2026.
Use Cases
- Training text-to-speech models on high-quality, professionally narrated Polish audio.
- Developing automatic speech recognition systems for Polish using a large corpus of transcribed speech.
- Conducting speaker or voice characteristic analysis across the 1,207 unique narrators.
- Studying prosody and intonation patterns in formal, literary Polish speech.
- Fine-tuning speech models for gender-specific voice synthesis using the male and female sample split.
Strengths
- Large scale with 383,710 samples and 997 hours of audio.
- High-quality source material from professional voice actors in public domain audiobooks.
- Diverse speaker set with 1,207 unique narrators.
- Documented gender distribution with 294,756 male and 88,945 female samples (roughly 3.3 male samples per female sample).
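The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in this description to derive the mean clip length and the male/female share; the constant and function names are illustrative and not part of the dataset. The two gender counts sum to 383,701, nine fewer than the stated 383,710; whether those samples are unlabeled or the counts come from different snapshots is not confirmed by the source.

```python
# Figures quoted in the dataset description (not re-measured here).
TOTAL_SAMPLES = 383_710
TOTAL_HOURS = 997
MALE_SAMPLES = 294_756
FEMALE_SAMPLES = 88_945
NARRATORS = 1_207


def summarize_stats():
    """Derive simple sanity-check figures from the quoted statistics."""
    labeled = MALE_SAMPLES + FEMALE_SAMPLES
    return {
        "mean_clip_seconds": TOTAL_HOURS * 3600 / TOTAL_SAMPLES,
        "mean_samples_per_narrator": TOTAL_SAMPLES / NARRATORS,
        "male_share": MALE_SAMPLES / labeled,
        "female_share": FEMALE_SAMPLES / labeled,
        # Samples in the total that neither gender count covers.
        "unaccounted_samples": TOTAL_SAMPLES - labeled,
    }


if __name__ == "__main__":
    for name, value in summarize_stats().items():
        if isinstance(value, float):
            print(f"{name}: {value:.3f}")
        else:
            print(f"{name}: {value}")
```

The mean clip length works out to roughly 9.4 seconds, and about 77% of the gender-labeled samples are male, which matters for any use case that assumes balanced voices.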
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality and file formats require manual inspection after download.
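Because the columns are undocumented, a reasonable first step after download is to look at a handful of rows and record which fields appear and what types they hold. The following is a minimal sketch, assuming the rows have already been loaded as Python dictionaries (for example from a Parquet or JSON export). The helper name `summarize_columns` and the sample field names `text`, `speaker_id`, and `duration` are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def summarize_columns(rows):
    """Return, per column, the value types seen, the null count, and the row count."""
    summary = defaultdict(lambda: {"types": set(), "nulls": 0, "present": 0})
    for row in rows:
        for column, value in row.items():
            info = summary[column]
            info["present"] += 1
            if value is None:
                info["nulls"] += 1
            else:
                info["types"].add(type(value).__name__)
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical rows; the real column names must be read from the data.
    sample_rows = [
        {"text": "przykład", "speaker_id": 12, "duration": 3.4},
        {"text": "przykład", "speaker_id": 12, "duration": None},
    ]
    for column, info in summarize_columns(sample_rows).items():
        print(column, sorted(info["types"]), f"nulls={info['nulls']}")
```

A summary like this does not reveal field semantics on its own, but it narrows down which columns hold audio, transcripts, or speaker metadata before any manual inspection.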
Provenance
- Source: Wolne Lektury (Free Readings), a Polish digital library.
- Collection Method: Derived from public domain audiobooks featuring professional voice actors.
- Time Range: not specified.
- Freshness: Last updated 2026-02-04 21:42:19; freshness should be verified.
- Geography: not specified.