Librispeech Childrenization: Speech Audio Samples for Child Voice Modeling
Available on 1 platform
Description
A speech audio dataset derived from the LibriSpeech corpus, likely containing processed or synthesized samples to model children's speech characteristics. The dataset title suggests a scale of 10,000 to 15,000 audio samples. It is hosted on Kaggle, but the original author, collection method, and specific time range are unknown.
Use Cases
Training or fine-tuning automatic speech recognition (ASR) models for child voices (inferred from domain, verify after download)
Developing voice conversion or text-to-speech systems targeting younger demographics (inferred from domain, verify after download)
Benchmarking speech model performance on age-specific acoustic features (inferred from domain, verify after download)
Strengths
Published on Kaggle, a major platform for data science resources.
Derived from the well-known LibriSpeech corpus, suggesting a foundation in established speech data.
Limitations
Metadata is minimal; actual content requires verification after download.
Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is unknown (the title only suggests roughly 10,000 to 15,000 samples), which makes suitability hard to assess before download.
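Because the content and sample count can only be confirmed after download, a quick file inventory is a reasonable first check. The following is a minimal sketch, not a description of the dataset's actual layout: the function name `summarize_dataset_dir` and the set of audio extensions are assumptions made for illustration, and the real archive may be organized differently.

```python
from collections import Counter
from pathlib import Path

# Extensions counted as audio here. This set is an assumption; adjust it
# once the actual file types in the download are known.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def summarize_dataset_dir(root):
    """Count files by (lowercased) extension under root, recursively.

    Returns a dict with the per-extension counts and the number of files
    whose extension is in AUDIO_EXTENSIONS.
    """
    root = Path(root)
    counts = Counter(
        p.suffix.lower() or "<none>"  # files without an extension
        for p in root.rglob("*")
        if p.is_file()
    )
    audio_total = sum(n for ext, n in counts.items() if ext in AUDIO_EXTENSIONS)
    return {"by_extension": dict(counts), "audio_files": audio_total}
```

Calling `summarize_dataset_dir` on the extracted download folder gives a file count to compare against the 10,000 to 15,000 range suggested by the title, and shows whether any metadata files (for example CSV or TXT) ship alongside the audio.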
Provenance
Source
Derived from the LibriSpeech corpus.
Collection Method
Unknown; the 'childrenization' process is not described.
Time Range
Unknown
Freshness
Last update date is unknown; freshness unverified.
Geography
Unknown
License
Unknown; verify terms on the Kaggle source page before use.