Description
This is a 15 GB variant of the LibriVAD dataset, which is built on the LibriSpeech corpus. The audio is noise-augmented, which suggests it is intended for training models that must operate in noisy acoustic environments. The author, organization, and creation date are unknown.
Use Cases
Train voice activity detection (VAD) models on the noise-augmented speech data.
Benchmark speech recognition systems under the noisy conditions introduced by the augmentation.
Develop audio denoising or enhancement algorithms using the augmented LibriSpeech samples.
Strengths
At 15 GB, the dataset provides a substantial volume of audio data for model training.
Built on the established LibriSpeech corpus, a widely used benchmark in speech research.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it harder to assess suitability before download.
Last update date is unknown; freshness unverified.
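Because the listing documents neither the fields nor the row count, a first pass after download can simply inventory what is on disk. The following is a minimal sketch using only the Python standard library; the function name `summarize_dataset` and the directory path in the usage note are hypothetical, and nothing here assumes a particular file layout for this dataset.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root):
    """Count files by extension and sum their sizes under `root`.

    `root` is wherever the archive was extracted (an assumption;
    the listing does not describe the directory layout).
    """
    counts = Counter()
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Extensions are lowercased; files without one go under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
            total_bytes += path.stat().st_size
    return counts, total_bytes
```

Calling `summarize_dataset("path/to/extracted/dataset")` returns a per-extension file count and the total size in bytes, which gives a rough substitute for the missing row count and a starting point for working out which files hold audio and which hold labels.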
Provenance
Source
Kaggle
Collection Method
Derived and augmented from the LibriSpeech corpus.
License is unknown; users must verify terms before commercial use.