Description
This dataset contains 2.6 million audio snippets totaling 4,932 hours of speech, enhanced with emotion annotations and speaker embeddings. Created by ai-music4you3, it consists of 48 kHz mono WAV files with durations ranging from 3.0 seconds to over 18 minutes. It was last updated on March 17, 2026.
Use Cases
Train speech enhancement models based on the described processing pipeline.
Develop emotion recognition systems based on the emotion annotations.
Build speaker verification or identification models based on the speaker embeddings.
Analyze speech patterns and metadata, drawing on the metadata analysis provided with the dataset.
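For the speaker verification use case, a common baseline is to compare two speaker embeddings by cosine similarity and treat the pair as the same speaker when the score clears a threshold. The sketch below assumes embeddings are plain lists of floats. The dataset's actual embedding format and dimensionality are not documented here, so the function names and the 0.7 threshold are illustrative placeholders, not values taken from the dataset.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def same_speaker(a, b, threshold=0.7):
    """Hypothetical decision rule; the threshold is an assumption
    and would need tuning on held-out pairs."""
    return cosine_similarity(a, b) >= threshold
```

In practice the threshold would be chosen on a validation set, for example at the equal error rate, rather than fixed in advance.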
Strengths
Contains 2,633,037 audio samples, providing a large-scale resource.
Offers 4,932 hours of total audio content for training.
Includes processed features like emotion annotations and speaker embeddings.
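The sample count and total duration above imply a mean snippet length and a rough storage footprint, which can help when planning downloads. The sketch below works this out from the figures stated in this listing. The bit depth is not stated anywhere in the listing, so 16-bit PCM is an assumption, and the storage figure ignores WAV headers and any compression applied by the host.

```python
# Figures taken from the listing.
NUM_SAMPLES = 2_633_037
TOTAL_HOURS = 4_932
SAMPLE_RATE_HZ = 48_000
CHANNELS = 1  # mono

# ASSUMPTION: 16-bit PCM (2 bytes per sample); not stated in the listing.
BYTES_PER_SAMPLE = 2

total_seconds = TOTAL_HOURS * 3600
mean_duration_s = total_seconds / NUM_SAMPLES

# Raw audio payload only; excludes WAV headers and metadata files.
raw_bytes = total_seconds * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
raw_terabytes = raw_bytes / 1e12

print(f"mean snippet duration: {mean_duration_s:.2f} s")
print(f"estimated raw audio size: {raw_terabytes:.2f} TB")
```

Under these assumptions the mean snippet is about 6.7 seconds and the raw audio comes to roughly 1.7 TB. A mean this close to the 3.0-second minimum suggests that clips near the 18-minute maximum are rare.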
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not confirmed in the listing metadata; the 2,633,037 figure above should be verified after download, which may limit suitability assessment.
Data may reflect temporal or source bias inherent to the original source dataset.
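Because column-level documentation is absent, a first step after download is often to scan a handful of records and note which fields appear and what types they hold. The sketch below assumes records are available as Python dictionaries. The field names `duration`, `emotion`, and `speaker_embedding` are hypothetical stand-ins, since the real column names are not documented in this listing.

```python
from collections import defaultdict

def summarize_fields(records):
    """Map each field name to the sorted list of Python type names seen."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}

# Hypothetical records; real field names must be read from the data itself.
sample_records = [
    {"duration": 3.0, "emotion": "neutral", "speaker_embedding": [0.1, 0.2]},
    {"duration": 12.5, "emotion": None, "speaker_embedding": [0.3, 0.4]},
]

summary = summarize_fields(sample_records)
for field, types in summary.items():
    print(field, types)
```

A field that reports more than one type, such as a string mixed with `NoneType`, points to missing values that would need handling before training.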
Provenance
Source
huggingface
Collection Method
Enhanced version of mitermix/audiosnippets_long_2_8M with additional processing.
Freshness
Last updated 2026-03-17 13:44:57
License is unknown; terms of use must be verified before application.