This dataset aggregates audio samples from 4 public speech sources, split into 2-second segments with a 1-second overlap between consecutive segments. It is intended for deepfake voice detection using MFCC (Mel-frequency cepstral coefficient) features extracted from the segmented clips.
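The segmentation scheme can be illustrated with a minimal sketch. The function name `segment_bounds`, the 16 kHz sample rate, and the choice to drop a trailing partial window are assumptions made for illustration; the dataset description does not state them.

```python
def segment_bounds(num_samples, sample_rate=16000, window_s=2.0, hop_s=1.0):
    """Return (start, end) sample indices for fixed-length overlapping windows.

    A 2-second window advanced by a 1-second hop gives a 1-second overlap
    between consecutive segments. The sample rate is an assumed value.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    bounds = []
    start = 0
    # Assumption: a trailing window shorter than 2 seconds is discarded.
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += hop
    return bounds


# A 5-second recording at 16 kHz produces four 2-second windows.
for start, end in segment_bounds(5 * 16000):
    print(start, end)
```

Because each segment shares half of its samples with its neighbor, segments cut from the same recording are not independent examples.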
Use Cases
- Train a binary classifier to distinguish between real and deepfake speech using MFCC feature vectors
- Analyze the effect of 1-second segment overlap on the temporal consistency of synthetic voice detection
- Benchmark detection algorithms across the 4 source datasets to evaluate how well models generalize to different recording environments
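One possible way to set up the cross-source benchmark is a leave-one-source-out split, sketched below. The record layout, the `source` and `label` field names, and the placeholder source names are hypothetical; the dataset's actual schema is not described here.

```python
def leave_one_source_out(records):
    """Yield (held_out_source, train, test), holding out one source per fold."""
    sources = sorted({r["source"] for r in records})
    for held_out in sources:
        train = [r for r in records if r["source"] != held_out]
        test = [r for r in records if r["source"] == held_out]
        yield held_out, train, test


# Hypothetical records: one entry per 2-second segment.
records = [
    {"source": "source_a", "label": "real"},
    {"source": "source_a", "label": "fake"},
    {"source": "source_b", "label": "real"},
    {"source": "source_b", "label": "fake"},
    {"source": "source_c", "label": "real"},
    {"source": "source_c", "label": "fake"},
    {"source": "source_d", "label": "real"},
    {"source": "source_d", "label": "fake"},
]

for held_out, train, test in leave_one_source_out(records):
    print(held_out, len(train), len(test))
```

Splitting by source rather than by random segment also keeps overlapping segments from the same recording out of both the training and test sets at once.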
Strengths
- Aggregates audio data from 4 distinct public speech datasets
- Standardized 2-second clip duration for all audio segments
- Includes a 1-second overlap between consecutive segments
- Formatted for MFCC feature extraction