This dataset contains segmented deepfake speech audio clips aggregated from 4 public source datasets. The audio is partitioned into 2-second clips with a 1-second overlap between consecutive clips, providing consistent input lengths for acoustic feature extraction and temporal analysis.
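The segmentation scheme can be illustrated with a minimal sketch that computes clip boundaries for a recording. The function name `segment_bounds`, the 16 kHz sample rate, and the choice to drop a trailing remainder shorter than 2 seconds are assumptions made for illustration; the dataset's actual preprocessing may differ.

```python
def segment_bounds(num_samples, sample_rate=16_000,
                   clip_seconds=2.0, overlap_seconds=1.0):
    """Return (start, end) sample indices for fixed-length overlapping clips.

    sample_rate=16_000 is an assumed value, not taken from the dataset card.
    """
    clip_len = int(round(clip_seconds * sample_rate))
    # The hop is the clip length minus the overlap: 1 second here.
    hop = int(round((clip_seconds - overlap_seconds) * sample_rate))
    if hop <= 0:
        raise ValueError("overlap must be shorter than the clip length")

    bounds = []
    start = 0
    # Assumption: keep only full-length clips and drop any shorter remainder.
    while start + clip_len <= num_samples:
        bounds.append((start, start + clip_len))
        start += hop
    return bounds


# Example: boundaries (in seconds) for a 5-second recording at 16 kHz.
bounds = segment_bounds(num_samples=5 * 16_000)
print([(s / 16_000, e / 16_000) for s, e in bounds])
```

With a 1-second hop, every interior second of audio appears in two consecutive clips, which is what makes the overlapping segments usable for temporal consistency checks.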
Use Cases
- Train a deepfake detection model using the 2-second audio clips as inputs
- Benchmark temporal consistency in synthetic speech detection using the 1-second overlapping segments
- Develop cross-dataset generalization tests by training on segments from one source and testing on others
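The cross-dataset use case above can be sketched as a split that trains on one source and tests on all the others. The record layout (dicts with `path` and `source` keys), the placeholder source names, and the helper name `single_source_splits` are hypothetical; adapt them to however the clips and their source labels are actually stored.

```python
from collections import defaultdict


def single_source_splits(records, source_key="source"):
    """Yield (train_source, train_records, test_records) for each source.

    Training uses clips from exactly one source; testing uses every other source.
    """
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec[source_key]].append(rec)

    sources = sorted(by_source)
    for train_source in sources:
        train = list(by_source[train_source])
        test = [rec for s in sources if s != train_source for rec in by_source[s]]
        yield train_source, train, test


# Hypothetical records; real paths and source names will differ.
records = [
    {"path": "a_0001.wav", "source": "source_a"},
    {"path": "a_0002.wav", "source": "source_a"},
    {"path": "b_0001.wav", "source": "source_b"},
    {"path": "c_0001.wav", "source": "source_c"},
]

for train_source, train, test in single_source_splits(records):
    print(train_source, len(train), len(test))
```

Splitting at the source level also keeps overlapping clips cut from the same recording on the same side of the split, so the shared 1 second of audio cannot leak between training and test sets.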
Strengths
- Aggregates speech data from 4 distinct public deepfake datasets
- Standardized audio segmentation into 2-second clips
- Includes a 1-second overlap between consecutive audio segments
- Optimized for Wav2Vec-based feature extraction architectures