Description
ESC-50 contains 2,000 environmental audio recordings organized into 50 semantic categories, created by Karol Piczak in 2015. Each recording is a 5-second clip extracted from the Freesound project. The categories span animal sounds, natural soundscapes and water sounds, human non-speech sounds, interior/domestic sounds, and exterior/urban noises.
Use Cases
Training multiclass classifiers to distinguish between 50 environmental sound categories
Benchmarking audio feature extraction methods on standardized 5-second clips
Evaluating model performance using the author-provided 5-fold cross-validation structure
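The fold-based evaluation above can be sketched as a leave-one-fold-out loop. This is a minimal illustration, not the author's evaluation code: `fold_splits` is a hypothetical helper, and the rows are synthetic stand-ins. It assumes each metadata row carries a `fold` field with values 1 to 5, as in the `meta/esc50.csv` file of the GitHub repository (rows could be read with `csv.DictReader`); check the column names against your copy.

```python
from collections import defaultdict


def fold_splits(rows, fold_key="fold"):
    """Yield (held_out_fold, train_rows, test_rows), holding out each fold once."""
    by_fold = defaultdict(list)
    for row in rows:
        by_fold[int(row[fold_key])].append(row)
    for held_out in sorted(by_fold):
        test = by_fold[held_out]
        # Training data is every row whose fold differs from the held-out one.
        train = [r for f in sorted(by_fold) if f != held_out for r in by_fold[f]]
        yield held_out, train, test


# Synthetic stand-in for the metadata (assumption): 2,000 rows, 50 classes, 5 folds.
rows = [
    {"filename": f"clip_{i}.wav", "fold": str((i // 50) % 5 + 1), "target": str(i % 50)}
    for i in range(2000)
]

splits = list(fold_splits(rows))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Reporting the mean score over the five held-out folds, rather than over a custom random split, is what keeps results comparable across papers.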
Strengths
50 distinct semantic categories
Balanced distribution with exactly 40 samples per class
Standardized 5-second clip lengths for uniform processing
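The balance claim (50 classes × 40 clips = 2,000 recordings) is easy to verify once the labels are loaded. The sketch below uses a hypothetical helper, `class_balance`, and a synthetic label list in place of the real metadata; substitute the labels read from your copy of the dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return (number of classes, smallest class size, largest class size)."""
    counts = Counter(labels)
    return len(counts), min(counts.values()), max(counts.values())


# Synthetic labels (assumption): 2,000 items cycling through 50 class ids.
labels = [i % 50 for i in range(2000)]

n_classes, smallest, largest = class_balance(labels)
print(n_classes, smallest, largest)  # prints: 50 40 40
```

A dataset is perfectly balanced when the smallest and largest class sizes are equal, which is why plain accuracy is a reasonable metric here.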
Limitations
Small sample size of 2,000 recordings compared to large-scale audio datasets
Fixed 5-second duration may truncate longer environmental sound events
Restricted to 50 specific classes which may not cover all real-world acoustic environments
Provenance
Source
Karol J. Piczak
Collection Method
Manual selection and annotation of recordings from Freesound.org
Time Range
2015
Freshness
The core dataset was released in 2015; the GitHub repository was updated in March 2024.
Results are typically reported on the predefined 5-fold cross-validation splits so that they remain comparable with the existing literature.