VocalSet is a singing voice dataset containing 10.1 hours of monophonic audio recordings. It features 20 professional singers (9 male, 11 female) performing standard and extended vocal techniques on five vowels. The dataset was created by Julia Wilkins to support singing voice research.
Use Cases
- Train singing voice synthesis models based on diverse vocal techniques and vowels.
- Analyze acoustic properties of different voice types based on recordings from 20 singers.
- Develop voice conversion systems based on a range of sung scales, arpeggios, and long tones.
- Benchmark speech recognition models on recordings of extended vocal techniques.
Strengths
- Contains 10.1 hours of recorded audio.
- Includes 20 professional singers (9 male, 11 female).
- Covers five vowels and a diverse set of vocal techniques.
- Provides predefined train/test splits for model evaluation.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Last update date is unknown; freshness unverified.
- Row count (number of audio clips) is unknown, so the amount of data available per singer or technique cannot be assessed before download.
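Because the clip count is undocumented, it may help to measure it directly once the audio is on disk. The following is a minimal sketch, assuming the recordings have been downloaded to a local directory as uncompressed PCM WAV files; the function name `summarize_wavs` and the directory layout are illustrative assumptions, not part of the dataset's documentation.

```python
import wave
from pathlib import Path


def summarize_wavs(root):
    """Return (clip_count, total_hours) for all .wav files under root.

    Assumes uncompressed PCM WAV files, which is what the stdlib
    wave module can read. The directory layout is not assumed:
    files are found recursively.
    """
    clip_count = 0
    total_seconds = 0.0
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
        clip_count += 1
    return clip_count, total_seconds / 3600.0
```

Calling `summarize_wavs` on the download directory returns the number of clips and the total duration in hours, which can then be compared against the stated 10.1 hours as a basic sanity check.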
Provenance
- Source: Julia Wilkins via Papers with Code
- Collection Method: Monophonic audio recordings of professional singers.