A benchmark of 550 annotated speech samples, categorized across 11 distinct paralinguistic dimensions, for evaluating speech-to-speech models. The dataset includes curated audio files and their corresponding annotations, and is derived from the Step-Audio 2 technical research.
Use Cases
- Evaluate model performance on non-verbal cues using the 11 paralinguistic dimensions
- Benchmark speech-to-speech generation models using the 550 annotated audio samples
- Analyze model sensitivity to paralinguistic variations using the curated speech files
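As a rough illustration of the first two use cases, the sketch below aggregates accuracy per paralinguistic dimension and then macro-averages across dimensions. It is not the official scoring method. The record keys (`dimension`, `reference`, `prediction`), the dimension names `emotion` and `speed`, and the exact-match scoring rule are all assumptions made for this example, and the real annotation schema may differ.

```python
from collections import defaultdict


def per_dimension_accuracy(records):
    """Return {dimension: accuracy} for an iterable of result records.

    Each record is a dict with the hypothetical keys 'dimension',
    'reference', and 'prediction' (not taken from the dataset schema).
    A prediction counts as correct on a case-insensitive exact match.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        dim = record["dimension"]
        total[dim] += 1
        predicted = record["prediction"].strip().lower()
        expected = record["reference"].strip().lower()
        if predicted == expected:
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


def macro_average(scores):
    """Unweighted mean over dimensions, so each dimension counts equally."""
    return sum(scores.values()) / len(scores)


# Toy records with placeholder dimension names and labels.
records = [
    {"dimension": "emotion", "reference": "happy", "prediction": "Happy"},
    {"dimension": "emotion", "reference": "sad", "prediction": "angry"},
    {"dimension": "speed", "reference": "fast", "prediction": "fast"},
]

scores = per_dimension_accuracy(records)
overall = macro_average(scores)
```

Reporting the per-dimension scores alongside the macro average keeps a weak dimension from being hidden by strong results elsewhere.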
Strengths
- 550 curated and annotated speech samples
- Covers 11 distinct paralinguistic dimensions for model evaluation
- Designed as a speech-to-speech benchmark for the Step-Audio 2 model series