Provides two test subsets of the WenetSpeech corpus, 'test-meeting' and 'test-net', packaged for use within the UltraEval-Audio framework. UltraEval-Audio evaluates large audio models across 12 task categories, including speech understanding and speech generation, using standardized benchmarking scripts.
Use Cases
- Benchmark the word error rate of large audio models using the --dataset WenetSpeech-test-meeting flag
- Evaluate speech-to-text accuracy on noisy internet audio using the WenetSpeech-test-net configuration
- Compare performance across different audio models using the standardized UltraEval-Audio evaluation pipeline
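Word error rate, the metric named in the use cases above, is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. The sketch below illustrates that definition. It is not UltraEval-Audio's own scorer, and its function names are hypothetical. Because Mandarin text is not space-delimited, it also includes a character-level variant (character error rate). Treat that variant as an assumption about how these subsets might be scored, not as the framework's confirmed behavior.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: CER over non-whitespace characters (assumed Mandarin-friendly)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # One substitution (很 -> 不): 1 error / 6 characters
    print(character_error_rate("今天天气很好", "今天天气不好"))
```

The implementation keeps only two rows of the dynamic-programming table, so memory grows with the hypothesis length rather than with the product of both lengths.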
Strengths
- Includes the WenetSpeech-test-meeting subset for evaluating speech recognition in multi-speaker meeting environments
- Includes the WenetSpeech-test-net subset for testing model performance on diverse internet-sourced audio data
- Compatible with the audio_evals/main.py script for automated evaluation of models like gpt4o_audio
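To run both subsets through audio_evals/main.py in one pass, a small wrapper can build one command per dataset name. This is a minimal sketch that assumes main.py accepts a `--model` flag alongside the `--dataset` flag named above. That flag is an assumption here, so check it against the UltraEval-Audio documentation for your version. The helpers `build_command` and `run_all` are hypothetical. By default the wrapper only prints the commands (dry run).

```python
import shlex
import subprocess
import sys
from typing import List

# Dataset names from the --dataset values described above.
SUBSETS = ["WenetSpeech-test-meeting", "WenetSpeech-test-net"]


def build_command(dataset: str, model: str,
                  script: str = "audio_evals/main.py") -> List[str]:
    # Assumption: main.py takes --model in addition to --dataset.
    return [sys.executable, script, "--dataset", dataset, "--model", model]


def run_all(model: str, dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: evaluate one model on both WenetSpeech subsets."""
    commands = [build_command(name, model) for name in SUBSETS]
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact command line
        if not dry_run:
            # check=True raises CalledProcessError if an evaluation run fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all("gpt4o_audio")
```

To compare models, call `run_all` once per model name with `dry_run=False` from the repository root. Each subset then runs as its own process, so a failure on one dataset surfaces immediately.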