MMAU provides between 1,000 and 10,000 test records for evaluating audio large language models, released by TwinkStart in early 2026. It is integrated into the UltraEval-Audio framework to benchmark performance across 12 task types and 10 languages. The data spans four specialized domains: speech, general sound, medical audio, and music.
The dataset is designed specifically for use with the UltraEval-Audio framework. Refer to the associated GitHub repository for execution scripts and model integration instructions.