Loading...
Loading...
Available on 1 platform
Sign in to view source links and access this dataset
MMOU is a benchmark for evaluating multimodal models on joint audio-visual understanding and reasoning in long and complex real-world videos. The dataset was created by NVIDIA and last updated on March 28, 2026. It is designed to test models on video, speech, sound, music, and long-range temporal context.
License is unknown; users must verify terms before use.