MMOU: Benchmark for Multimodal Reasoning on Long, Complex Videos

Name: MMOU: Benchmark for Multimodal Reasoning on Long, Complex Videos
Creator: nvidia
Published: 2026-03-07T14:44:23
Keywords: Source Datasets: original, Size Categories: 10K<n<100K, Library: polars, arXiv: 2603.14145, Language: en, AI Evaluation, Long Video, Audio Visual Reasoning, Modality: text, Library: mlcroissant, Audio Visual, Task Categories: video-text-to-text, Library: datasets, Benchmark, Library: pandas, Modality: video, Video Understanding, Audio, Region: us, Large Scale, Time Series, Video, JSON, License: apache-2.0, Multimodal Benchmark, Annotations Creators: expert-generated, Multimodal

Description

MMOU is a benchmark for evaluating multimodal models on joint audio-visual understanding and reasoning in long and complex real-world videos. The dataset was created by NVIDIA and last updated on March 28, 2026. It is designed to test models on video, speech, sound, music, and long-range temporal context.

Use Cases

Benchmarking multimodal AI models on joint audio-visual reasoning tasks.
Evaluating model performance on long-range temporal context in videos.
Training models for complex scene understanding from video, speech, and sound data.
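
As a rough illustration of the benchmarking use case, the sketch below scores model answers against reference answers by exact match. It is not MMOU's official evaluation procedure: this listing does not document the dataset's columns or answer format, so the function name `exact_match_accuracy`, the question identifiers, and the single-letter answers are all hypothetical placeholders.

```python
from typing import Mapping


def exact_match_accuracy(
    predictions: Mapping[str, str], references: Mapping[str, str]
) -> float:
    """Fraction of reference items whose prediction matches after normalization.

    Items with no prediction count as wrong, so skipping questions
    cannot inflate the score.
    """
    if not references:
        raise ValueError("references must not be empty")
    correct = sum(
        1
        for item_id, gold in references.items()
        if item_id in predictions
        and predictions[item_id].strip().lower() == gold.strip().lower()
    )
    return correct / len(references)


# Hypothetical toy data; real MMOU identifiers and answers will differ.
references = {"q1": "B", "q2": "A", "q3": "D"}
predictions = {"q1": "b", "q2": "C"}  # q3 is unanswered and counts as wrong
print(exact_match_accuracy(predictions, references))
```

Dividing by the number of reference items rather than the number of predictions keeps scores comparable between models that answer every question and models that abstain on some.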

Strengths

Focuses on long and complex real-world videos, a challenging domain for AI.
Created by NVIDIA, a leading organization in AI research and hardware.

Limitations

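Description metadata is limited; actual data quality requires manual inspection after download.
Exact row count and column-level documentation are absent; the keywords indicate only a size category of 10K to 100K and a JSON format tag.
The listing reports the license as unknown even though the keywords carry an Apache 2.0 tag, so the terms should be confirmed before use.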

Provenance

Source: NVIDIA
Freshness: Last updated 2026-03-28 00:17:41; freshness should be verified.

License: listed as unknown, although the keywords include an Apache 2.0 tag; users must verify terms before use.

Quality Score

Overall: D (38)
Description: 39
Source: 36
Reputation: 53
Access: 22

Community

1.7K downloads
15 likes
0 views

Dataset Info

Author: nvidia
Created: Mar 7, 2026
Updated: Mar 28, 2026
Last synced: Jun 20, 2026
