OmniVideoBench: Audio-Visual Understanding Evaluation for Multimodal LLMs | DataSalon

Computer Vision

Name: OmniVideoBench: Audio-Visual Understanding Evaluation for Multimodal LLMs
Creator: NJU-LINK
Published: 2025-10-15T03:08:43
Keywords: Benchmark Evaluation, Audio-Visual Reasoning, Multimodal LLM, Benchmark, Video Understanding, Audio, Large Scale, Video, Multimodal

Available on 1 platform

Description

OmniVideoBench, from NJU-LINK, is a large-scale benchmark dataset for evaluating multimodal large language models on joint audio-visual reasoning tasks. It addresses a gap left by existing benchmarks, which often focus on a single modality. The dataset was last updated on April 8, 2026.

Use Cases

Benchmarking model performance on joint audio-visual reasoning tasks, the dataset's stated evaluation focus.
Training multimodal LLMs for video understanding, given the dataset's large-scale, curated nature.
Studying how audio and visual modalities interact in AI systems, in line with the dataset's stated purpose.

Strengths

Designed as a large-scale benchmark, suggesting substantial volume.
Rigorously curated, implying a level of quality control.
Explicitly targets joint audio and visual reasoning, filling a described gap in evaluation.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and file size are unknown, which makes it harder to assess suitability before download.
The description is incomplete, requiring a visit to the external page for full details.
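
Because column-level documentation is absent, one way to start inferring field semantics after download is to tabulate, for each field, the value types seen, how often the value is missing, and one example value. The following is a minimal sketch, assuming the records have already been loaded as a list of Python dicts. The helper name `summarize_fields` and the sample field names (`video_id`, `question`, `duration`) are hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict


def summarize_fields(records):
    """Summarize each field across a list of dict records.

    Returns {field: {"types": [...], "missing": int, "example": value}},
    where "missing" counts records in which the field is absent or None.
    """
    types = defaultdict(set)
    present = defaultdict(int)
    examples = {}
    all_keys = set()

    for record in records:
        all_keys.update(record)
        for key, value in record.items():
            if value is None:
                continue  # treat None as missing
            types[key].add(type(value).__name__)
            present[key] += 1
            examples.setdefault(key, value)  # keep the first non-null value

    total = len(records)
    return {
        key: {
            "types": sorted(types[key]),
            "missing": total - present[key],
            "example": examples.get(key),
        }
        for key in sorted(all_keys)
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"video_id": "v001", "question": "Who speaks first?", "duration": 12.5},
    {"video_id": "v002", "question": None, "duration": 30},
]

summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info)
```

A field that reports more than one type (here, `duration` mixes `float` and `int`) or a high missing count is a good candidate for checking against the external dataset page before relying on it.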

Provenance

Source: NJU-LINK
Freshness: Last updated 2026-04-08 05:19:52.

The full description is hosted externally; users must visit the Hugging Face dataset page for complete details.

Quality Score

D39

Community

1.9K downloads

5 likes

0 views

Dataset Info

Author: NJU-LINK
Created: Oct 15, 2025
Updated: Apr 8, 2026
Last synced: May 20, 2026
