Name: VSTAT: Visual State Tracking Benchmark for MLLMs
Creator: nyu-visionx
Published: 2026-05-29T01:43:20
Keywords: MLLM Evaluation, Multimodal AI, Benchmark, Video Benchmark, Video, Synthetic, Multimodal

Description

VSTAT is a video-based benchmark for evaluating the visual state tracking capability of Multimodal Large Language Models (MLLMs). It contains 834 video clips paired with 1,500 questions whose answers cannot be inferred from any single keyframe or short segment. The dataset was created by nyu-visionx and was last updated in June 2026.

Use Cases

Benchmarking MLLM performance on visual state tracking with the 1,500 video-question pairs.
Training models to reason about temporal changes in video, using questions that require multi-segment analysis.
Comparing model performance on synthetic versus real-world video across the three sources (synthetic, self-recorded, YouTube).
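The benchmarking and source-comparison use cases can be sketched as a per-source accuracy score. This is a minimal illustration, not the benchmark's official scoring method: the record fields `id`, `source`, and `answer` are hypothetical names (the dataset's real columns are undocumented), the exact-match comparison is an assumption, and the toy records are not taken from VSTAT.

```python
from collections import defaultdict


def accuracy_by_source(records, predictions):
    """Return {source: accuracy} plus an 'overall' entry.

    records: iterable of dicts with hypothetical keys 'id', 'source', 'answer'.
    predictions: dict mapping record id -> predicted answer string.
    A missing prediction is counted as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        pred = predictions.get(rec["id"])
        # Case-insensitive exact match; an assumption, not the official metric.
        hit = pred is not None and pred.strip().lower() == rec["answer"].strip().lower()
        for key in (rec["source"], "overall"):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}


# Toy data for illustration only.
records = [
    {"id": "q1", "source": "synthetic", "answer": "B"},
    {"id": "q2", "source": "synthetic", "answer": "A"},
    {"id": "q3", "source": "youtube", "answer": "C"},
    {"id": "q4", "source": "self-recorded", "answer": "D"},
]
predictions = {"q1": "b", "q2": "C", "q3": "C"}  # q4 has no prediction

scores = accuracy_by_source(records, predictions)
print(scores)
```

Reporting the per-source scores next to the overall score makes a gap between synthetic and real-world clips visible instead of averaging it away.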

Strengths

Contains 834 video clips, providing a substantial testbed for video-based models.
Includes 1,500 questions specifically designed to require temporal reasoning across video segments.
Comprises a mix of 450 synthetic, 80 self-recorded, and 304 YouTube videos, offering varied data sources.
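The stated counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given above; the questions-per-video value is a plain ratio of the two totals, and the actual distribution of questions across clips is not documented.

```python
# Video counts per source, as stated in this card.
split_counts = {"synthetic": 450, "self-recorded": 80, "youtube": 304}
total_questions = 1500

total_videos = sum(split_counts.values())
assert total_videos == 834  # matches the stated clip count

# Share of each source in the full set of clips.
shares = {name: count / total_videos for name, count in split_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# synthetic: 54.0%
# self-recorded: 9.6%
# youtube: 36.5%

# Average only; per-clip question counts are unknown.
questions_per_video = total_questions / total_videos
print(f"questions per video: {questions_per_video:.2f}")
# questions per video: 1.80
```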

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Per-split row counts are not documented; only the overall video and question totals and the per-source video counts are stated.
Description metadata is limited; actual data quality and file formats require manual inspection after download.
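Since field semantics have to be inferred after download, a first pass can summarize which fields exist, what types they hold, and how often they are missing. The sketch below assumes the rows have already been loaded as a list of dictionaries by whatever loader fits the actual file format. The helper `summarize_fields` and the toy field names are hypothetical and do not reflect the real VSTAT schema.

```python
from collections import defaultdict


def summarize_fields(rows, max_examples=2):
    """Map each field name to observed value types, a missing count, and a few examples."""
    rows = list(rows)
    # Union of keys, so fields present in only some rows are still reported.
    all_keys = set().union(*(row.keys() for row in rows)) if rows else set()
    summary = defaultdict(lambda: {"types": set(), "missing": 0, "examples": []})
    for row in rows:
        for key in all_keys:
            info = summary[key]
            value = row.get(key)
            if value is None:  # absent key or explicit null
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            if len(info["examples"]) < max_examples and value not in info["examples"]:
                info["examples"].append(value)
    return dict(summary)


# Toy rows for illustration only; not the real schema.
rows = [
    {"video": "clip_0001.mp4", "question": "Which cup hides the ball?", "answer": "B"},
    {"video": "clip_0002.mp4", "question": "How many times was the box opened?",
     "answer": None, "source": "youtube"},
]

fields = summarize_fields(rows)
for name in sorted(fields):
    info = fields[name]
    print(name, sorted(info["types"]), "missing:", info["missing"], info["examples"])
```

A summary like this only shows structure. What each field means still has to be confirmed against the files themselves or any documentation the creators publish.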

Provenance

Source: nyu-visionx on Hugging Face
Collection Method: Likely a mix of synthetically generated, self-recorded, and YouTube-sourced video clips.
Freshness: Last updated 2026-06-03 06:30:42; check the source for newer revisions before use.

License is unknown; users must verify terms of use before downloading.
