DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.



Multimodal & LLM

VLM4D: Spatiotemporal Reasoning Benchmark with 1,000 Video Samples

Name: VLM4D: Spatiotemporal Reasoning Benchmark with 1,000 Video Samples
Creator: shijiezhou
Published: 2025-03-04T11:26:30
Keywords: size category n<1K, library: mlcroissant, arXiv: 2508.02095, task category: video-text-to-text, library: datasets, modality: video, region: US, license: MIT

by shijiezhou · Updated 5mo ago

Available on 1 platform

Description

VLM4D is a benchmark of approximately 1,000 real-world and synthetic videos designed to evaluate spatiotemporal reasoning in vision-language models (VLMs). Developed by Shijie Zhou and researchers at UCLA in 2025, the dataset provides curated video-text pairs that test whether a model tracks motion and time correctly.

Use Cases

Evaluating temporal sequencing in VLMs using video-text-to-text tasks
Testing spatial awareness by querying object trajectories across video frames
Benchmarking model reasoning on synthetic vs. real-world motion patterns
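
The real-versus-synthetic comparison in the use cases above can be sketched as a per-group accuracy computation. This is a minimal illustration, not the official VLM4D evaluation script: the field names `video_type`, `answer`, and `prediction` and the sample records are hypothetical and would need to be mapped to the actual dataset schema.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key="video_type"):
    """Return exact-match accuracy per group (e.g. real vs. synthetic).

    Each record is a dict with hypothetical keys: the grouping key,
    "answer" (ground truth) and "prediction" (model output).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        # Case-insensitive exact match after trimming whitespace.
        if record["prediction"].strip().lower() == record["answer"].strip().lower():
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Made-up records for illustration only; not taken from VLM4D.
records = [
    {"video_type": "real", "answer": "left", "prediction": "Left"},
    {"video_type": "real", "answer": "right", "prediction": "left"},
    {"video_type": "synthetic", "answer": "clockwise", "prediction": "clockwise"},
]
scores = accuracy_by_group(records)
```

Reporting the two groups separately, rather than one pooled score, makes any gap between synthetic and real-world performance visible.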

Strengths

First benchmark specifically for VLM spatiotemporal reasoning
Includes ~1,000 curated video samples
MIT licensed for open research

Limitations

Small sample size of ~1,000 records
Potential domain gap between synthetic and real-world video logic
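
To make the sample-size limitation concrete, the sketch below computes a 95% Wilson score interval for an accuracy measured on about 1,000 items. The figure of 600 correct answers is a hypothetical input chosen for illustration, not a reported VLM4D result.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical score: 600 correct out of 1,000 questions.
low, high = wilson_interval(600, 1000)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

At this scale the interval spans roughly three percentage points on either side of the measured accuracy, so smaller differences between two models should be read with caution.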

Provenance

Source: UCLA (Shijie Zhou et al.), arXiv:2508.02095
Collection Method: Curated from real-world sources and synthetic generation
Freshness: Last updated February 2026.
Geography: United States

Requires the VLM4D GitHub repository for standard evaluation scripts; released under the MIT license.


Quality Score

D38


Community

3.7K downloads

4 likes

0 views

Dataset Info

Author: shijiezhou
Created: Mar 4, 2025
Updated: Feb 22, 2026
Last synced: Apr 29, 2026
