VLM Eval Videos is a benchmark dataset of 693 short MP4 video clips for evaluating Vision-Language Models. Created by gnitoahc, the dataset is organized into five categories, and each clip is paired with a fixed question and a ground-truth answer in the form of a short sentence. It was last updated on the Hugging Face platform in June 2026.
The license is unknown; verify the terms of use before using the dataset.
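
As a rough illustration of how a clip / question / answer benchmark of this shape might be scored, here is a minimal Python sketch. The field names (`video_path`, `category`, `question`, `answer`), the `ClipRecord` type, and the normalized exact-match rule are all assumptions made for the example; the dataset's actual schema and any official scoring method are not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipRecord:
    # Hypothetical field names; the dataset's real schema is not documented here.
    video_path: str
    category: str
    question: str
    answer: str  # ground-truth short-sentence answer


def normalize(text: str) -> str:
    # Lowercase, drop punctuation, and collapse runs of whitespace.
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def per_category_accuracy(records, predictions):
    # Assumed scoring rule: a prediction counts as correct only if it matches
    # the ground-truth answer exactly after normalization.
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record, predicted in zip(records, predictions):
        totals[record.category] += 1
        if normalize(predicted) == normalize(record.answer):
            correct[record.category] += 1
    return {category: correct[category] / totals[category] for category in totals}
```

Exact match is strict for free-form sentence answers, so a real evaluation might prefer a softer comparison such as token overlap or a judge model; the sketch only shows where per-category bookkeeping would fit.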