HLVid: High-Resolution Long-Form Video QA Benchmark

Name: HLVid: High-Resolution Long-Form Video QA Benchmark
Creator: bfshi
Published: 2026-02-14T01:53:52
Keywords: Video, Video QA, Multimodal, Multimodal AI, Long-Form Video, High Resolution, Benchmark, Parquet, task_categories:video-text-to-text, modality:video, modality:text, size_categories:n<1K, library:datasets, library:pandas, library:polars, library:mlcroissant, arxiv:2603.12254, region:us

Description

HLVid is a benchmark for evaluating Multi-modal Large Language Models (MLLMs) on long-form, high-resolution video question answering. It was introduced by bfshi in the paper "Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing". The dataset consists of 5-minute videos at 4K resolution, so models must cope with substantial spatiotemporal redundancy.
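
To give a rough sense of the scale involved, the sketch below counts the patch tokens a ViT-style encoder would produce for one such video. The 3840x2160 frame size (the common UHD reading of "4K"), the sampling rate of 1 frame per second, and the 16-pixel patch size are illustrative assumptions, not values documented for HLVid.

```python
def patch_tokens(width, height, seconds, sample_fps, patch):
    """Count non-overlapping patch tokens for a uniformly sampled video."""
    patches_per_frame = (width // patch) * (height // patch)
    frames = int(seconds * sample_fps)
    return patches_per_frame * frames

# Assumed values for illustration only (not taken from the dataset card).
tokens = patch_tokens(width=3840, height=2160, seconds=5 * 60, sample_fps=1, patch=16)
print(tokens)  # 9720000
```

Under these assumptions a single video yields about 9.7 million patch tokens before any pruning, which is the kind of load the benchmark is meant to stress.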

Use Cases

Benchmarking video understanding models on 5-minute long-form videos
Evaluating model efficiency on high-resolution 4K video data
Testing multimodal reasoning on video-text question answering tasks
Researching methods to handle spatiotemporal redundancy in video data

Strengths

Features 5-minute long-form videos, providing a challenging temporal scale
Uses high-resolution 4K video, offering detailed visual data
Specifically designed to benchmark Multi-modal Large Language Models (MLLMs)

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Exact row count is not stated; the size-category keyword (n<1K) suggests fewer than 1,000 rows, which should be confirmed after download
Description metadata is limited; actual data quality requires manual inspection after download
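
Since the columns are undocumented, a first pass after download could list each field's observed types, missing-value count, and one example value. The following is a minimal sketch that works on rows represented as plain dictionaries; the field names in `sample` (`video`, `question`, `answer`) are hypothetical placeholders, not the dataset's actual schema.

```python
def summarize_fields(rows):
    """Return {field: {"types": [...], "missing": n, "example": value}}."""
    # Collect field names in first-seen order across all rows.
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    summary = {}
    for name in fields:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        summary[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "example": present[0] if present else None,
        }
    return summary

# Hypothetical rows; real rows could come from, for example,
# pandas.read_parquet(path).to_dict("records") on a downloaded file.
sample = [
    {"video": "clip_0001.mp4", "question": "What is written on the sign?", "answer": "B"},
    {"video": "clip_0002.mp4", "question": "How many people enter?", "answer": None},
]
for name, info in summarize_fields(sample).items():
    print(name, info)
```

The summary is only a starting point for inferring field semantics; it does not replace reading a few rows in full.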

Provenance

Source: bfshi via Hugging Face
Collection Method: Introduced as a benchmark in the associated research paper.
Freshness: Last updated 2026-03-19 20:39:55; verify against the source before use

Quality Score

Overall: D (36)
Description: 36
Source: 36
Reputation: 43
Access: 22

Community

Downloads: 167
Likes: 1
Views: 0

Dataset Info

Author: bfshi
Created: Feb 14, 2026
Updated: Mar 19, 2026
