This long-context multimodal benchmark consists of 3,763 web-collected videos with subtitles, together with multiple-choice questions about them. Created for NeurIPS 2024, it evaluates large multimodal models on interleaved video-language inputs up to one hour long.
The dataset is licensed under CC BY-NC-SA 4.0. Working with it requires handling large-scale video files and Parquet-formatted metadata.
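Because the questions are multiple-choice, a model's results on a set of questions can be summarized as the fraction answered correctly. The sketch below illustrates that scoring step under stated assumptions: the `Prediction` record (question ID, predicted option index, ground-truth option index) and the `mc_accuracy` helper are hypothetical and do not reflect the dataset's actual metadata schema or any official evaluation script.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record layout (assumption, not the dataset's schema):
    # a question identifier, the option index the model chose,
    # and the ground-truth option index.
    question_id: str
    predicted: int
    answer: int


def mc_accuracy(preds: list[Prediction]) -> float:
    """Return the fraction of multiple-choice questions answered correctly."""
    if not preds:
        raise ValueError("no predictions to score")
    correct = sum(1 for p in preds if p.predicted == p.answer)
    return correct / len(preds)


if __name__ == "__main__":
    sample = [
        Prediction("q1", predicted=2, answer=2),
        Prediction("q2", predicted=0, answer=3),
        Prediction("q3", predicted=1, answer=1),
        Prediction("q4", predicted=4, answer=4),
    ]
    # Three of the four predictions match their answers, so this prints 0.75.
    print(f"accuracy = {mc_accuracy(sample):.2f}")
```

In practice the predicted and ground-truth indices would be read from the model's outputs and the dataset's metadata; this sketch only shows how they might be compared.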