Name: Visual Haystacks: A Benchmark for Long-Context Multimodal Model Evaluation
Creator: tsunghanwu
Published: 2024-07-09T18:16:18
Keywords: Model Evaluation, Vision Language, Benchmark, Computer Vision, Multimodal Benchmark, Multimodal

Description

Visual Haystacks (VHs) is a benchmark dataset designed to evaluate Large Multimodal Models' capability to handle long-context visual information. It is described as the first vision-centric Needle-In-A-Haystack benchmark. The dataset was created by tsunghanwu and was last updated on Hugging Face on October 16, 2024.

Use Cases

Benchmarking long-context visual information retrieval in multimodal models using the Needle-In-A-Haystack task.
Evaluating vision-language model performance on tasks requiring the identification of specific details within extensive visual data.
Assessing the robustness of multimodal architectures against information overload in visual inputs.
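
A common way to report results for a Needle-In-A-Haystack benchmark is accuracy as a function of haystack size. The following is a minimal sketch of that bookkeeping only. The record keys `haystack_size`, `prediction`, and `answer` are hypothetical names chosen for illustration, not fields documented for this dataset, and exact string matching is an assumed scoring rule.

```python
from collections import defaultdict


def accuracy_by_haystack_size(results):
    """Group per-question results by haystack size and compute accuracy.

    `results` is an iterable of dicts with hypothetical keys:
      'haystack_size' (int): number of images shown to the model
      'prediction' (str): the model's answer
      'answer' (str): the reference answer
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in results:
        size = record["haystack_size"]
        total[size] += 1
        # Assumed scoring rule: case-insensitive exact match after trimming.
        if record["prediction"].strip().lower() == record["answer"].strip().lower():
            correct[size] += 1
    return {size: correct[size] / total[size] for size in sorted(total)}
```

A falling accuracy curve as the haystack grows would point to the information-overload sensitivity mentioned above.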

Strengths

The dataset is designed as a benchmark for a defined research problem: evaluating long-context visual information handling.
It is described as the first vision-centric Needle-In-A-Haystack benchmark, suggesting a novel contribution to the field.
The dataset page was updated on 2024-10-16, about three months after its publication on 2024-07-09, indicating recent maintenance.

Limitations

Description metadata is limited; actual data quality, size, and structure require manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and file size are unknown, which makes it hard to assess storage needs or suitability for large-scale use before download.
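
Because size, structure, and field semantics have to be checked by hand, a quick local inspection after download can fill these gaps. The sketch below assumes the annotations are stored as JSON files somewhere under a local directory. That layout is an assumption, since the listing does not document the file format. The function name `summarize_json_files` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json_files(root):
    """Report record count, file size, and field names for each JSON file under `root`."""
    summary = {}
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Treat a top-level object as a single record; a top-level list as many.
        records = data if isinstance(data, list) else [data]
        fields = Counter()
        for record in records:
            if isinstance(record, dict):
                fields.update(record.keys())
        summary[str(path.relative_to(root))] = {
            "records": len(records),
            "size_bytes": path.stat().st_size,
            # How many records carry each field; gaps hint at optional fields.
            "fields": dict(fields),
        }
    return summary
```

Calling `summarize_json_files("path/to/download")` returns one entry per file, which gives a first view of row counts and column names before any deeper quality check.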

Provenance

Source: tsunghanwu on Hugging Face.
Collection Method: Not documented in the listing. The dataset appears to be purpose-built for benchmarking and likely draws on or relates to COCO-2017, which the description references.
Freshness: Last updated 2024-10-16 21:26:07.

The description notes that users should also download the COCO-2017 training and validation sets, indicating a dependency on external data.
