Name: Long Context Multimodal Document Understanding Benchmark
Creator: AmazonScience
Published: 2025-04-28T18:52:32
Keywords: Vision, Image, Text, PDF, Multimodal, Benchmark, Long Context, Large Language Model, VLM, Task Category: Question Answering, Task Category: Visual Question Answering, Task Category: Document Question Answering, Modality: Text, Modality: Document, Modality: Image, Language: en, Region: US, arXiv:2507.15882

Description

Document Haystack is a benchmark dataset for evaluating multimodal Large Language Models on long-context image and document understanding tasks. AmazonScience created it for a 2025 research paper to address the lack of suitable benchmarks for processing long documents. Row count, column count, and data size are not stated in this listing.

Use Cases

Benchmark Vision Language Models on long-context multimodal understanding tasks using image and document data.
Evaluate model performance on complex document analysis as described in the associated research paper.
Test the ability of multimodal LLMs to process and understand extended sequences of text and image inputs.

Strengths

Created by AmazonScience, a research organization.
Specifically designed to address the under-explored area of long document processing in multimodal AI.
Last updated on August 4, 2025, indicating recent maintenance.

Limitations

The dataset description lacks concrete details like row count, column names, sample data, and file formats.
The scope and scale of the data are unknown, making it difficult to assess its size and comprehensiveness.
Access to the full description requires visiting an external link, limiting immediate transparency.

Provenance

Source: AmazonScience
Collection Method: Created for the research paper 'Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark'.
Freshness: Last updated on 2025-08-04.

The full dataset description is hosted externally at https://huggingface.co/datasets/AmazonScience/document-haystack. License information is unknown.
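
Because this listing does not state file formats or scale, one way to fill that gap is to list the files in the repository and count them by extension. The following is a minimal sketch, not an official loader: it assumes the `huggingface_hub` package is installed and that the repository can be listed without extra authentication. The helper names `count_extensions` and `list_dataset_files` are hypothetical and chosen only for illustration; the repository ID is taken from the URL above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Repository ID taken from the Hugging Face URL in this listing.
REPO_ID = "AmazonScience/document-haystack"


def count_extensions(paths):
    """Count files per lowercase extension; files with no extension go under ''."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)


def list_dataset_files(repo_id=REPO_ID):
    """Return the file paths in the dataset repository (needs network access).

    Assumption: huggingface_hub is installed and the repo is listable.
    """
    from huggingface_hub import list_repo_files  # imported lazily on purpose

    return list_repo_files(repo_id, repo_type="dataset")
```

Calling `count_extensions(list_dataset_files())` should give a rough picture of which file types the benchmark ships and how many of each, which addresses part of the missing scale information without downloading the data itself.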
