Description
MRAG-Bench is a benchmark for evaluating vision-centric multimodal retrieval-augmented generation (RAG) abilities in Large Vision Language Models (LVLMs). It comprises 16,130 images and 1,353 human-annotated multiple-choice questions across 9 distinct scenarios. The dataset was created by uclanlp and last updated on November 5, 2024. It provides a systematic evaluation framework for both open-source and proprietary models.
Use Cases
Benchmarking the vision-centric RAG performance of Large Vision Language Models across the 9 distinct scenarios.
Evaluating model retrieval and generation accuracy using the 1,353 human-annotated multiple-choice questions.
Comparing the capabilities of different multimodal architectures; the dataset description mentions an evaluation of 10 open-source and 4 proprietary models.
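The benchmarking use cases above come down to scoring multiple-choice predictions, overall and per scenario. The following is a minimal sketch of that scoring, not the benchmark's official evaluation code. The field names "scenario" and "answer" are hypothetical, since column-level documentation is not available here, and the toy records are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_scenario(records, predictions):
    """Return {scenario: fraction correct} plus an "overall" entry.

    records: list of dicts with the hypothetical keys "scenario" and "answer".
    predictions: list of predicted choice labels, aligned with records.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        scenario = record["scenario"]
        total[scenario] += 1
        # Compare choice labels case-insensitively, ignoring stray whitespace.
        if predicted.strip().upper() == record["answer"].strip().upper():
            correct[scenario] += 1

    scores = {name: correct[name] / total[name] for name in total}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Toy data for illustration only; not drawn from MRAG-Bench.
toy_records = [
    {"scenario": "s1", "answer": "A"},
    {"scenario": "s1", "answer": "C"},
    {"scenario": "s2", "answer": "B"},
]
toy_predictions = ["A", "B", "b"]
scores = accuracy_by_scenario(toy_records, toy_predictions)
```

Reporting the per-scenario breakdown alongside the overall score keeps a weak scenario from being hidden by a strong average. The field names would need to be adapted once the real schema has been inspected.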
Strengths
Contains 16,130 images, providing a substantial visual corpus for evaluation.
Includes 1,353 human-annotated multiple-choice questions, offering a structured test set.
Covers 9 distinct scenarios, suggesting a systematic approach to evaluating different RAG abilities.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to assess suitability for specific training or validation splits.
Description metadata is limited; actual data quality and format require manual inspection after download.
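Because field semantics must be inferred after download, a quick first step is to list each field and the value types it holds in a small sample. The sketch below assumes rows arrive as plain dictionaries. `summarize_fields` and `load_sample_rows` are hypothetical helper names, and the toy field names are invented. The repository ID and split are left as parameters because this page does not state them.

```python
import itertools


def summarize_fields(rows):
    """Map each field name to the sorted type names seen in the sample rows."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


def load_sample_rows(repo_id, split, limit=20):
    """Stream a few rows from the Hugging Face Hub.

    Requires the third-party `datasets` package. repo_id and split are
    supplied by the caller; they are not documented on this page.
    """
    from datasets import load_dataset  # imported lazily so the helper above has no dependency

    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(itertools.islice(stream, limit))


# Toy rows with made-up field names, for illustration only.
toy_rows = [
    {"id": 0, "question": "Which one?", "choices": ["A", "B"]},
    {"id": 1, "question": None, "choices": ["A", "B"]},
]
summary = summarize_fields(toy_rows)
```

A field that reports more than one type, such as `question` in the toy rows, usually signals missing values or mixed encodings worth checking before running an evaluation.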
Provenance
Source
uclanlp via Hugging Face
Collection Method
Human-annotated, according to the description of the multiple-choice questions.
Freshness
Last updated 2024-11-05 18:44:48; freshness should be verified.
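One simple way to act on the freshness note is to compute how old the last-updated timestamp is relative to a chosen reference date. This sketch assumes the timestamp is in UTC, which the page does not state. `age_in_days` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def age_in_days(last_updated, now=None):
    """Return whole days elapsed since last_updated ("YYYY-MM-DD HH:MM:SS").

    The timestamp is treated as UTC; that is an assumption, not a documented fact.
    """
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    updated = updated.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - updated).days


# Fixed reference date so the example is reproducible.
reference = datetime(2025, 1, 1, tzinfo=timezone.utc)
days = age_in_days("2024-11-05 18:44:48", now=reference)
```

Calling `age_in_days` without `now` uses the current time instead. The result only says how old the metadata is, so the repository itself should still be checked for newer revisions.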
License
Unknown; users should verify terms of use before downloading.