Name: ViP-Bench: A Benchmark for Evaluating Region-Level Multimodal Model Understanding
Creator: mucai
Published: 2023-12-02T05:54:04
Keywords: Benchmark, Computer Vision, Visual Question Answering, Multimodal Evaluation, Multimodal

Description

ViP-Bench is a benchmark for evaluating region-level understanding in multimodal models, curated by the University of Wisconsin-Madison. It tests model understanding with two kinds of visual prompts: bounding boxes and diverse human-drawn visual prompts. The dataset was last updated on December 15, 2023.

Use Cases

Benchmarking model performance on region-level visual question answering using the provided visual prompts.
Evaluating the robustness of multimodal models to diverse, human-drawn visual prompts.
Comparing model accuracy on tasks defined by bounding box prompts versus free-form visual prompts.
Developing new evaluation metrics for spatial reasoning in multimodal AI systems.
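
The bounding-box versus free-form comparison above can be sketched as a per-prompt-type average. This is a minimal illustration, not the benchmark's official scoring procedure: the record layout, the field names prompt_type and score, and the sample values are all hypothetical, since the dataset's actual fields are undocumented and must be inspected after download.

```python
from collections import defaultdict


def mean_score_by_prompt_type(records):
    """Average a per-example score separately for each prompt type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["prompt_type"]] += rec["score"]
        counts[rec["prompt_type"]] += 1
    return {ptype: totals[ptype] / counts[ptype] for ptype in totals}


# Hypothetical graded results; field names and values are illustrative only
# and are NOT taken from the actual ViP-Bench files.
results = [
    {"prompt_type": "bbox", "score": 1.0},
    {"prompt_type": "bbox", "score": 0.0},
    {"prompt_type": "human", "score": 1.0},
    {"prompt_type": "human", "score": 1.0},
]

by_type = mean_score_by_prompt_type(results)
# Negative gap means the model scored lower on bounding-box prompts.
gap = by_type["bbox"] - by_type["human"]
print(by_type, gap)
```

Keeping the two averages separate, rather than pooling all examples, makes any gap between prompt styles visible. A large gap would suggest sensitivity to how the region is marked rather than to the question itself.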

Strengths

Provides two distinct types of visual prompts for evaluation: bounding boxes and human-drawn prompts.
Curated by an academic institution, the University of Wisconsin-Madison.
Has an associated public leaderboard for tracking model performance.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and file formats are unknown, which may limit suitability assessment.

Provenance

Source: University of Wisconsin-Madison
Collection Method: Curated benchmark, likely involving human annotation for visual prompts.
Time Range: Not specified
Freshness: Last updated 2023-12-15 01:08:04; freshness should be verified.
Geography: Not specified
