ViP-Bench is a benchmark for evaluating multimodal models at the region level, curated by the University of Wisconsin-Madison. It provides two kinds of visual prompts for testing model understanding: bounding boxes and diverse human-drawn visual prompts. The dataset was last updated on December 15, 2023.
Use Cases
- Benchmarking model performance on region-level visual question answering, where the region of interest is indicated by a visual prompt.
- Evaluating the robustness of multimodal models to diverse, human-drawn visual prompts.
- Comparing model accuracy on tasks defined by bounding box prompts versus free-form visual prompts.
- Developing new evaluation metrics for spatial reasoning in multimodal AI systems.
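The comparison between bounding-box and human-drawn prompts could be sketched as below. This is a minimal sketch, not ViP-Bench's own scoring method: the record layout (`prompt_type` and `correct` fields), the labels `"bbox"` and `"human"`, and the helper `accuracy_by_prompt_type` are all assumptions made for illustration, and the example results are made up.

```python
from collections import defaultdict


def accuracy_by_prompt_type(results):
    """Return {prompt_type: fraction of correct answers}.

    Each result is assumed (hypothetically) to be a dict with the keys
    'prompt_type' and 'correct'; the real dataset schema is undocumented.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for result in results:
        totals[result["prompt_type"]] += 1
        if result["correct"]:
            hits[result["prompt_type"]] += 1
    return {ptype: hits[ptype] / totals[ptype] for ptype in totals}


# Made-up results for illustration only; these are not benchmark numbers.
example_results = [
    {"prompt_type": "bbox", "correct": True},
    {"prompt_type": "bbox", "correct": False},
    {"prompt_type": "human", "correct": True},
    {"prompt_type": "human", "correct": True},
]

scores = accuracy_by_prompt_type(example_results)
# Positive gap: the model does better on bounding boxes; negative: on human-drawn prompts.
gap = scores["bbox"] - scores["human"]
```

Reporting the per-type scores alongside the gap keeps the comparison readable when the two prompt types have different numbers of examples.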
Strengths
- Provides two distinct types of visual prompts for evaluation: bounding boxes and human-drawn prompts.
- Curated by an academic institution, the University of Wisconsin-Madison.
- Has an associated public leaderboard for tracking model performance.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and file formats are unknown, which makes it harder to assess suitability before download.
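Because field semantics have to be inferred after download, a first pass over the records might look like the sketch below. It assumes the data has already been loaded into a list of dictionaries by whatever loader suits the actual file format, which is unknown here. The helper `summarize_fields` and the sample field names (`question`, `bbox`) are hypothetical placeholders, not the dataset's real columns.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the value types seen and a non-null count."""
    types = defaultdict(set)
    non_null = defaultdict(int)
    for record in records:
        for key, value in record.items():
            types[key].add(type(value).__name__)
            if value is not None:
                non_null[key] += 1
    return {
        key: {"types": sorted(types[key]), "non_null": non_null[key]}
        for key in types
    }


# Placeholder records; the real field names must be read from the downloaded files.
sample = [
    {"question": "What is in the box?", "bbox": [10, 20, 50, 60]},
    {"question": "What is circled?", "bbox": None},
]

summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info["types"], info["non_null"])
```

A summary like this shows which fields are always populated and which mix types, which is a reasonable starting point before inspecting individual examples by hand.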
Provenance
- Source: University of Wisconsin-Madison
- Collection Method: Curated benchmark, likely involving human annotation for visual prompts.
- Time Range: Not specified
- Freshness: Last updated 2023-12-15 01:08:04; freshness should be verified.
- Geography: Not specified