This Visualized Information Retrieval (Vis-IR) benchmark is built around four meta-task categories: Screenshot Retrieval (SR), Composed Screenshot Retrieval (CSR), Screenshot QA (SQA), and Open-Vocabulary. The dataset uses digital screenshots to unify search and information extraction tasks across diverse application scenarios.
Use Cases
- Benchmark Screenshot Retrieval (SR) systems by matching queries to relevant visual screenshot targets.
- Develop Composed Screenshot Retrieval (CSR) models that process queries consisting of a base screenshot and a text-based modification.
- Train Screenshot QA (SQA) agents to extract data and answer queries based on the visual elements within a screenshot.
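
To make the retrieval use cases above concrete, the following is a minimal sketch of how an SR-style evaluation could be scored. It assumes queries and screenshots have already been embedded into a shared vector space by some model; the toy vectors, the identifiers (`shot_a`, `q1`, and so on), and the helper names `rank_screenshots` and `recall_at_k` are all hypothetical. Recall@k is used here only as a common retrieval metric; the benchmark's official metric and data format may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_screenshots(query_vec, screenshot_vecs):
    """Return screenshot ids sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), sid) for sid, vec in screenshot_vecs.items()]
    # Sort by score (highest first); break ties by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for _, sid in scored]


def recall_at_k(queries, screenshot_vecs, relevant, k):
    """Fraction of queries whose relevant screenshot appears in the top k."""
    hits = 0
    for qid, qvec in queries.items():
        top_k = rank_screenshots(qvec, screenshot_vecs)[:k]
        if relevant[qid] in top_k:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings (illustration only, not benchmark data).
screenshots = {
    "shot_a": [1.0, 0.0],
    "shot_b": [0.0, 1.0],
    "shot_c": [0.7, 0.7],
}
queries = {
    "q1": [0.9, 0.1],
    "q2": [0.1, 0.9],
    "q3": [1.0, 0.0],
}
# Assumed ground truth: one relevant screenshot per query.
relevant = {"q1": "shot_a", "q2": "shot_b", "q3": "shot_c"}

print(recall_at_k(queries, screenshots, relevant, k=1))
print(recall_at_k(queries, screenshots, relevant, k=2))
```

The same scoring loop would apply to a CSR setup, provided the query vector is produced by a model that encodes the base screenshot and the text modification together; how that fusion is done is outside the scope of this sketch.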
Strengths
- Covers four meta-task categories in a single benchmark: Screenshot Retrieval (SR), Composed Screenshot Retrieval (CSR), Screenshot QA (SQA), and Open-Vocabulary.
- Focuses on Visualized Information Retrieval (Vis-IR) using digital screenshots as the core data format.
- Aggregates diverse application scenarios to evaluate search performance in varied visual contexts.