Four distinct subsets (MSCOCO, Google_WIT, VisualNews, and OVEN) provide multimodal queries and documents for evaluating cross-modal retrieval. The dataset uses queries.jsonl files to benchmark performance on text-only, image-only, and combined image+text search tasks.
Use Cases
- Evaluate cross-modal retrieval models by matching text queries from queries.jsonl to image documents
- Benchmark multimodal search systems by processing combined image+text inputs against a mixed-modality document corpus
- Analyze how model performance varies across domains such as news (VisualNews) and general objects (MSCOCO)
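
The evaluation use cases above could be sketched roughly as follows. This is a minimal illustration, not the dataset's official evaluation script: the field names `query_id`, `subset`, and `relevant_ids` are assumptions about the queries.jsonl schema, and `rankings` stands in for whatever ranked document IDs your retrieval model returns.

```python
import json
from collections import defaultdict


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_by_subset(queries, rankings, k=10):
    """Average Recall@k per subset.

    `queries` are dicts with the hypothetical keys `query_id`, `subset`,
    and `relevant_ids`; `rankings` maps a query ID to the model's ranked
    list of document IDs.
    """
    scores = defaultdict(list)
    for q in queries:
        ranked = rankings.get(q["query_id"], [])
        scores[q["subset"]].append(recall_at_k(ranked, q["relevant_ids"], k))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}
```

To run it against a real file, pass an open file handle to `parse_jsonl` (a file object iterates line by line) and adjust the key names to match the actual schema. Comparing the per-subset averages gives a simple view of how performance differs between domains.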
Strengths
- Includes four specialized subsets: MSCOCO, Google_WIT, VisualNews, and OVEN
- Standardized queries.jsonl format across all subsets for consistent evaluation
- Supports three retrieval modes: text-only, image-only, and combined image+text inputs
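
Because the three retrieval modes share one file format, it can help to bucket queries by mode before scoring. The sketch below assumes, hypothetically, that a query record carries its text under a `text` key and its image reference under an `image` key; the real queries.jsonl fields may be named differently.

```python
from collections import Counter


def query_mode(query):
    """Classify a query dict as 'text', 'image', or 'image+text'.

    The `text` and `image` keys are assumed names, not a confirmed schema.
    """
    has_text = bool(query.get("text"))
    has_image = bool(query.get("image"))
    if has_text and has_image:
        return "image+text"
    if has_image:
        return "image"
    if has_text:
        return "text"
    raise ValueError("query has neither text nor image content")


def count_modes(queries):
    """Count how many queries fall into each retrieval mode."""
    return Counter(query_mode(q) for q in queries)
```

Counting modes per subset is a quick sanity check that a benchmark run covers all three input types rather than only one.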