A benchmark for evaluating multimodal embedding models, covering 4 meta-tasks (classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets. The dataset was created by TIGER-Lab and published in the paper 'VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks'. It was last updated on Hugging Face on October 28, 2024.
The license is unknown; verify it on the dataset page before use.