Name: MMEB Train: Vision-Language Model Training Data for 20 Multimodal Tasks
Creator: TIGER-Lab
Published: 2024-10-08T04:05:01
Keywords: Library: polars, Size Categories: 1M<n<10M, Language: en, Modality: text, Library: mlcroissant, Modality: image, Library: datasets, Embedding, Library: pandas, Parquet, arXiv: 2410.05160, Region: us, License: apache-2.0

Description

This is the training split of the Massive Multimodal Embedding Benchmark (MMEB), used to train the VLM2Vec models described in an ICLR 2025 paper (arXiv:2410.05160). It contains data from 20 of the benchmark's 36 datasets, which together span 4 meta tasks for evaluating multimodal embedding models.

Use Cases

Train vision-language models like VLM2Vec on the 20 in-domain datasets for multimodal embedding tasks.
Benchmark model performance on the 4 meta tasks defined by the MMEB framework.
Analyze the transferability of embeddings learned from the 20 training datasets to the 16 held-out evaluation datasets.
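
As a rough illustration of the first use case, the sketch below shows one way the training subsets might be discovered and loaded with the Hugging Face `datasets` library, which the keywords list as a supported library. The repository ID is taken from the URL given on this page. The helper names (`list_subsets`, `load_subset`, `summarize`, `main`) are hypothetical, and the code assumes the 20 constituent datasets are exposed as separate configurations, which this page does not confirm.

```python
from typing import Sequence

REPO_ID = "TIGER-Lab/MMEB-train"  # taken from the URL on this page


def list_subsets(repo_id: str = REPO_ID):
    """Return the configuration (subset) names published for the repository."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)


def load_subset(subset: str, repo_id: str = REPO_ID):
    """Download one subset; returns a DatasetDict keyed by split name."""
    from datasets import load_dataset

    return load_dataset(repo_id, subset)


def summarize(subsets: Sequence[str], expected: int = 20) -> str:
    """Compare the number of subsets found with the 20 stated in the description."""
    status = "matches" if len(subsets) == expected else "differs from"
    return f"{len(subsets)} subsets found; {status} the {expected} described"


def main() -> None:
    """Hypothetical entry point; needs network access and the `datasets` package."""
    names = list_subsets()
    print(summarize(names))
    # Assumption: each configuration corresponds to one constituent dataset.
    first = load_subset(names[0])
    print(first)  # shows the available splits and their columns
```

Calling `main()` prints the subset count and the structure of the first subset, which is a quick way to check the column layout before writing a training loop. Listing the configurations first avoids hard-coding subset names that this page does not document.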

Strengths

Part of a benchmark covering 36 datasets for evaluating multimodal embedding models.
Specifically designed for training models documented in a peer-reviewed ICLR 2025 publication.
Covers 4 distinct meta tasks for comprehensive capability assessment.

Limitations

Exact row counts and column structure are not provided here; the keywords indicate only Parquet files and a size between 1 million and 10 million examples.
The composition and balance of the 20 constituent datasets are unknown, which may affect training dynamics.
Limited information on data provenance, collection methods, and potential biases within the source datasets.

Provenance

Source: TIGER-Lab, via Hugging Face.
Collection Method: Curated from 20 datasets selected for the Massive Multimodal Embedding Benchmark.
Freshness: Last updated on 2025-01-28.

The full dataset description, including specific data details and license, is hosted externally at https://huggingface.co/datasets/TIGER-Lab/MMEB-train.
