Name: DataComp-12M: A 12 Million Image-Text Pair Subset for Multimodal Model Training
Creator: mlfoundations
Published: 2024-06-26T21:32:54
Keywords: Image-Text Pairs, Multimodal Training, Computer Vision, CLIP Models, Multimodal

Description

A subset of 12 million image-text pairs drawn from the DataComp-1B-BestPool collection, released by mlfoundations in 2024. The dataset is designed for training image-text models. The image URL-text samples and metadata are licensed under Creative Commons CC-BY-4.0, while individual images retain their original copyrights. It was introduced in the MobileCLIP paper, and models trained on it are reported to perform better than models trained on established datasets such as CC-12M and YFCC-15M.

Use Cases

Training image-text retrieval models on the 12 million image URL-text samples.
Comparing models trained on this dataset against models trained on established datasets such as CC-12M and YFCC-15M.
Fine-tuning CLIP-based models for specific downstream tasks using the provided image-text pairs.
Researching dataset scaling effects and model performance using the curated subset from a larger pool.

Strengths

Contains 12 million image-text pairs, a substantial scale for training.
Models trained on it are reported to perform significantly better than models trained on the CC-12M and YFCC-15M datasets.
Derived from the curated DataComp-1B-BestPool, suggesting a level of quality filtering.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
An exact row count is not listed in the metadata; the nominal figure of 12 million pairs should be confirmed after download.
Description metadata is limited; actual data quality requires manual inspection after download.
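
Because field semantics must be inferred after download, a first pass over a handful of records can help. The following is a minimal Python sketch, not the dataset's documented schema: the field names `url` and `text` and the sample rows are hypothetical placeholders, and `summarize_fields` is an illustrative helper rather than part of any library.

```python
from collections import defaultdict


def summarize_fields(records):
    """Summarize each field: the Python type names seen and how many values are non-empty."""
    summary = defaultdict(lambda: {"types": set(), "non_empty": 0})
    for record in records:
        for key, value in record.items():
            entry = summary[key]
            entry["types"].add(type(value).__name__)
            # Count values that are neither None nor an empty string.
            if value is not None and value != "":
                entry["non_empty"] += 1
    return dict(summary)


# Hypothetical rows for illustration only; the real field names must be
# read from the downloaded files.
sample_rows = [
    {"url": "https://example.com/a.jpg", "text": "a red bicycle"},
    {"url": "https://example.com/b.jpg", "text": ""},
]

for name, info in summarize_fields(sample_rows).items():
    print(name, sorted(info["types"]), info["non_empty"])
```

Running the same helper over the first few thousand real records would show which fields are always populated and which types they carry, which is a reasonable starting point for the manual inspection noted above.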

Provenance

Source: mlfoundations via Hugging Face Datasets.
Collection Method: Subset of the DataComp-1B-BestPool collection.
Freshness: Last updated 2024-06-26 22:58:01; whether the data has changed since then should be verified.

License: The URL-text metadata is under CC-BY-4.0. Images remain under their own original copyrights, so commercial use requires separate verification.
