A subset of 12 million image-text pairs drawn from the DataComp-1B-BestPool collection, released by mlfoundations in 2024. The dataset is designed for training image-text models. Its URL-text metadata is licensed under Creative Commons CC-BY-4.0, while the individual images retain their original copyrights. It was introduced in the MobileCLIP paper, and models trained on it are reported to outperform models trained on several established datasets.
The images remain under their original copyrights, so commercial use requires verifying the rights to each image separately; only the URL-text metadata is covered by CC-BY-4.0.