This dataset contains 12 million unique identifiers (UIDs) that reference a filtered subset of the larger DataComp-1B-BestPool dataset. Apple created this collection to train image-text models that outperform those trained on established datasets such as CC-12M and YFCC-15M. The dataset card was last updated in February 2025.
This repository contains only unique identifiers (UIDs). The actual image and caption data must be retrieved from the source shards referenced by these UIDs, available at 'mlfoundations/DataComp-12M'.
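Because the repository holds only UIDs, using it comes down to a membership filter over the source shards. The following is a minimal sketch of that step, not the official tooling. It assumes the UIDs arrive as one identifier per line and that each source record is a dictionary carrying its identifier under a `uid` key. The helper names (`load_uids`, `filter_records`), the `uid` field name, and the toy records are all hypothetical and chosen for illustration only.

```python
from typing import Iterable, Iterator


def load_uids(lines: Iterable[str]) -> set[str]:
    """Collect stripped, non-empty UIDs into a set for fast membership tests."""
    return {line.strip() for line in lines if line.strip()}


def filter_records(
    records: Iterable[dict], uids: set[str], key: str = "uid"
) -> Iterator[dict]:
    """Yield only the records whose identifier appears in the UID set.

    The `key` field name is an assumption; adjust it to match the
    metadata layout of the shards you actually download.
    """
    for record in records:
        if record.get(key) in uids:
            yield record


# Toy stand-ins for a UID file and one source shard (hypothetical data).
uid_lines = ["a1\n", "c3\n", "\n"]
shard = [
    {"uid": "a1", "caption": "a cat"},
    {"uid": "b2", "caption": "a dog"},
    {"uid": "c3", "caption": "a tree"},
]

selected = list(filter_records(shard, load_uids(uid_lines)))
```

Holding the UIDs in a set keeps each lookup constant-time on average, which matters when scanning a pool far larger than the 12 million retained samples.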