Available on 1 platform
Conceptual Captions 12M (CC12M) contains 12 million image-text pairs designed for vision-and-language pre-training. The dataset was collected with a relaxed version of the CC3M dataset pipeline, and this instance was created by pixparse. The instance was last updated on Hugging Face in December 2023.
This instance is provided in WebDataset .tar format, so loading it requires either the webdataset library or a Hugging Face datasets release that supports that format.
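To make the .tar layout concrete, here is a minimal sketch that uses only the Python standard library. It assumes the usual WebDataset convention that files sharing a base name (for example `000000.jpg` and `000000.txt`) form one sample. The helper names `write_demo_shard` and `iter_samples`, the `jpg`/`txt` extensions, and the sample keys are illustrative assumptions, not details confirmed for this instance.

```python
import io
import os
import tarfile
import tempfile


def write_demo_shard(path, samples):
    """Write (key, image_bytes, caption) triples as a WebDataset-style tar.

    Hypothetical helper: real shards are already provided by the dataset.
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, caption in samples:
            parts = (("jpg", image_bytes), ("txt", caption.encode("utf-8")))
            for ext, payload in parts:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def iter_samples(path):
    """Yield one dict per sample by grouping consecutive members by key.

    Simplification: the key is everything before the first dot in the
    member name, and members of one sample are assumed to be adjacent.
    """
    with tarfile.open(path, "r") as tar:
        current = None
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")
            if current is None or current["__key__"] != key:
                if current is not None:
                    yield current
                current = {"__key__": key}
            current[ext] = tar.extractfile(member).read()
        if current is not None:
            yield current


if __name__ == "__main__":
    # Placeholder bytes stand in for real JPEG data.
    shard = os.path.join(tempfile.mkdtemp(), "demo-000000.tar")
    write_demo_shard(shard, [("000000", b"\xff\xd8demo", "a red bicycle")])
    for sample in iter_samples(shard):
        print(sample["__key__"], sample["txt"].decode("utf-8"))
```

With the webdataset library installed, the same grouping is typically done for you, for example via `webdataset.WebDataset(shard_path).to_tuple("jpg", "txt")`; the field names there are again an assumption about how this instance names its files.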