A subset of approximately 15 million image-text pairs from the YFCC100M dataset, curated for training vision-language models. It was prepared by vishaal27 and uploaded to Hugging Face in January 2024. Each entry provides a page URL and a direct image download URL.
Images are not included in the repository; users must download them using the provided URLs and a tool like img2dataset. The license for the underlying YFCC100M data and this specific subset is not stated.
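Since the images have to be fetched separately, a minimal sketch of an img2dataset download call may help. It assumes the metadata has been saved locally as a Parquet file, and that the image URL and text live in columns named `url` and `caption`; those column names, the file name, and the helper functions `build_download_kwargs` and `run_download` are hypothetical and should be checked against the actual dataset files. The resize, process, and thread settings are illustrative only.

```python
from typing import Any, Dict


def build_download_kwargs(
    url_list: str,
    output_folder: str,
    url_col: str = "url",          # assumed column name, verify against the dataset
    caption_col: str = "caption",  # assumed column name, verify against the dataset
    input_format: str = "parquet", # assumed metadata file format
) -> Dict[str, Any]:
    """Collect keyword arguments for img2dataset.download in one place."""
    return {
        "url_list": url_list,            # path to the metadata file(s)
        "input_format": input_format,
        "url_col": url_col,
        "caption_col": caption_col,
        "output_folder": output_folder,
        "output_format": "webdataset",   # tar shards, convenient for training
        "image_size": 256,               # illustrative resize target
        "processes_count": 4,            # illustrative, tune to the machine
        "thread_count": 32,              # illustrative, tune to the network
    }


def run_download(kwargs: Dict[str, Any]) -> None:
    """Start the download; img2dataset is imported only when this is called."""
    from img2dataset import download

    download(**kwargs)
```

A call might look like `run_download(build_download_kwargs("yfcc15m_metadata.parquet", "yfcc15m_images"))`, where both paths are placeholders. Some of the original URLs may no longer resolve, so the number of images actually retrieved can be lower than the number of entries.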