A collection of 27 million images, each paired with a long caption generated by the Qwen2.5-VL-7B-Instruct model. The dataset was created by the BLIP3o organization and published on Hugging Face in June 2025. It is intended for pretraining vision-language models.
The dataset is stored in .tar archives. The 🤗datasets library can read them directly through its WebDataset support, so the archives do not need to be unpacked first.
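As a rough illustration of how a WebDataset-style archive can be read without unpacking, here is a minimal Python sketch. It assumes the usual WebDataset convention that files sharing a basename (for example `000000.jpg` and `000000.txt`) form one sample; the `jpg` and `txt` extensions, the helper names, and the demo shard are hypothetical and may not match this dataset's actual field names. The `stream_with_datasets` helper uses the `webdataset` builder of the 🤗datasets library in streaming mode; the shard URLs passed to it are left to the caller, since the description above does not give the repository path.

```python
import io
import tarfile


def iter_webdataset_samples(tar_path):
    """Yield one dict per sample from a WebDataset-style .tar shard.

    Members are grouped by key (the path up to the first dot of the file
    name); each extension becomes a field holding that member's raw bytes.
    """
    current_key, sample = None, {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            dirname, _, basename = member.name.rpartition("/")
            stem, dot, ext = basename.partition(".")
            if not dot:
                continue  # no extension, so not part of any sample
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def build_demo_shard(tar_path, samples):
    """Write a tiny demo shard. `samples` is a list of
    (key, image_bytes, caption) tuples (hypothetical layout)."""
    with tarfile.open(tar_path, "w") as tar:
        for key, image_bytes, caption in samples:
            fields = (("jpg", image_bytes), ("txt", caption.encode("utf-8")))
            for ext, payload in fields:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def stream_with_datasets(shard_urls):
    """Stream shards through the 🤗datasets WebDataset builder.

    `shard_urls` is a list of .tar paths or URLs supplied by the caller.
    Requires `pip install datasets`; imported lazily so the helpers above
    work without it.
    """
    from datasets import load_dataset

    return load_dataset(
        "webdataset",
        data_files={"train": shard_urls},
        split="train",
        streaming=True,
    )
```

With streaming enabled, samples are decoded as the archives are read, which matters for a collection of this size: nothing has to be extracted to disk before iteration starts.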