This dataset contains 7 million diverse images sourced from datasets such as COYO-700M and MS-COCO 2017, each paired with both a short caption and a detailed caption. DAMO-NLP-SG created this re-captioned dataset to train the VideoLLaMA 3 multimodal foundation model, and it was last updated in February 2025.
Users should review the full dataset page on Hugging Face for details on licensing, data structure, and access instructions before downloading.