VideoLLaMA 3 Training Images with Short and Detailed Captions

Name: VideoLLaMA 3 Training Images with Short and Detailed Captions
Creator: DAMO-NLP-SG
Published: 2025-02-07T11:06:02
Keywords: Foundation Models, Multimodal Training, Computer Vision, Image Captioning, Large Scale, Natural Language Processing, Multimodal

Description

7 million diverse images drawn from existing datasets such as COYO-700M and MS-COCO 2017, each paired with both a short and a detailed caption. DAMO-NLP-SG created this re-captioned dataset to train the VideoLLaMA 3 multimodal foundation model and last updated it in February 2025.

Use Cases

Train a vision-language model to generate detailed captions from image features.
Fine-tune an image captioning model using the paired short and detailed caption fields.
Benchmark the quality of generated captions against the provided detailed captions.
Pre-train a video-understanding model on these image-text pairs as a foundation.
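For the benchmarking use case, a minimal sketch follows. It swaps in a deliberately simple unigram-overlap F1 score as an assumption: this is not a metric specified for this dataset, and established captioning metrics such as BLEU or CIDEr would normally be preferred. The function names `tokenize` and `unigram_f1` are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text, replace punctuation with spaces, and split on whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a generated caption and a reference caption.

    This is a simple stand-in metric (an assumption), not the dataset authors' method.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "A brown dog runs across a grassy park."
    generated = "A dog runs in a park."
    print(round(unigram_f1(generated, reference), 3))  # prints 0.714
```

Precision and recall are computed over clipped token counts, so repeating a word in the generated caption cannot inflate the score beyond how often it appears in the reference.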

Strengths

Contains 7 million image-text pairs, providing a large-scale training resource.
Each image includes two caption variants (short and detailed), offering richer supervision signals.
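Because every image carries both a short and a detailed caption, one simple way to use both is to turn each image into two supervised examples, one per caption length. The sketch below assumes a hypothetical record schema (`image_path`, `short_caption`, `detailed_caption`) and hypothetical prompt strings. The dataset's actual column names are not documented on this page, and the VideoLLaMA 3 training prompts are not described here.

```python
from dataclasses import dataclass


@dataclass
class CaptionRecord:
    # Hypothetical schema: the dataset's real column names are not documented here.
    image_path: str
    short_caption: str
    detailed_caption: str


@dataclass
class TrainingExample:
    image_path: str
    prompt: str
    target: str


# Hypothetical instruction prompts; not taken from the VideoLLaMA 3 training recipe.
SHORT_PROMPT = "Describe this image in one sentence."
DETAILED_PROMPT = "Describe this image in detail."


def expand_record(record: CaptionRecord) -> list[TrainingExample]:
    """Turn one image with two captions into two (prompt, target) examples."""
    return [
        TrainingExample(record.image_path, SHORT_PROMPT, record.short_caption),
        TrainingExample(record.image_path, DETAILED_PROMPT, record.detailed_caption),
    ]


def expand_all(records: list[CaptionRecord]) -> list[TrainingExample]:
    """Flatten a batch of records: N images yield 2N examples."""
    return [example for record in records for example in expand_record(record)]


if __name__ == "__main__":
    records = [
        CaptionRecord(
            "img_0001.jpg",
            "A dog in a park.",
            "A brown dog runs across a sunlit, grassy park near a row of trees.",
        )
    ]
    for example in expand_all(records):
        print(example.prompt, "->", example.target)
```

Under this scheme, N images yield 2N (prompt, target) examples, so the 7 million images would give 14 million. The distinct prompts let a single model learn when to answer briefly and when to answer in detail.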

Limitations

Beyond COYO-700M and MS-COCO 2017, the full list of image sources and the licensing terms are not detailed in the provided description.
Apart from the 7 million image count, the storage size, column names, and sample data are not specified, limiting initial assessment.

Provenance

Source: DAMO-NLP-SG.
Collection Method: Images re-captioned from existing datasets including COYO-700M and MS-COCO 2017.
Freshness: Last updated on 2025-02-07.

Users should review the full dataset page on Hugging Face for details on licensing, data structure, and access instructions before downloading.

Quality Score

Overall: D36
Description: 42
Source: 36
Reputation: 32
Access: 26

Community

Downloads: 21
Likes: 10
Views: 0

Dataset Info

Author: DAMO-NLP-SG
Created: Feb 7, 2025
Updated: Feb 7, 2025
Last synced: Jun 6, 2026