LLaVA CC3M Pretrain 595K: A Subset for Visual Instruction Tuning

Name: LLaVA CC3M Pretrain 595K: A Subset for Visual Instruction Tuning
Creator: liuhaotian
Published: 2023-04-20T14:28:12
Keywords: Vision Language, Multimodal Ai, Computer Vision, Image Captioning, Synthetic, Multimodal, Pretraining Data

Description

This dataset contains 595,000 image-text pairs drawn from the CC-3M dataset and filtered for more balanced concept coverage. liuhaotian created it for the pretraining stage of visual instruction tuning, with the goal of building large multimodal models. It was last updated on July 6, 2023.

Use Cases

Pretraining multimodal models for feature alignment based on image-text pairs.
Training models for visual instruction tuning on the concept-balanced subset.
Benchmarking vision-language model performance on a filtered version of CC-3M data.
Generating synthetic captions for images using the referenced BLIP captioning method.

Strengths

Contains 595,000 data points, providing a substantial base for model training.
Filtered from CC-3M for more balanced concept coverage, which may reduce bias.
Includes BLIP-generated synthetic captions, offering an additional reference source.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is known, but file size and file format are not documented.
Last updated 2023-07-06 08:51:35; freshness should be verified for current research.
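
Because field semantics must be inferred after download, a quick schema summary can help. The following is a minimal sketch, assuming the metadata ships as a JSON array of objects; the listing does not state the file format, so that layout, the file name in the usage note, and the helper names `load_records` and `summarize_fields` are all hypothetical.

```python
import json
from collections import Counter, defaultdict


def load_records(path):
    """Load a JSON file that is ASSUMED to hold a list of dict records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def summarize_fields(records):
    """Count how often each field appears and which Python types it holds."""
    summary = defaultdict(lambda: {"count": 0, "types": Counter()})
    for record in records:
        for key, value in record.items():
            summary[key]["count"] += 1
            summary[key]["types"][type(value).__name__] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {
        key: {"count": info["count"], "types": dict(info["types"])}
        for key, info in summary.items()
    }


# Hypothetical sample records standing in for the real, undocumented fields.
sample = [
    {"id": "a1", "caption": "a dog on a beach"},
    {"id": "a2", "caption": "a red car", "extra": 3},
]
field_summary = summarize_fields(sample)
```

With a real download, something like `summarize_fields(load_records("metadata.json"))` would report which fields are present in every row and which are optional; the file name here is a placeholder, not a documented path.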

Provenance

Source: Subset of the CC-3M dataset.
Collection Method: Filtered for balanced concept coverage; each caption is accompanied by a BLIP synthetic caption.
Freshness: 2023-07-06

License is unknown, which may restrict usage.

Quality Score

Overall: D (38)
Description: 42
Source: 41
Reputation: 33
Access: 26

Community

675 downloads

177 likes

0 views

Dataset Info

Author: liuhaotian
Created: Apr 20, 2023
Updated: Jul 6, 2023
Last synced: Jul 26, 2026
