DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Multimodal & LLM

Conceptual Captions 12 Million Image-Text Pairs

Name: Conceptual Captions 12 Million Image-Text Pairs
Creator: pixparse
Published: 2023-12-12T23:59:59
Keywords: Image Text Pairs, Vision Language, Multimodal Training, Computer Vision, Large Scale, Pre Training Data, Multimodal

by pixparse · Updated Dec 15, 2023

Available on 1 platform

Description

Conceptual Captions 12M (CC12M) contains 12 million image-text pairs designed for vision-and-language pre-training. The dataset was originally released by Google, and its collection pipeline is a relaxed version of the one used for Conceptual Captions 3M (CC3M). This instance was packaged and published on Hugging Face by pixparse and was last updated there in December 2023.

Use Cases

Train image captioning models on the paired images and captions.
Pre-train vision-language models for downstream tasks such as visual question answering.
Study large-scale image-text alignment across the 12 million pairs.
Pre-train contrastive image-text models whose embeddings support zero-shot image classification.
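One common way to pre-train on image-text pairs like these is a CLIP-style symmetric contrastive (InfoNCE) objective: within a batch, image i should score highest against caption i and lower against every other caption, and vice versa. The pure-Python sketch below illustrates that objective only; this page does not prescribe a training method, the embeddings are toy vectors standing in for encoder outputs, and the temperature of 0.07 is an assumed value (it is the initial value CLIP uses for its learnable temperature).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_rows(logits):
    """Mean cross-entropy over rows, where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired image/text embeddings.

    Assumption: image_embs[i] and text_embs[i] come from the same pair;
    every other caption in the batch serves as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts] for img in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))


# Toy check: correctly paired captions should give a lower loss than swapped ones.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]
print(clip_style_loss(images, aligned) < clip_style_loss(images, swapped))  # True
```

After training with an objective like this, zero-shot classification typically embeds one caption per class label and assigns each image the label whose caption embedding has the highest cosine similarity.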

Strengths

Contains 12 million image-text pairs.
Built as a relaxed version of the established CC3M pipeline.

Limitations

Field (column) names and total data size are not documented; the only format information is that this instance uses webdataset .tar shards.
The temporal and geographic coverage of the images and captions is unknown.

Provenance

Source: pixparse on Hugging Face.
Collection Method: Data collection pipeline is a relaxed version of the one used for Conceptual Captions 3M (CC3M).
Time Range: Not specified
Freshness: Not specified
Geography: Not specified

This instance is distributed as webdataset-format .tar shards. It can be read with the webdataset library or with a version of the Hugging Face datasets library that supports the WebDataset format.
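A webdataset shard is an ordinary tar archive in which consecutive files sharing a key (the path up to the first dot of the file name) form one sample. The stdlib-only sketch below groups tar members that way, which can also help inspect which fields a shard actually contains. The helper names and the `jpg`/`txt` extensions in the demo are hypothetical, because this page does not list the field names; for real training the webdataset library is the usual tool.

```python
import io
import tarfile


def split_key_ext(path):
    """Split a tar member path into (key, extension), webdataset-style:
    the key is the directory plus the file name up to its first dot."""
    directory, _, basename = path.rpartition("/")
    stem, dot, ext = basename.partition(".")
    if not dot:
        return None  # no extension, so not a sample field
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(fileobj):
    """Yield one dict per sample from a (possibly compressed) tar stream.
    Consecutive members that share a key are merged into the same sample."""
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = split_key_ext(member.name)
            if parts is None:
                continue
            key, ext = parts
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            # Read immediately: in stream mode each member is only readable once.
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def build_demo_shard(samples):
    """Write a tiny in-memory shard (hypothetical field names) for demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


shard = build_demo_shard([
    ("000000000", {"jpg": b"<image bytes>", "txt": b"a dog on a beach"}),
    ("000000001", {"jpg": b"<image bytes>", "txt": b"a red bicycle"}),
])
for s in iter_samples(shard):
    print(s["__key__"], sorted(k for k in s if k != "__key__"), s["txt"].decode("utf-8"))
```

Grouping by consecutive keys, rather than collecting the whole archive into a dictionary, keeps memory use constant, which matters when streaming many large shards.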


Quality Score: D40 (components: Description, Source, Reputation)


Community

27.7K downloads

39 likes

0 views

Dataset Info

Author: pixparse
Created: Dec 12, 2023
Updated: Dec 15, 2023
Last synced: Jul 13, 2026
