VisCon-100K: Image-Text Conversations for Vision-Language Model Fine-Tuning

Name: VisCon-100K: Image-Text Conversations for Vision-Language Model Fine-Tuning
Creator: tiiuae
Published: 2025-02-14T12:47:15
Keywords: Vision Language Models, Computer Vision, Image Captioning, Fine Tuning, Web Documents, Multimodal

by tiiuaeUpdated 1y ago

Available on 1 platform

Sign in to view source links and access this dataset

Description

100,000 image conversation samples derived from 45,000 web documents in the OBELICS dataset. GPT-4V and OpenChat 3.5 were used to generate contextual captions and convert them into diverse free-form conversations. The dataset was authored by tiiuae and last updated on February 17, 2025.

Use Cases

Fine-tuning vision-language models based on interleaved image-text web documents.
Training models for image captioning based on GPT-4V generated contextual captions.
Developing conversational AI agents that can discuss images based on diverse free-form conversations.
Benchmarking model performance on tasks requiring contextual understanding of images and text.

Strengths

Contains 100,000 image conversation samples.
Derived from 45,000 web documents, suggesting a broad source base.
Leverages advanced models (GPT-4V, OpenChat 3.5) for data generation.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data may reflect source bias inherent to the original web documents.

Provenance

Source: tiiuae
Collection Method: Derived from the OBELICS dataset; GPT-4V and OpenChat 3.5 used for caption generation and conversation creation.
Time Range: null
Freshness: Last updated 2025-02-17 06:29:36; freshness should be verified.
Geography: null

null

Multimodal Vision Language Models Computer Vision Image Captioning Fine Tuning Web Documents

Related Datasets

Quality Score

D36

Description

42

Source

36

Reputation

31

Access

26

Community

141 downloads

2 likes

0 views

Dataset Info

Author: tiiuae
Created: Feb 14, 2025
Updated: Feb 17, 2025
Last synced: Apr 18, 2026

Access

26

Community

141 downloads

2 likes

0 views

Dataset Info

Author: tiiuae
Created: Feb 14, 2025
Updated: Feb 17, 2025
Last synced: Apr 18, 2026

VisCon-100K: Image-Text Conversations for Vision-Language Model Fine-Tuning

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info