Name: VisCon-100K: 100K Image-Conversation Samples for Vision-Language Model Fine-Tuning
Creator: tiiuae
Published: 2025-02-14T10:54:52
Keywords: Vision Language Models, Computer Vision, Image Captioning, Fine Tuning, Web Documents, Multimodal

Description

VisCon-100K is a dataset of 100,000 image-conversation samples designed for fine-tuning vision-language models. It is derived from 45,000 web documents in the OBELICS dataset, with captions generated by GPT-4V and converted into free-form conversations by OpenChat 3.5. The dataset was created by tiiuae and last updated on February 17, 2025.

Use Cases

Fine-tuning vision-language models on conversations grounded in interleaved image-text web documents.
Training models for contextual image captioning using the GPT-4V-generated descriptions.
Developing conversational AI agents that can discuss images, using the free-form conversation data.
Benchmarking model performance on tasks requiring integration of visual and textual web data.

Strengths

Contains 100,000 image-conversation samples, providing a substantial volume for training.
Derived from 45,000 web documents, suggesting a diverse source of contextual data.
Leverages GPT-4V for caption generation, which may indicate high-quality initial annotations.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The row count is known (100,000 samples), but the distribution of images per source document and other structural details are not documented.
Data may reflect the biases inherent to the source web documents and the AI models used for annotation.
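
Because the field semantics and the per-document structure have to be inferred after download, a first pass over the rows might look like the sketch below. It is only an illustration under stated assumptions: the records and the field names `document_id`, `image`, and `conversation` are hypothetical stand-ins, since the real column names are not documented here.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Hashable, Iterable, List


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


def group_size_histogram(records: Iterable[Dict[str, Any]], key: Hashable) -> Counter:
    """Count how many groups (e.g. source documents) contain 1, 2, 3, ... records."""
    per_group = Counter(record[key] for record in records)
    return Counter(per_group.values())


# Hypothetical rows standing in for downloaded samples; the real field
# names and value types are undocumented and may differ.
sample = [
    {"document_id": "doc-a", "image": b"\x89PNG", "conversation": [{"user": "What is shown?"}]},
    {"document_id": "doc-a", "image": None, "conversation": [{"user": "Describe the scene."}]},
    {"document_id": "doc-b", "image": b"\x89PNG", "conversation": [{"user": "Where is this?"}]},
]

if __name__ == "__main__":
    # Field names with the value types observed for each.
    print(summarize_fields(sample))
    # Number of documents that contribute a given number of samples.
    print(group_size_histogram(sample, "document_id"))
```

The same two helpers should work on any iterable of dictionaries, so they could be pointed at the real rows once the dataset has been downloaded and the actual grouping column, if one exists, has been identified.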

Provenance

Source: huggingface, author tiiuae
Collection Method: Derived from the OBELICS dataset's web documents, with annotations generated by GPT-4V and OpenChat 3.5.
Time Range: Not specified.
Freshness: Last updated 2025-02-17 06:29:32.
Geography: Not specified.
