100,000 image conversation samples derived from 45,000 web documents in the OBELICS dataset. GPT-4V and OpenChat 3.5 were used to generate contextual captions and convert them into diverse free-form conversations. The dataset was authored by tiiuae and last updated on February 17, 2025.
Use Cases
- Fine-tuning vision-language models based on interleaved image-text web documents.
- Training models for image captioning based on GPT-4V generated contextual captions.
- Developing conversational AI agents that can discuss images based on diverse free-form conversations.
- Benchmarking model performance on tasks requiring contextual understanding of images and text.
Strengths
- Contains 100,000 image conversation samples.
- Derived from 45,000 web documents, suggesting a broad source base.
- Leverages advanced models (GPT-4V, OpenChat 3.5) for data generation.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which may limit suitability assessment.
- Data may reflect source bias inherent to the original web documents.
Provenance
- Source
- tiiuae
- Collection Method
- Derived from the OBELICS dataset; GPT-4V and OpenChat 3.5 used for caption generation and conversation creation.
- Time Range
- null
- Freshness
- Last updated 2025-02-17 06:29:36; freshness should be verified.
- Geography
- null