150,000 GPT-generated multimodal instruction-following data points collected in April 2023. The dataset utilizes the GPT-4-0314 API to synthesize vision-language interactions for the development of large multimodal models.
Use Cases
- Fine-tune vision-language models using the instruction-following pairs to improve multimodal task performance
- Train large multimodal models (LMMs) to interpret images based on the GPT-generated natural language instructions
- Benchmark open-source vision models against synthetic data generated by the GPT-4-0314 API
Strengths
- 150,000 multimodal instruction-following data points
- Generated using the GPT-4-0314 API in April 2023
- Formatted for visual instruction tuning of large multimodal models