Sign in to view source links and access this dataset
Description
300,000 examples of visual instruction data for training multimodal large language models. The dataset combines 150,000 English examples from the LLaVA project and 150,000 from the openbmb project. Author BUAADreamer uploaded this collection to Hugging Face on September 2, 2024.
Use Cases
Fine-tuning vision-language models based on the described multimodal instruction examples.
Training models for visual question answering based on the instruction data.
Benchmarking model performance on multimodal instruction-following tasks.
Conducting research on instruction tuning for multimodal AI systems.
Strengths
Contains 300,000 total examples, providing a substantial volume of training data.
Combines data from two established sources: LLaVA and openbmb.
Specifically formatted for use with the LLaMA Factory training toolkit.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data may reflect source bias inherent to the contributing projects LLaVA and openbmb.
Provenance
Source
Combined from the LLaVA and openbmb projects.
Collection Method
Likely curated and aggregated from existing visual instruction datasets.
Time Range
null
Freshness
Last updated 2024-09-02 14:20:59; freshness should be verified.
Geography
null
License is unknown; restrictions must be verified before use.