LLaVA Dataset: Vision-Language Data for Multimodal AI Training
Available on 1 platform
Sign in to view source links and access this dataset
Description
A dataset named LLaVA, hosted on Kaggle, likely contains multimodal data for training vision-language models. The platform tags suggest it is intended for large language model (LLM) training and multimodal AI tasks. Specific details on size, structure, and creation are not provided in the available metadata.
Use Cases
Fine-tuning a vision-language model for image captioning (inferred from domain, verify after download)
Training a model for visual question answering (VQA) (inferred from domain, verify after download)
Benchmarking multimodal model performance on instruction-following tasks (inferred from domain, verify after download)
Strengths
Published on Kaggle, a major platform for data science resources.