Loading...
Loading...
Available on 1 platform
Sign in to view source links and access this dataset
LLaVA-NeXT Data contains between 100,000 and 1,000,000 instruction-tuning pairs for multimodal large language models, released by lmms-lab in August 2024. It provides the specific data mixtures used to train the LLaVA-NeXT and LLaVA-NeXT (stronger) models, featuring synchronized image and text instruction sets.
The dataset is available in Parquet format on Hugging Face, but also includes a de-compressed raw format with JSON files and structured image folders for users familiar with the LLaVA data format.