A collection of 665,000 multimodal instruction-following pairs, each combining images with text sequences, compiled by kaiyuyue and updated in 2025. It consolidates the LLaVA-1.5-665K mixture into a single repository, providing the raw images in WebDataset format alongside the instruction JSON files.
Images are stored as WebDataset shards (.tar files), which are typically read with a loading library such as webdataset. The main instruction file is llava_v1_5_mix665k.json.
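
As a rough illustration of the layout described above, the sketch below loads the instruction file and walks the images in one shard. It deliberately uses Python's standard-library tarfile module rather than webdataset, relying on the fact that WebDataset shards are ordinary tar archives. The helper names, the image extensions, and the assumption that the instruction file is a single JSON list are illustrative guesses, not details confirmed by the dataset page.

```python
import json
import tarfile


def load_instructions(json_path):
    """Load the instruction file (assumed here to be one JSON list of records)."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def iter_tar_images(tar_path, extensions=(".jpg", ".jpeg", ".png")):
    """Yield (member_name, raw_bytes) for each image file in one tar shard.

    The extension filter is an assumption; adjust it to match the shard contents.
    """
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile() and member.name.lower().endswith(extensions):
                yield member.name, tar.extractfile(member).read()
```

A call might look like `records = load_instructions("llava_v1_5_mix665k.json")` followed by `for name, data in iter_tar_images("shard-000.tar"): ...`, where the shard file name is hypothetical. The raw bytes can then be passed to whatever image decoder the training pipeline uses.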