Released by mvp-lab in 2025, this 85-million-record multimodal collection supports the mid-training stage of the LLaVA-OneVision-1.5 framework. It aggregates image-text data from eight major sources, including ImageNet-21k, LAIONCN, and SA-1B, with the aim of democratizing multimodal model training.
The data is distributed as Parquet files and requires high-performance storage. For the specific data mixture ratios and training recipes, refer to arXiv:2509.23661.