Available on 1 platform
Amshaker's dataset provides 9 million text-image pairs for the first-stage pre-training of the Mobile-O multimodal model. The data is intended to align a diffusion decoder and conditioning projector with a frozen vision-language backbone. The dataset was last updated on Hugging Face in February 2026.
The license is listed as 'cc-by-nc-40' on the platform, indicating Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).