This dataset contains 4,000,000 image-caption pairs stored in the Arrow IPC format (written with PyArrow) for high-throughput multimodal training. The files are memory-mapped, so records can be read with low latency during large-scale model training without first loading the whole dataset into RAM.