OBELICS is an open collection of 141 million interleaved image-text web documents, containing 115 billion text tokens and 353 million images. Created by Hugging Face and updated in 2024, it is a large-scale open-source resource for multimodal AI development.
Processing all 141M documents requires significant storage and compute resources. The dataset is licensed under Apache-2.0.
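Because of the dataset's size, it can help to inspect a small sample before committing storage to a full download. The following is a minimal sketch, not an official recipe. It assumes the Hugging Face `datasets` package is installed, that the collection is hosted under the repo id `HuggingFaceM4/OBELICS`, and that each document carries parallel `images` and `texts` lists in which a slot is either filled or `None`. The helper names `stream_obelics` and `summarize` are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable


def stream_obelics() -> Iterable[Dict[str, Any]]:
    """Lazily iterate over OBELICS without downloading the full dataset.

    Assumption: the `datasets` package is installed and the collection is
    hosted under the repo id "HuggingFaceM4/OBELICS".
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset("HuggingFaceM4/OBELICS", split="train", streaming=True)


def summarize(docs: Iterable[Dict[str, Any]], limit: int = 1000) -> Dict[str, int]:
    """Count images and text segments in the first `limit` documents.

    Assumption: each document has `images` and `texts` lists where a slot
    holds an image URL or a text string, and None otherwise.
    """
    stats = {"documents": 0, "images": 0, "text_segments": 0}
    for doc in islice(docs, limit):
        stats["documents"] += 1
        # `or []` guards against a missing or null field.
        stats["images"] += sum(1 for img in doc.get("images") or [] if img is not None)
        stats["text_segments"] += sum(1 for txt in doc.get("texts") or [] if txt is not None)
    return stats
```

A call such as `summarize(stream_obelics(), limit=100)` would read only the first 100 documents over the network, which keeps storage and compute needs small while exploring the data.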