OBELICS is a curated collection of 141 million English web documents containing 115 billion text tokens and 353 million images. The documents, extracted from Common Crawl dumps, interleave text paragraphs and images. It was created by HuggingFaceM4 and released in August 2023.
License is unknown; users should verify terms before use.