MINT-1T contains 1 trillion text tokens and 3.4 billion images, roughly a tenfold increase in scale over prior open-source multimodal datasets. Created by a University of Washington team in collaboration with Salesforce Research and other academic institutions, the dataset interleaves text and images drawn from sources including ArXiv papers and PDFs, and is intended to support multimodal pretraining research.
The dataset is hosted on Hugging Face. Check the dataset page there for license details, access terms, and download requirements before fetching the files, which are very large.