MINT-1T is an open-source multimodal interleaved dataset containing one trillion text tokens and 3.4 billion images, representing a 10x scale-up from prior open-source collections. It was created by a team from the University of Washington to facilitate research in multimodal pretraining. The dataset was last updated on the platform in September 2024.
The license is listed on the platform as CC BY 4.0, but the listing does not detail the specific terms.