PIN-200M contains approximately 200 million samples of paired and interleaved multimodal documents and requires around 312 terabytes of storage. It is a mini version of the PIN dataset, which was introduced in a June 2024 paper. The dataset was created by m-a-p and was last updated on Hugging Face in April 2026.
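As a rough sanity check on the storage figure, the average footprint per sample can be estimated by dividing the two approximate totals. This sketch assumes decimal units (1 TB = 10^12 bytes) and treats both totals as exact, so the result only indicates the order of magnitude and is not a measured value:

```latex
% Assumes decimal units (1 TB = 10^{12} bytes, 1 MB = 10^{6} bytes) and exact totals
\frac{312 \times 10^{12}\ \text{bytes}}{200 \times 10^{6}\ \text{samples}}
  = 1.56 \times 10^{6}\ \text{bytes per sample}
  \approx 1.56\ \text{MB per sample}
```

Under binary units (1 TiB = 2^40 bytes), the same division gives roughly 1.7 MB per sample. Either way, each sample averages on the order of a megabyte.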
The license is unknown, so commercial use or redistribution may be restricted.