The Public Multimodal Dataset (PMD) contains 70 million image-text pairs covering 68 million unique images. It was introduced in the FLAVA paper and is aggregated from publicly available sources, including Conceptual Captions, WIT, Localized Narratives, RedCaps, COCO, SBU Captions, Visual Genome, and a subset of YFCC100M.
Because PMD is a large-scale aggregation, users should review the licenses and terms of the original constituent datasets (e.g., Conceptual Captions, COCO, YFCC100M) for specific usage restrictions. The data schema and file formats are not described here.
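The gap between the pair count and the unique-image count quoted above arises when one image appears with more than one caption. The following is a minimal sketch of that bookkeeping on toy data. Since the actual schema is not described here, the field names `image_id`, `text`, and `source`, the sample records, and the helper `summarize` are all hypothetical and are not part of PMD.

```python
from collections import defaultdict

# Hypothetical toy records. The real PMD schema is not specified in this
# description, so these field names and values are assumptions.
records = [
    {"image_id": "img_a", "text": "a dog on a beach", "source": "COCO"},
    {"image_id": "img_a", "text": "dog running near the sea", "source": "COCO"},
    {"image_id": "img_b", "text": "city skyline at night", "source": "SBU Captions"},
]


def summarize(records):
    """Count image-text pairs and unique images in a list of records."""
    captions_by_image = defaultdict(list)
    for record in records:
        # Group captions under the image they describe.
        captions_by_image[record["image_id"]].append(record["text"])
    return {
        "pairs": len(records),                    # one pair per record
        "unique_images": len(captions_by_image),  # one entry per distinct image
    }


summary = summarize(records)
print(summary)
```

Keeping a `source` field on each record, as assumed here, would also make it straightforward to filter by constituent dataset when applying that dataset's license terms.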