PVIT-3M is a dataset of 3 million image-text pairs designed for tuning Multimodal Large Language Models (MLLMs) on personalized visual instruction tasks. It was created by Sterzhang and introduced in the paper "Personalized Visual Instruction Tuning". The dataset was last updated on November 2, 2024.
The dataset's license is not stated; users should verify permissions before use.