A subset of the LAION/CC/SBU dataset, filtered for a more balanced distribution of concept coverage and constructed for the pretraining stage of visual instruction tuning. It includes synthetic captions generated by BLIP for reference, and it is intended to support building large multimodal models toward GPT-4-level vision and language capability. The dataset was created by liuhaotian and last updated in July 2023.
Full dataset details, including columns, sample data, file formats, size, and license, are not listed here; see the dataset page for them.