DenseFusion-1M provides 1 million image-text pairs for multi-modal perception, released by the Beijing Academy of Artificial Intelligence (BAAI) in 2024. The dataset uses a Perceptual Fusion approach to combine outputs from specialized vision experts and GPT-4V into detailed descriptions.
The dataset is licensed under CC BY 4.0 and distributed in JSON format. It can be loaded with standard Python data libraries such as pandas or polars.
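As a rough illustration of JSON ingestion, the sketch below reads records from either a single JSON array or a JSON Lines file using only the standard library, then optionally hands them to pandas. The file names, the `image_id` and `caption` fields, and the `load_records` helper are hypothetical placeholders for this example; the actual file layout and field names of DenseFusion-1M are not stated here and should be checked against the released files.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a list of records from a JSON array file or a JSON Lines file.

    Assumption: each record is a JSON object, so a file whose first
    non-whitespace character is "[" is treated as one JSON array.
    """
    text = Path(path).read_text(encoding="utf-8")
    if text.lstrip().startswith("["):
        return json.loads(text)
    # Otherwise parse one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample records; the real field names may differ.
sample = [
    {"image_id": "0001", "caption": "A placeholder caption."},
    {"image_id": "0002", "caption": "Another placeholder caption."},
]

with tempfile.TemporaryDirectory() as tmp:
    # Write the same records in both layouts to exercise each branch.
    array_path = Path(tmp) / "sample.json"
    array_path.write_text(json.dumps(sample), encoding="utf-8")

    lines_path = Path(tmp) / "sample.jsonl"
    lines_path.write_text(
        "\n".join(json.dumps(record) for record in sample), encoding="utf-8"
    )

    from_array = load_records(array_path)
    from_lines = load_records(lines_path)

# pandas is optional here; a list of dicts converts directly to a DataFrame.
try:
    import pandas as pd

    frame = pd.DataFrame(from_array)
except ImportError:
    frame = None
```

For a file of this scale, reading everything into memory at once may be impractical. Processing a JSON Lines file line by line, or loading it in chunks, is a common alternative if the released files use that layout.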