This dataset contains 1.432 million image-QA instances, created by wentao-yuan in 2024, for fine-tuning vision-language models on spatial affordance prediction. It includes 667K synthetic instances for object and free-space reference, 100K LVIS detection samples, and 150K instruction-following pairs.
The data is distributed in WebDataset format, so it is best read with the webdataset or mlcroissant library, both of which can ingest the shards efficiently.
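To make the WebDataset layout concrete, here is a minimal sketch using only the Python standard library. It relies on the general WebDataset convention that a shard is a tar archive in which consecutive files sharing a basename form one sample. The file names, the `jpg`/`json` extensions, and the helpers `split_key`, `iter_samples`, and `build_demo_shard` are assumptions made for illustration; they are not taken from this dataset, whose actual field names should be checked against its shards.

```python
import io
import json
import tarfile


def split_key(name):
    """Split 'dir/000001.seg.png' into ('dir/000001', 'seg.png')."""
    head, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")  # key ends at the first dot of the file name
    key = f"{head}/{stem}" if head else stem
    return key, ext


def iter_samples(fileobj):
    """Yield one dict per sample, grouping consecutive tar members by key."""
    current_key, sample = None, {}
    # "r|*" reads the archive as a forward-only stream, which suits large shards.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def build_demo_shard():
    """Build a tiny in-memory shard with hypothetical contents for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key in ("000000", "000001"):
            files = {
                f"{key}.jpg": b"fake image bytes",  # placeholder, not a real image
                f"{key}.json": json.dumps({"question": "hypothetical"}).encode(),
            }
            for name, payload in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for sample in iter_samples(build_demo_shard()):
        print(sample["__key__"], sorted(k for k in sample if k != "__key__"))
```

In practice the webdataset library performs this grouping and also handles decoding and shuffling; the sketch only shows what a reader should expect to find inside a shard.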