Available on 1 platform
A textual visual context dataset for image captioning, built on the publicly available COCO caption dataset. An October 2023 update features a SwinV2 classifier for generating visual-caption cosine scores with person labels.
The full description is hosted externally; users should review the dataset page on Hugging Face for complete details, license, and access instructions.