CaptionEmporium provides 6.92 million captions for safe-for-work images from the e621/e926 platform, with coverage extending to January 2023. The captions were generated by a large language model (mistralai/Mistral-7B-v0.1) and a multimodal model (THUDM/CogVLM); each image has eight LLM captions and one CogVLM caption. Most captions are described as substantially longer than 77 tokens.
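Because most captions are described as exceeding 77 tokens, a consumer working with a fixed 77-token budget would need to detect and split long captions. The following is a minimal sketch of that check, not part of the dataset's tooling. The function names are hypothetical, and the whitespace split is only a stand-in assumption: a real tokenizer would generally produce different counts.

```python
from typing import List

TOKEN_BUDGET = 77  # the figure cited in the dataset description


def count_tokens(caption: str) -> int:
    """Approximate token count using whitespace splitting.

    Assumption: this is a stand-in for a real tokenizer, which would
    typically yield more tokens than there are whitespace-separated words.
    """
    return len(caption.split())


def exceeds_budget(caption: str, budget: int = TOKEN_BUDGET) -> bool:
    """Return True when the caption has more approximate tokens than the budget."""
    return count_tokens(caption) > budget


def split_by_budget(caption: str, budget: int = TOKEN_BUDGET) -> List[str]:
    """Split a caption into consecutive chunks of at most `budget` words."""
    words = caption.split()
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]


if __name__ == "__main__":
    # Hypothetical long caption made of 200 placeholder words.
    long_caption = " ".join(f"word{i}" for i in range(200))
    print(exceeds_budget(long_caption))
    print([count_tokens(chunk) for chunk in split_by_budget(long_caption)])
```

Splitting on a fixed word count can cut a sentence in half; a sentence-aware splitter may be preferable if chunk coherence matters.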
The license is unknown; verify the terms of use before using the dataset.