Joy Captioning 20250408A contains between 100,000 and 1,000,000 image-text pairs used for the initial training of the JoyCaption Beta One vision-language model. Created by fancyfeast and updated in early 2026, the collection focuses on detailed image descriptions and visual question-answering tasks. The data includes a mix of human-written and machine-generated text, explicitly labeled for provenance.
The dataset is distributed in Parquet format. To isolate the human-written (non-synthetic) responses, filter on the is_human field.
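As a rough illustration of that filter, the sketch below keeps only rows flagged as human-written. It assumes is_human is a boolean-like column in which a true value marks a human-written response. The "response" field name, the sample rows, and the load_rows helper are hypothetical and are not taken from the dataset's documented schema. Loading uses pyarrow, which is one common way to read Parquet but not the only one.

```python
from typing import Any, Dict, Iterable, List, Optional


def load_rows(parquet_path: str, columns: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Read one Parquet file into a list of row dicts (requires pyarrow)."""
    # Imported lazily so the filter below works without pyarrow installed.
    import pyarrow.parquet as pq

    # Passing `columns` avoids loading large fields that are not needed.
    return pq.read_table(parquet_path, columns=columns).to_pylist()


def human_only(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep rows whose is_human flag is truthy.

    Assumption: is_human is boolean-like. Rows that lack the flag are
    dropped here rather than guessed at.
    """
    return [row for row in rows if row.get("is_human")]


# Tiny in-memory stand-in for real rows; the "response" field is hypothetical.
sample = [
    {"response": "A red bicycle leaning on a wall.", "is_human": True},
    {"response": "An image of a bicycle.", "is_human": False},
    {"response": "Two dogs on a beach."},  # no provenance flag
]

kept = human_only(sample)
print(len(kept))
```

With a downloaded file, the same filter could be applied as human_only(load_rows(path)), where path points at a local Parquet shard. Dropping rows with a missing flag is a conservative choice; a different policy may suit other uses.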