This dataset contains 633,565 multimodal records of anime, manga, and video game characters sourced from 3,860 Fandom wiki sites. Each record pairs a character image with metadata extracted from HTML and a descriptive caption generated by the Qwen-VL-72B-Instruct vision-language model.
Use Cases
- Fine-tune text-to-image models using the VLM-generated captions to improve character-specific generation
- Train character-based conversational agents using the personality descriptions inferred in the captions together with the wiki metadata
- Conduct large-scale visual-textual alignment research using the paired images and HTML-derived metadata
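
As a rough illustration of the first use case, the sketch below turns records into (image, caption) training pairs. The record schema is hypothetical: `CharacterRecord`, its field names (`name`, `wiki`, `image_path`, `caption`, `metadata`), the helper `to_caption_pairs`, and the `min_caption_chars` threshold are assumptions made for this example, not the dataset's documented columns, so they would need to be mapped to the real fields before use.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterRecord:
    # Hypothetical schema: these field names are assumptions for illustration,
    # not the dataset's actual column names.
    name: str
    wiki: str
    image_path: str
    caption: str
    metadata: dict = field(default_factory=dict)


def to_caption_pairs(records, min_caption_chars=20):
    """Return (image_path, caption) pairs, skipping records that have
    no image path or a caption shorter than min_caption_chars."""
    pairs = []
    for rec in records:
        caption = rec.caption.strip()
        if rec.image_path and len(caption) >= min_caption_chars:
            pairs.append((rec.image_path, caption))
    return pairs


# Made-up sample records, only to exercise the filter.
sample = [
    CharacterRecord("Example A", "example-wiki", "images/a.png",
                    "A tall character with silver hair and a red cloak."),
    CharacterRecord("Example B", "example-wiki", "",
                    "A character with no image available in this record."),
    CharacterRecord("Example C", "other-wiki", "images/c.png", "Short."),
]

pairs = to_caption_pairs(sample)
for image_path, caption in pairs:
    print(image_path, "->", caption)
```

Only the first sample record survives the filter, since the second has no image path and the third has a caption below the length threshold. The threshold itself is an arbitrary choice; a real pipeline would likely tune it against the actual caption length distribution.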
Strengths
- 633,565 unique character records spanning anime, manga, and video games
- Aggregated content from 3,860 distinct Fandom wiki sites
- Captions generated by Qwen-VL-72B-Instruct covering visual appearance and inferred personality