svjack created this dataset to train a Pokémon text-to-image model. It pairs Pokémon images from the FastGAN project with English captions generated by the BLIP model and adds a column with a Chinese translation of each caption. The dataset was last updated on Hugging Face in October 2022.
Use Cases
- Training text-to-image generation models on the Pokémon image-caption pairs.
- Fine-tuning multilingual captioning models on the parallel English and Chinese captions.
- Studying image quality and caption alignment for images drawn from the FastGAN project's training data.
- Benchmarking cross-lingual vision-language models on a niche, stylized domain.
Strengths
- Includes parallel English and Chinese captions for each image, enabling multilingual applications.
- Images are sourced from a published GAN research project (FastGAN), providing a coherent visual domain.
- Captions were generated with the pre-trained BLIP model, a widely used vision-language model.
Limitations
- The dataset description is sparse, so data quality must be checked manually after download.
- Last updated 2022-10-31 06:23:03; check whether a newer version exists before use.
- Column-level documentation is absent; field semantics must be inferred after download.
Provenance
- Source: Pokémon images from the FastGAN-pytorch project; captions generated by the BLIP model.
- Collection Method: Images were taken from the dataset used to train FastGAN and captioned automatically with the pre-trained BLIP model.
- Time Range: Not specified.
- Freshness: Last updated 2022-10-31 06:23:03.
- Geography: Not specified.