Image-text pairs for Italian Contrastive Language–Image Pre-training (CLIP). The pairs are meant for training models that align visual representations with Italian-language descriptions, supporting cross-modal retrieval and zero-shot classification.
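As a rough illustration of the "contrastive" objective, here is a minimal sketch of the symmetric cross-entropy loss popularized by CLIP, computed over a batch of precomputed image and text embeddings. It makes several assumptions that are not specified by this dataset. The embeddings come from some image encoder and some Italian text encoder. Pair k in the batch is the matching pair. The temperature is fixed at 0.07, whereas CLIP learns it during training. The function names are illustrative only and do not belong to any library.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(images: List[Vector], texts: List[Vector]) -> List[List[float]]:
    """Cosine similarity between every image (rows) and every text (columns)."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def clip_style_loss(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: image->text and text->image terms, averaged."""
    sims = similarity_matrix(images, texts)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text->image
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy batch: two 2-D "embeddings" per modality; pair k matches pair k.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Correctly paired batches drive the loss toward zero, while mismatched pairings are penalized heavily. That is the pressure that pulls each image toward its own Italian caption in the shared embedding space.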
Use Cases
- Train a text-to-image retrieval system using the image and Italian text pairs
- Evaluate zero-shot image classification performance on Italian-language datasets
- Fine-tune vision-language models for Italian-specific semantic understanding
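For the retrieval use case, one common approach is to embed an Italian query and every candidate image into the shared space and rank the images by cosine similarity. Recall@K is a typical way to score the results. It is the fraction of queries whose paired image appears in the top K. The sketch below assumes the embeddings are already computed and that query i is paired with image i. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], images: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k images most similar to the query, best first."""
    ranked = sorted(range(len(images)), key=lambda i: cosine(query, images[i]), reverse=True)
    return ranked[:k]


def recall_at_k(queries: List[Sequence[float]], images: List[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired image (same index) is in the top k."""
    hits = sum(1 for i, q in enumerate(queries) if i in retrieve(q, images, k))
    return hits / len(queries)


# Toy embeddings standing in for encoder outputs.
images = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[1.0, 0.1], [0.1, 1.0], [0.6, 0.8]]
print(retrieve(queries[2], images, k=2))  # [2, 1]
print(recall_at_k(queries, images, k=1))  # 1.0
```

This brute-force ranking is fine for small evaluation sets. A large image collection would usually call for a nearest-neighbor index instead.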
Strengths
- Pairs each image with an Italian-language description
- Supports contrastive learning between the visual and Italian-text modalities
- Enables zero-shot classification from Italian natural-language prompts
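Zero-shot classification with Italian prompts can be sketched as follows. Each candidate label is wrapped in an Italian prompt and encoded. The result is then compared with the image embedding, and a softmax over the scaled similarities gives class probabilities. This mirrors the prompt-based zero-shot procedure described for CLIP. The template `una foto di {}` ("a photo of {}"), the temperature of 0.01, and the `encode_text` callable are all assumptions for illustration. In practice, `encode_text` would be the Italian text encoder trained on this data.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical Italian prompt template ("a photo of {}"); the wording is a free choice.
PROMPT_TEMPLATE = "una foto di {}"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(
    image_emb: Sequence[float],
    labels: List[str],
    encode_text: Callable[[str], Sequence[float]],
    temperature: float = 0.01,
) -> Dict[str, float]:
    """Softmax over image-prompt similarities, one prompt per candidate label."""
    scores = [
        cosine(image_emb, encode_text(PROMPT_TEMPLATE.format(label))) / temperature
        for label in labels
    ]
    m = max(scores)  # subtract the max to keep the softmax numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy stand-in for an Italian text encoder: a fixed lookup table.
toy_encoder = {"una foto di un gatto": [1.0, 0.0], "una foto di un cane": [0.0, 1.0]}
probs = zero_shot_classify([0.9, 0.1], ["un gatto", "un cane"], toy_encoder.__getitem__)
print(max(probs, key=probs.get))  # un gatto
```

Labels are passed with their Italian article ("un gatto", "un cane"). This keeps the prompt grammatical without the template having to handle gender agreement.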