A PyTorch implementation of OpenAI's CLIP architecture for image-text alignment, authored by Moein Shariatnia and updated in October 2025. It provides a dual-encoder framework for image-text pairs, using BERT as the text encoder and Vision Transformer components as the image encoder.
Users must provide their own image-caption datasets (e.g., MS-COCO or Flickr8k) as the repository contains the model implementation and training logic but no raw data records.
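To make the dual-encoder alignment idea concrete, here is a minimal sketch of the symmetric contrastive objective described in the original CLIP paper. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not taken from the repository, whose loss may differ in its details. The function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a single target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same
    image-caption pair; every other combination is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: scaled cosine similarity between image i and caption j
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text: each row should peak on its diagonal entry
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should peak on its diagonal entry
    loss_t2i = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2


# Toy embeddings standing in for encoder outputs (hypothetical values)
images = [[1.0, 0.0], [0.0, 1.0]]
matching_captions = [[1.0, 0.0], [0.0, 1.0]]
swapped_captions = [[0.0, 1.0], [1.0, 0.0]]

print(clip_contrastive_loss(images, matching_captions))
print(clip_contrastive_loss(images, swapped_captions))
```

The loss is small when each image embedding is closest to its own caption embedding and large when the pairs are swapped, which is the signal that pulls the two encoders into a shared embedding space during training.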