Colonclip: Colonoscopy Image-Text Pairs for Medical CLIP Training

Name: Colonclip: Colonoscopy Image-Text Pairs for Medical CLIP Training
Creator: ZoeTAN
Published: 2026-06-01T05:51:59
Keywords: Image, Medical Imaging, Colonoscopy, Computer Vision, Video, Medical, Multimodal

by ZoeTAN, updated 1mo ago

Available on 1 platform

Description

Colonclip is a multimodal training dataset for medical computer vision models, likely containing colonoscopy images paired with text. The dataset was uploaded by author ZoeTAN to Hugging Face and was last updated on June 3, 2026. It includes LMDB archives for training, label embeddings, and a separate testing set.

Use Cases

Training a medical CLIP model based on the described image-text pairs.
Fine-tuning computer vision models for colonoscopy image classification using the provided label embeddings.
Benchmarking multimodal retrieval systems on medical data using the separate test set.

Strengths

Includes a dedicated testing set (test_ex_adapt_collect.zip) for evaluation.
Provides label embeddings (label_en_dict.npy, label_en_graph.npz) which can facilitate structured learning.
Offers data in multiple formats (LMDB, JSON, TSV) for different processing needs.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
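The manual inspection these limitations call for can start with a quick inventory of whatever was downloaded. The following is a minimal sketch using only the Python standard library. The function names and the `./colonclip` path in the usage note are hypothetical, and nothing here assumes a particular schema; it only reports sizes, row counts, and entry counts so that field semantics can be investigated afterwards.

```python
import json
import zipfile
from pathlib import Path


def summarize_file(path: Path) -> dict:
    """Return basic facts about one downloaded file without assuming its schema."""
    info = {"name": path.name, "bytes": path.stat().st_size}
    suffix = path.suffix.lower()
    if suffix == ".tsv":
        # Count rows and the widest row; plain tab splitting avoids the
        # csv module's field size limit on very long fields.
        rows = 0
        max_cols = 0
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                rows += 1
                max_cols = max(max_cols, line.count("\t") + 1)
        info.update(rows=rows, max_columns=max_cols)
    elif suffix == ".json":
        # Reads the whole file into memory; acceptable for a first look.
        text = path.read_text(encoding="utf-8", errors="replace")
        try:
            data = json.loads(text)
            entries = len(data) if isinstance(data, (list, dict)) else 1
            info.update(json_type=type(data).__name__, entries=entries)
        except json.JSONDecodeError:
            # The file may be JSON Lines or something else; count non-empty lines.
            lines = sum(1 for ln in text.splitlines() if ln.strip())
            info.update(json_type="unparsed", entries=lines)
    elif suffix == ".zip":
        # List archive members without extracting them.
        with zipfile.ZipFile(path) as zf:
            info.update(members=len(zf.namelist()))
    return info


def summarize_dir(root: str) -> list:
    """Summarize every regular file under root, sorted by path."""
    return [summarize_file(p) for p in sorted(Path(root).rglob("*")) if p.is_file()]
```

Calling `summarize_dir("./colonclip")` on a local copy would give a rough row count for train_imgs.tsv and a member count for test_ex_adapt_collect.zip. LMDB archives and .npy/.npz files are only sized here, since reading them needs third-party packages.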

Provenance

Source: ZoeTAN on Hugging Face
Freshness: Last updated 2026-06-03 11:00:54; confirm against the Hugging Face page before use.

The upstream dataset description notes that the encoding of train_texts.json and train_imgs.tsv is unclear and advises checking the accompanying code. Two CSV files are summaries provided for information only and are not used by that code.
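Until the accompanying code has been read, one way to probe the unclear encoding is to test a hypothesis against the first row. Some CLIP training pipelines store images as base64 strings in TSV files, but nothing on this page confirms that Colonclip does. The sketch below, with hypothetical function names, only checks whether each tab-separated field decodes as base64 and starts with a known image signature.

```python
import base64
import binascii

# Leading bytes ("magic numbers") of common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF8": "gif",
    b"BM": "bmp",
}


def sniff_field(field: str) -> str:
    """Guess whether a text field holds a base64-encoded image."""
    field = field.strip()
    if not field:
        return "empty"
    try:
        raw = base64.b64decode(field, validate=True)
    except (binascii.Error, ValueError):
        return "not base64"
    for magic, name in IMAGE_SIGNATURES.items():
        if raw.startswith(magic):
            return f"base64 {name}"
    # Short alphanumeric IDs can also decode cleanly, so this is not proof.
    return "base64, unknown payload"


def sniff_first_row(tsv_path: str) -> list:
    """Apply sniff_field to each tab-separated column of the first line."""
    with open(tsv_path, encoding="utf-8", errors="replace") as fh:
        first = fh.readline().rstrip("\n")
    return [sniff_field(col) for col in first.split("\t")]
```

A result such as "base64 jpeg" for one column would support the base64 hypothesis; any other result means the format still has to be taken from the dataset's own code.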

Quality Score

Overall: D35
Description: 36
Source: 36
Reputation: 38
Access: 22

Community

Downloads: 7
Likes: 1
Views: 0

Dataset Info

Author: ZoeTAN
Created: Jun 1, 2026
Updated: Jun 3, 2026
Last synced: Jun 9, 2026
