DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


PathGen-1.6M: Pathology Image-text Pairs Generated via Multi-agent Collaboration | DataSalon

Multimodal & LLM

Name: PathGen-1.6M: Pathology Image-text Pairs Generated via Multi-agent Collaboration
Creator: jamessyx
Published: 2024-06-13T10:54:59
Keywords: Image-Text Pairs, Library: polars, Size Category: 1M < n < 10M, Medical Imaging, Vision Language Models, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, License: CC BY 4.0, Computer Vision, arXiv: 2407.00203, Region: US, Pathology, Large Scale, Video, JSON, Multimodal

by jamessyx·Updated 1y ago

Available on 1 platform

Description

PathGen-1.6M is a dataset of 1.6 million pathology image-text pairs, created by jamessyx and last updated in April 2025. It is intended for training Vision Language Models (VLMs) such as CLIP and is designed to support pathology applications such as zero-shot image classification and Whole Slide Image (WSI) analysis.

Use Cases

Training Vision Language Models (VLMs) such as CLIP on pathology image-text pairs.
Zero-shot pathology image classification with models trained on the generated image-text pairs.
Whole Slide Image (WSI) analysis with models trained on the pathology image-text pairs.
Developing pathology-specific vision encoders for multimodal large language models.
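
As a rough illustration of the zero-shot classification use case, the sketch below picks the class label whose text embedding is most similar (by cosine similarity) to an image embedding, which is the general CLIP-style approach. The 3-dimensional vectors and the labels "tumor" and "normal" are hypothetical stand-ins for real encoder outputs and real class prompts; nothing here is taken from the dataset or from any specific model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return (best_label, scores) using similarity to each label's text embedding."""
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy vectors; a real pipeline would obtain these from
# a trained image encoder and text encoder.
label_embeddings = {
    "tumor": [0.9, 0.1, 0.0],
    "normal": [0.1, 0.9, 0.0],
}
image_embedding = [0.8, 0.2, 0.1]

predicted, scores = zero_shot_classify(image_embedding, label_embeddings)
print(predicted, scores)
```

No labeled pathology examples are needed at inference time in this setup; the class names alone, encoded as text, act as the classifier.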

Strengths

Contains 1.6 million pathology image-text pairs.
Designed specifically for training Vision Language Models (VLMs) in pathology.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The listing does not report an exact row count (the dataset name indicates roughly 1.6 million pairs), which may limit suitability assessment.
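
Since the listing documents no columns, one way to start is to inspect the downloaded records directly. The sketch below assumes the metadata is available as a JSON array of objects (the listing tags the dataset as JSON but does not confirm the layout). The field names in the sample are invented for illustration and are not the dataset's actual schema.

```python
import json
from collections import Counter, defaultdict


def summarize_fields(records):
    """Map each field name to how often it appears and which value types it holds."""
    counts = Counter()
    types = defaultdict(set)
    for record in records:
        for key, value in record.items():
            counts[key] += 1
            types[key].add(type(value).__name__)
    return {
        key: {"count": counts[key], "types": sorted(types[key])}
        for key in counts
    }


# Hypothetical sample records; real field names must be read from the
# downloaded file itself.
sample_json = """
[
  {"id": 1, "caption": "example text", "source": "a"},
  {"id": 2, "caption": "another example"}
]
"""

records = json.loads(sample_json)
summary = summarize_fields(records)
for field, info in summary.items():
    print(field, info["count"], info["types"])
```

Fields that appear in only some records, or that hold more than one value type, are the ones whose semantics most need checking before training. Counting the records with `len(records)` also gives the row count that the listing leaves out.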

Provenance

Source: jamessyx
Collection Method: Generated through multi-agent collaboration.
Freshness: Last updated 2025-04-22 04:25:53


Quality Score

D39

Community

67 downloads

14 likes

0 views

Dataset Info

Author: jamessyx
Created: Jun 13, 2024
Updated: Apr 22, 2025
Last synced: Apr 20, 2026
