MedLayBench-V provides 79,789 medical image-text pairs across 7 imaging modalities. Each image is paired with both a clinical expert caption and a patient-friendly layman caption. The dataset, created by hanjang, was released in April 2026.
Use Cases
- Training models to generate patient-friendly captions from clinical expert captions using the paired text fields.
- Benchmarking model performance on expert-lay semantic alignment across the 7 imaging modalities.
- Evaluating the factual accuracy of medical image captioning systems using the expert-provided ground truth.
- Fine-tuning models for medical visual question answering by leveraging the detailed image-text pairs.
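As a rough illustration of the first use case, the sketch below turns one paired record into a source/target example for expert-to-layman rewriting. The field names (`image_id`, `modality`, `expert_caption`, `layman_caption`), the prompt wording, and the sample record are all assumptions made for illustration; the dataset's actual column names were not available and may differ.

```python
from dataclasses import dataclass


@dataclass
class CaptionPair:
    """One image with both caption styles. Field names are hypothetical."""
    image_id: str
    modality: str
    expert_caption: str
    layman_caption: str


def to_simplification_example(pair: CaptionPair) -> dict:
    """Build a source/target pair for expert-to-layman caption rewriting."""
    # The prompt prefix is an arbitrary choice, not part of the dataset.
    source = f"Rewrite for a patient ({pair.modality}): {pair.expert_caption.strip()}"
    return {"source": source, "target": pair.layman_caption.strip()}


# Made-up record for illustration only; not taken from the dataset.
sample = CaptionPair(
    image_id="example-0001",
    modality="X-ray",
    expert_caption="Chest radiograph demonstrating right lower lobe consolidation.",
    layman_caption="Chest X-ray showing a cloudy area in the lower part of the right lung.",
)
example = to_simplification_example(sample)
```

Keeping the modality in the source text is one possible design choice: it lets a single model condition on imaging type without needing a separate input field.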
Strengths
- 79,789 image-text pairs provide substantial scale for training and evaluation.
- Covers 7 distinct medical imaging modalities for broad applicability.
- Each pair includes two distinct caption types (expert and layman) enabling alignment studies.
Limitations
- Beyond the total of 79,789 pairs, detailed row counts, column details, and sample data are unavailable for inspection.
- The geographic and temporal coverage of the underlying medical images is not specified.
- Potential class imbalance across the 7 imaging modalities is not detailed.
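Because the per-modality distribution is not documented, users may want to measure it themselves before training. The following is a minimal sketch of such a check, assuming each record exposes a modality label as a string; the label values shown are made up and do not reflect the dataset's real distribution.

```python
from collections import Counter


def modality_balance(modalities):
    """Return per-modality counts, shares, and the largest-to-smallest ratio."""
    counts = Counter(modalities)
    if not counts:
        raise ValueError("no modality labels provided")
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    # A ratio of 1.0 means perfectly balanced; larger values mean more skew.
    ratio = max(counts.values()) / min(counts.values())
    return counts, shares, ratio


# Made-up labels for illustration only.
labels = ["CT", "CT", "CT", "MRI", "X-ray", "X-ray"]
counts, shares, ratio = modality_balance(labels)
```

A high ratio would suggest reporting metrics per modality rather than as a single pooled score.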
Provenance
- Source: Built on the ROCOv2 dataset.
- Collection Method: Provides paired expert and layman captions for medical images.
- Time Range: Not specified.
- Freshness: Last updated April 2026.
- Geography: Not specified.