PMC-VQA contains 227,000 visual question-answering pairs associated with 149,000 medical images sourced from PubMed Central. Released by RadGenome and updated in July 2024, the collection includes a specialized version focused on noncompound images to facilitate cleaner model training. The dataset is organized into training and testing splits with a dedicated clean test set for benchmarking.
Version 1 and version 2 are separate releases and should not be mixed; version 2 (train2.csv and images2.zip) is recommended for tasks requiring noncompound images. The dataset is distributed via Hugging Face, and its images ship as large zip archives that must be extracted before use.
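As a rough illustration of the unzip-then-load step for version 2, the sketch below uses only the Python standard library. It assumes train2.csv and images2.zip have already been downloaded to the working directory; the download itself is not shown because the listing does not give the repository ID. The helper names `extract_images` and `load_rows` and the output folder `images2` are hypothetical, and no column names are assumed since the CSV schema is not described here.

```python
import csv
import zipfile
from pathlib import Path


def extract_images(zip_path, dest_dir):
    """Extract an image archive into dest_dir and return the file count."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Count regular files only; directory entries are skipped.
        members = [m for m in archive.infolist() if not m.is_dir()]
        archive.extractall(dest)
    return len(members)


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Assumed local paths; adjust to wherever the files were downloaded.
    n_images = extract_images("images2.zip", "images2")
    rows = load_rows("train2.csv")
    print(f"extracted {n_images} files, loaded {len(rows)} rows")
    if rows:
        # Inspect the header rather than assuming column names.
        print("columns:", list(rows[0].keys()))
```

Because the archives are large, it may be worth extracting once to a persistent location and reusing that folder across runs rather than unzipping each time.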