GPT-4o and Gemini 2.5 Pro were evaluated on extracting PI-RADS v2.1 scores from free-text prostate MRI reports, and their performance was compared with that of three radiologists of varying experience. Inter-rater agreement was highest among the human experts (Gwet's AC1 = 0.68) and lower between the two LLMs (AC1 = 0.52). The dataset likely contains the processed reports and the scores assigned by both the LLMs and the human readers.
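For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Gwet's AC1 in its two-rater form. It is an illustration only: the function name `gwet_ac1` and the toy ratings are assumptions, and the description does not say how the reported values were computed (agreement among three readers would need the multi-rater generalisation).

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Two-rater Gwet's AC1 (illustrative sketch, not the study's code).

    ratings_a, ratings_b: equal-length sequences of category labels.
    categories: all possible labels, e.g. [1, 2, 3, 4, 5] for PI-RADS.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    n = len(ratings_a)
    k = len(categories)

    # Observed agreement: fraction of items given the same label by both raters.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: based on the average prevalence of each category
    # across the two raters, scaled by 1 / (k - 1).
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_chance = 0.0
    for c in categories:
        prevalence = (counts_a[c] + counts_b[c]) / (2 * n)
        p_chance += prevalence * (1 - prevalence)
    p_chance /= k - 1

    return (p_obs - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Toy ratings, invented for illustration; not taken from the dataset.
    reader = [4, 4, 3, 5, 2, 4]
    model = [4, 3, 3, 5, 2, 4]
    print(round(gwet_ac1(reader, model, [1, 2, 3, 4, 5]), 3))
```

AC1 equals 1 for perfect agreement and, unlike Cohen's kappa, stays stable when one category dominates, which is a common reason for choosing it with skewed score distributions.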
The primary file format is DOCX, a zipped XML package, so reading the files requires a DOCX-aware parser rather than a plain-text reader.
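As a rough sketch of what parsing could look like, the snippet below reads paragraph text from a DOCX file using only the Python standard library. It relies on the fact that a DOCX file is a ZIP archive whose main body is stored in `word/document.xml`. The function name `docx_paragraphs` and the file name in the usage example are hypothetical, and the actual layout of the dataset's files is not known from this description.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the non-empty paragraphs of a DOCX file as plain strings.

    source: a file path or a binary file-like object.
    Only text runs (w:t) are collected; tabs, line breaks, headers,
    footers, and footnotes are ignored in this sketch.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # iter() walks the whole tree, so paragraphs inside tables are included.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

A call such as `docx_paragraphs("reports.docx")` (hypothetical file name) would return a list of strings, one per paragraph. A dedicated library such as python-docx is a reasonable alternative when styles or table structure need to be preserved.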