Description

GPT-4o and Gemini 2.5 Pro were evaluated on extracting PI-RADS v2.1 scores from free-text prostate MRI reports, and their performance was compared with that of three radiologists of varying experience. Inter-rater agreement was highest among the human readers (Gwet's AC1 = 0.68) and lower between the two LLMs (AC1 = 0.52). The dataset likely contains the processed reports and the scores assigned by both the LLMs and the human readers.

Use Cases

Benchmark LLM performance on medical information extraction, using PI-RADS score assignment as the task
Analyze inter-rater agreement in clinical text interpretation using Gwet's AC1 coefficients
Compare diagnostic performance (sensitivity, specificity, AUC) between AI and human readers
Study report standardization and trainee education, for which the source mentions potential as a supplementary tool
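
Gwet's AC1 for a pair of readers can be computed directly from two lists of PI-RADS categories. The sketch below is a minimal two-rater version, assuming the scores are integers from 1 to 5 and that the two lists are aligned by report. The function name `gwet_ac1` and the sample lists are illustrative and not taken from the dataset, and the sketch does not reproduce the study's confidence intervals.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=(1, 2, 3, 4, 5)):
    """Gwet's AC1 for two raters over a fixed set of nominal categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")
    if any(r not in categories for r in list(ratings_a) + list(ratings_b)):
        raise ValueError("rating outside the declared categories")

    n = len(ratings_a)
    q = len(categories)

    # Observed agreement: share of reports where both raters give the same score.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # pi_k: average of the two raters' marginal proportions for category k.
    counts = Counter(ratings_a) + Counter(ratings_b)
    pi = [counts[k] / (2 * n) for k in categories]

    # Chance agreement as defined for AC1.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Illustrative scores only (not from the dataset).
reader = [4, 4, 4, 4]
model = [4, 4, 4, 3]
print(round(gwet_ac1(reader, model), 3))
```

Unlike Cohen's kappa, the chance-agreement term here stays small when most reports fall into one category, which is one reason AC1 is often preferred for skewed score distributions.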

Strengths

Specific performance metrics are provided, including AUC values for three human readers (0.81, 0.86, 0.89) and two LLMs (0.85, 0.84)
Inter-rater agreement analysis uses Gwet's AC1 coefficient with 95% confidence intervals
The study compares three human readers with distinct experience levels (resident, fellow, expert)
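
The AUC, sensitivity, and specificity comparisons could be recomputed along the following lines. This is a sketch under stated assumptions: it presumes each report has a binary reference label (1 = positive, 0 = negative), which the listing does not confirm, and it uses a hypothetical cut-off of PI-RADS 3 or higher for a positive call. The function names and sample values are illustrative.

```python
def sensitivity_specificity(scores, labels, threshold=3):
    """Sensitivity and specificity when score >= threshold counts as positive.

    The default threshold is an assumption, not a value taken from the study.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half, which matters for ordinal scores such as PI-RADS.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only (not from the dataset).
scores = [5, 4, 3, 2, 1, 3]
labels = [1, 1, 0, 0, 0, 1]
print(auc(scores, labels), sensitivity_specificity(scores, labels))
```

Running the same functions once per reader and per LLM would give directly comparable figures, provided all are scored against the same reference labels.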

Limitations

Row count is unknown, which makes it hard to assess suitability before download
Column-level documentation is absent; field semantics must be inferred after download
The dataset is only 19.1 KB, which suggests a very limited scope

Provenance

Source: figshare
Collection Method: Three radiologists independently reviewed reports and assigned scores; reports were processed through prompts with GPT-4o and Gemini 2.5 Pro.
Freshness: Last updated 2026-04-10 09:36:02; confirm that this is still current before use

The primary file format is DOCX, which may require specific tools for parsing.
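
A DOCX file is a ZIP archive whose main text lives in `word/document.xml`, so the paragraphs can be read with the Python standard library alone. The sketch below is a minimal example, assuming the content of interest is in ordinary body or table paragraphs; the function name `docx_paragraphs` is illustrative, and the internal layout of this particular file is not documented.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the non-empty paragraphs of a DOCX file as plain strings.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # w:p elements are paragraphs (including those inside table cells);
    # w:t elements hold the text runs within each paragraph.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Calling `docx_paragraphs("reports.docx")` with the downloaded file (the filename here is hypothetical) would return a list of strings to inspect before deciding how to split reports from scores. If the file stores scores in tables, each cell's paragraphs appear as separate list items, so row structure would need to be rebuilt separately.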

Tags: Text, Prostate Cancer, Medical Imaging, LLM Evaluation, Healthcare, Clinical Text, Radiology

PI-RADS Score Extraction from Prostate MRI Reports: LLM vs. Human Reader Comparison
