Benchmarking LLM Responses to IBD Patient Questions, 20 Questions and 5 Models
by Xiaoyue Wang · Updated 2mo ago
50.1 KB · 1 file
Available on 1 platform
Description
A cross-sectional benchmark study from January 17–24, 2026, evaluated five publicly available large language models on 20 patient-facing inflammatory bowel disease questions, producing 100 model–question responses. The dataset contains scores for informational quality, transparency proxies, and readability, assessed using DISCERN, EQIP, JAMA criteria, and six readability indices. The work was authored by Xiaoyue Wang and shared under a CC-BY-4.0 license.
Use Cases
Benchmarking the informational quality of LLM outputs for medical questions based on DISCERN and EQIP scores.
Evaluating transparency and disclosure in AI-generated health content based on JAMA benchmark criteria.
Assessing the readability of patient-facing AI responses based on six automated readability indices.
Comparing performance across different LLM models on a standardized set of clinical questions.
Strengths
The dataset is built on a structured benchmark of 20 guideline-derived questions spanning the IBD care pathway.
High interrater agreement reported for scoring, with ICC values ranging from 0.760 to 0.842 and weighted kappa up to 0.936.
All 10 measured outcomes showed statistically significant variation across models (Holm-adjusted P < 0.001).
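For readers unfamiliar with the Holm adjustment mentioned above, the following is a minimal sketch of the standard Holm step-down procedure. The function name `holm_adjust` and the p-values in the usage note are illustrative assumptions; they are not taken from the dataset or from the study's analysis code.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original input order.

    Illustrative sketch only; not the study's own analysis code.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted
```

Calling `holm_adjust([0.01, 0.04, 0.03])` on these made-up values scales the smallest p-value by 3, the next by 2, and the largest by 1, then carries the running maximum forward so the adjusted values stay ordered.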
Limitations
Row count and column-level documentation are absent; field semantics must be inferred after download.
The dataset is very small (50.1 KB), indicating limited scope and likely summary-level data.
The file format is DOCX, which may require conversion for programmatic analysis.
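Because the data ships as a DOCX file, one possible conversion route is to read the tables straight out of the document's XML using only the Python standard library. The sketch below assumes the scores are stored in ordinary Word tables, which the listing does not confirm; the function name `docx_tables` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(source):
    """Return every table in a DOCX file as a list of rows of cell strings.

    `source` may be a file path or a binary file-like object.
    Assumption: tables are not nested; nested tables would be
    both reported separately and merged into their parent's rows.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.iter(W + "tr"):
            # Concatenate all text runs inside each cell.
            row = [
                "".join(t.text or "" for t in tc.iter(W + "t"))
                for tc in tr.iter(W + "tc")
            ]
            rows.append(row)
        tables.append(rows)
    return tables
```

A call such as `docx_tables("path/to/downloaded.docx")` (placeholder path) returns nested lists that can be written out with the `csv` module or loaded into a data frame. Column meanings still have to be checked by hand, since the listing provides no field documentation.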
Provenance
Source
figshare
Collection Method
Queries were conducted via official LLM web interfaces under default settings, with responses evaluated by two blinded raters.
Time Range
Queries conducted from January 17–24, 2026.
Freshness
Last updated 2026-04-10 05:57:46; freshness should be verified.
Geography
Not specified.