LLM-Generated Autoimmune Hepatitis Patient Education: Readability and Quality Scores
by Hanlu Li · Updated 3 months ago
14.0 KB · 1 file
Description
This dataset contains evaluation scores for patient education text generated by five large language models (ChatGPT, Doubao, DeepSeek, Wenxin Yiyan, Tongyi Qianwen) in response to 20 questions on autoimmune hepatitis. The text was assessed using multiple readability indices, the Global Quality Score (GQS), the Chinese Patient Education Materials Assessment Tool (C-PEMAT), and Clinical Intent Alignment (CIA). The study was authored by Hanlu Li and published in March 2026.
Use Cases
Compare Global Quality Score (GQS) and C-PEMAT scores across the five evaluated LLMs to identify top performers for patient education.
Analyze correlations between readability indices like the Automated Readability Index and GQS to understand the relationship between text complexity and perceived quality.
Assess Clinical Intent Alignment (CIA) coverage across different content themes, such as disease mechanisms and diagnostic processes.
Benchmark the performance of Chinese-language models (e.g., Wenxin Yiyan, Tongyi Qianwen) against international models using the provided evaluation scores.
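The correlation use case above can be sketched with a rank-based (Spearman) coefficient, which suits ordinal scores such as GQS. This is a minimal standard-library sketch, not the study's own analysis: the source does not say which correlation method was used, and the function names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

To use it, pass two equal-length lists taken from the extracted score tables, for example one readability index and the GQS for the same set of responses. A constant column makes the denominator zero, so filter such columns out first.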
Strengths
Multidimensional evaluation framework includes seven readability indices, GQS, C-PEMAT, and CIA scores.
Comparative analysis covers five distinct large language models (ChatGPT, Doubao, DeepSeek, Wenxin Yiyan, Tongyi Qianwen).
Content is based on 20 frequently asked patient questions across five thematic categories.
Evaluation is grounded in the 2025 EASL Clinical Practice Guidelines for clinical intent alignment.
Limitations
The dataset is small, contained in a single 14.0 KB DOCX file, which limits extensive statistical analysis.
Focus is exclusively on autoimmune hepatitis, limiting generalizability to other medical conditions.
Sample data and raw text responses from the LLMs are unavailable; only the derived evaluation scores are presented.
Provenance
Source
figshare, authored by Hanlu Li.
Collection Method
Scores derived from evaluating LLM-generated text responses to 20 predefined questions, assessed by raters with inter-rater reliability measured by Cohen's kappa.
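For reference, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes the unweighted form. That is an assumption, since the source does not say whether an unweighted or weighted kappa was used, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # p_o: share of items on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's marginal category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. For ordinal scales such as the GQS, a weighted kappa that penalises near-misses less than distant disagreements may be more appropriate.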
Freshness
Last updated March 2026.
Data is contained in a DOCX document; analysis requires extracting tables or text from this format. The license is CC BY 4.0.
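Since a DOCX file is a ZIP archive with the document body stored as WordprocessingML in word/document.xml, its tables can be pulled out with the Python standard library alone. This is a minimal sketch under stated assumptions: the scores are taken to be stored as ordinary Word tables, which the source does not confirm, and the function name is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements such as w:tbl, w:tr, w:tc and w:t.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(path):
    """Return each table in a DOCX file as a list of rows of cell strings."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(f"{W}tbl"):
        rows = []
        for tr in tbl.iter(f"{W}tr"):
            # Join all text runs inside a cell into one string.
            rows.append([
                "".join(t.text or "" for t in tc.iter(f"{W}t")).strip()
                for tc in tr.iter(f"{W}tc")
            ])
        tables.append(rows)
    return tables
```

Calling read_docx_tables("scores.docx"), where the file name is a placeholder, returns one list per table, with each row as a list of cell strings that still need converting to numbers. Nested tables are not handled separately here, so their rows would also appear inside the outer table.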