Name: LLM-Generated Autoimmune Hepatitis Patient Education: Readability and Quality Scores
Creator: Hanlu Li
Published: 2026-03-20T06:37:10
License: CC-BY-4.0
Keywords: Readability, Health Information Quality, Autoimmune Hepatitis, Large Language Models, Patient Education

Description

This dataset contains evaluation scores for patient education text generated by five large language models (ChatGPT, Doubao, DeepSeek, Wenxin Yiyan, Tongyi Qianwen) in response to 20 questions on autoimmune hepatitis. Each response was assessed using multiple readability indices, the Global Quality Score (GQS), the Chinese Patient Education Materials Assessment Tool (C-PEMAT), and Clinical Intent Alignment (CIA). The study was authored by Hanlu Li and published in March 2026.

Use Cases

Compare Global Quality Score (GQS) and C-PEMAT scores across the five evaluated LLMs to identify top performers for patient education.
Analyze correlations between readability indices like the Automated Readability Index and GQS to understand the relationship between text complexity and perceived quality.
Assess Clinical Intent Alignment (CIA) coverage across different content themes, such as disease mechanisms and diagnostic processes.
Benchmark the performance of Chinese-language models (e.g., Wenxin Yiyan, Tongyi Qianwen) against international models using the provided evaluation scores.
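
The correlation use case above can be sketched with a rank correlation, which suits ordinal scores such as GQS. This is a minimal illustration only: the choice of Spearman's coefficient is an assumption (the source does not say which statistic the study used), and the function names are hypothetical.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs."""
    return pearson(rank(x), rank(y))
```

To use it, pass two equal-length lists taken from the extracted score tables, for example one readability index and the GQS for the same set of responses. With only 20 questions per model, any coefficient should be read cautiously.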

Strengths

Multidimensional evaluation framework includes seven readability indices, GQS, C-PEMAT, and CIA scores.
Comparative analysis covers five distinct large language models (ChatGPT, Doubao, DeepSeek, Wenxin Yiyan, Tongyi Qianwen).
Content is based on 20 frequently asked patient questions across five thematic categories.
Clinical intent alignment is assessed against the 2025 EASL Clinical Practice Guidelines.

Limitations

The dataset is small: a single 14.0 KB DOCX file, which limits the statistical analysis it can support.
Focus is exclusively on autoimmune hepatitis, limiting generalizability to other medical conditions.
Sample data and the raw text responses from the LLMs are unavailable; only the derived evaluation scores are presented.

Provenance

Source: figshare, authored by Hanlu Li.
Collection Method: Scores derived from evaluating LLM-generated text responses to 20 predefined questions, assessed by raters with inter-rater reliability measured by Cohen's kappa.
Freshness: Last updated March 2026.
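
For readers unfamiliar with the inter-rater statistic named in the collection method, Cohen's kappa compares observed agreement between two raters with the agreement expected by chance. The sketch below shows the unweighted form; whether the study used the unweighted or a weighted variant is not stated in the source, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. For ordinal scales such as GQS, a weighted kappa that penalizes near-misses less than large disagreements is a common alternative.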

Data is contained in a DOCX document; analysis requires extracting tables or text from this format. The license is CC BY 4.0.
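
One way to get the tables out of the DOCX file is sketched below using only the Python standard library. It relies on the fact that a DOCX file is a ZIP archive whose main body is stored in word/document.xml; the function name is hypothetical, and the layout of the tables in this particular file is not known from the listing, so the result will likely need further cleaning.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_tables(docx_file):
    """Return every table as a list of rows, each row a list of cell strings.

    docx_file may be a filesystem path or a binary file-like object.
    Nested tables, if any, appear both inside their parent cell's text
    and as separate entries in the result.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # Join all text runs in the cell into one string.
                text = "".join(t.text or "" for t in tc.iter(W + "t"))
                cells.append(text.strip())
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling extract_tables with the path to the downloaded file returns nested lists that can be written to CSV or loaded into a data frame. Cell values come back as strings, so numeric scores need converting before analysis.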
