LLM Responses on Under-5 Mortality Queries Evaluated by Pediatricians
by Yi Yang·Updated 1mo ago
24.2 KB · 1 file
Available on 1 platform
Description
Yi Yang's research dataset on figshare contains evaluation results for the responses of four large language models to 25 public queries about the top five causes of under-5 mortality. The dataset includes scores for reliability, accuracy, completeness, comprehensibility, readability, and actionability, assessed with instruments including DISCERN, Likert scales, and PEMAT-P. It was last updated on May 5, 2026.
Use Cases
Benchmark LLM performance on child health information based on DISCERN reliability scores.
Compare readability of AI-generated health advice based on Flesch-Kincaid Grade Level scores.
Analyze the actionability of LLM responses for public guidance based on PEMAT-P scores.
Identify strengths and weaknesses of specific LLMs (ChatGPT-4.0, Claude 3.5 Sonnet, Bing AI, Gemini) across multiple evaluation metrics.
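For the readability use case, the following is a minimal Python sketch of the standard Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names and the vowel-group syllable counter are assumptions made for illustration; the dataset does not state which tool produced its readability scores, and dedicated readability tools may count syllables differently.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of consecutive vowels, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Words like "make" end in a silent 'e'; words like "table" do not.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A lower grade indicates text that is easier to read, so scores computed this way can be compared across LLM responses to the same query, keeping in mind that the syllable heuristic is approximate.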
Strengths
Evaluations are based on 25 representative public queries derived from Google Trends.
Responses were independently scored by four pediatricians using established instruments.
Performance differences among the four LLMs were tested for statistical significance at the p < 0.05 level.
Limitations
The dataset is a single 24.2 KB DOCX file, which suggests limited scope; it likely contains summary results rather than raw data.
Column-level documentation is absent; field semantics must be inferred after download.
The data reflects a specific evaluation study; its applicability to other health topics or LLMs is unknown.
Provenance
Source
Yi Yang
Collection Method
LLM responses were collected and evaluated by pediatricians using standardized tools.
Freshness
Last updated 2026-05-05 05:23:58; freshness should be verified.
License is CC-BY-4.0. The file format is DOCX, which may require specific software for viewing.
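Because the file is a DOCX and its column semantics are undocumented, one way to inspect it without a word processor is to read its tables directly. The sketch below uses only the Python standard library and relies on a DOCX being a ZIP archive whose main part is conventionally stored at word/document.xml. The function name read_docx_tables is hypothetical, and whether this particular file stores its results in tables at all is an assumption to check after download.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by DOCX body markup.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_docx_tables(source):
    """Return every table in a DOCX as a list of rows,
    each row being a list of cell strings.

    `source` is a file path or a binary file-like object.
    Assumes the main document part is at word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    # iter() walks the whole tree, so nested tables are also returned.
    for tbl in root.iter(f"{{{W_NS}}}tbl"):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            cells = []
            for tc in tr.findall("w:tc", NS):
                # Join all text runs inside the cell.
                text = "".join(t.text or "" for t in tc.iter(f"{{{W_NS}}}t"))
                cells.append(text.strip())
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling read_docx_tables on the downloaded file (for example a local copy saved under a name of your choosing) and printing the first row of each table is a quick way to see which header labels are present before inferring field meanings.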