ADHD Information Questions: LLM Response Accuracy and Readability Scores
by Xingmin Han
5.5 KB · 1 file
Available on 1 platform
Description
Scores for 30 responses from each of three large language models (ChatGPT 5, DeepSeek V3, Grok 4) to ADHD-related questions. The dataset covers content accuracy, readability (FKRE, FKGL, SMOG), lexical complexity, and response stability. Author Xingmin Han published the data on figshare in June 2026.
Use Cases
Compare content accuracy across the three LLMs, using the study's evaluation of answers about ADHD definitions and symptoms.
Analyze readability scores (FKRE, FKGL, SMOG) to assess the suitability of generated text for general audiences.
Evaluate response stability across repeated runs of each model, one of the study's primary endpoints.
Benchmark lexical complexity of educational content generated by different AI models.
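The page does not say how response stability was quantified. The sketch below assumes that stability means low spread in one numeric score across a model's repeated runs. Under that assumption, one simple measure is the coefficient of variation (CV): the sample standard deviation divided by the mean. The helper name `stability_summary` and its input layout are hypothetical and are not the study's method.

```python
from statistics import mean, stdev


def stability_summary(scores_by_model: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Mean, sample SD, and coefficient of variation per model.

    Hypothetical helper: assumes each list holds one numeric score
    (e.g. an accuracy or readability score) per repeated run.
    """
    summary = {}
    for model, scores in scores_by_model.items():
        if len(scores) < 2:
            # stdev() needs at least two data points
            raise ValueError(f"{model}: need at least two runs")
        m = mean(scores)
        sd = stdev(scores)  # sample (n - 1) standard deviation
        # CV is undefined when the mean is zero
        cv = sd / m if m != 0 else float("nan")
        summary[model] = {"mean": m, "sd": sd, "cv": cv}
    return summary


# Illustrative made-up scores, not values from the dataset
example = {"ModelA": [4.0, 4.0, 4.0], "ModelB": [2.0, 4.0, 6.0]}
print(stability_summary(example))
```

A lower CV means more consistent scores across runs. With 30 runs per model, each list would hold 30 values.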
Strengths
Includes 30 responses per model, allowing for stability analysis.
Readability assessed using three established metrics: Flesch-Kincaid Reading Ease, Flesch-Kincaid Grade Level, and SMOG.
Models were prompted with identical inputs related to ADHD definitions, symptoms, and medication-exercise interactions.
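The three readability metrics follow standard published formulas. The sketch below takes word, sentence, syllable, and polysyllable (3+ syllable) counts as inputs and does not count syllables itself. The page does not state which tool the study used, and tools count syllables differently, so values computed this way may not match the dataset's scores. The function names are illustrative.

```python
import math


def _check_positive(**counts: int) -> None:
    # Guard against division by zero in the ratios below
    for name, value in counts.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive")


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text (roughly 0-100)."""
    _check_positive(words=words, sentences=sentences)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    _check_positive(words=words, sentences=sentences)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade from words of 3+ syllables, normalized to a 30-sentence sample."""
    _check_positive(sentences=sentences)
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for one response: 100 words, 5 sentences, 150 syllables
print(round(flesch_reading_ease(100, 5, 150), 3))   # 59.635
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
print(round(smog_grade(9, 30), 4))                   # 6.2581
```

FKRE runs opposite to the other two metrics. A higher FKRE means easier text, while higher FKGL and SMOG values mean a higher reading grade level.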
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count is not stated, which makes it hard to judge before download whether the data suit a given analysis.
The dataset is very small at 5.5 KB, indicating limited scope.
Provenance
Source
figshare
Collection Method
Systematic comparison of three LLMs using identical prompts; 30 responses per model were collected.
Freshness
Last updated 2026-06-01 17:32:21; freshness should be verified.
The data are in XLS format and require software that can read Excel files.
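The row count and column meanings are undocumented, so a reasonable first step after download is to inspect the workbook. The sketch below assumes pandas is installed along with the xlrd package, which pandas uses to read legacy .xls files. It makes no assumptions about sheet or column names. The helper names are hypothetical.

```python
import pandas as pd


def summarize_sheets(sheets: dict[str, pd.DataFrame]) -> dict[str, dict]:
    """Report row count, column names, and missing cells for each sheet."""
    summary = {}
    for name, df in sheets.items():
        summary[name] = {
            "rows": len(df),
            "columns": [str(c) for c in df.columns],
            "missing_cells": int(df.isna().sum().sum()),
        }
    return summary


def describe_workbook(path: str) -> dict[str, dict]:
    """Load every sheet of a legacy .xls workbook and summarize it."""
    # sheet_name=None returns a dict mapping sheet name -> DataFrame;
    # engine="xlrd" requires the xlrd package to be installed.
    sheets = pd.read_excel(path, sheet_name=None, engine="xlrd")
    return summarize_sheets(sheets)
```

For example, `describe_workbook("adhd_llm_scores.xls")` (a hypothetical local filename) returns one entry per sheet, which answers the row-count question before any analysis begins.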