Available on 1 platform
Twenty standardized patient-centered questions spanning six clinical domains were used to evaluate three large language models (DeepSeek, ChatGPT, and Gemini). Three clinician researchers graded each response for accuracy and comprehensiveness, and the reported results include mean word counts and performance ratings. The dataset, authored by Tao Huang and last updated in March 2026, contains the foundational assessment findings.
Primary data is contained in a DOCX file, which may require conversion for programmatic analysis.
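As one possible way to do that conversion, the sketch below reads a DOCX file using only the Python standard library. It relies on the fact that a DOCX file is a ZIP archive whose main content is stored in word/document.xml. The function name read_docx is hypothetical, and the layout of this particular file (whether the ratings sit in tables or in running text) is not known from the description, so the sketch returns both top-level paragraphs and tables.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a DOCX file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def _text(element):
    """Concatenate the text of every w:t run found under an element."""
    return "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}t"))


def read_docx(path):
    """Return (paragraphs, tables) from a DOCX file or file-like object.

    paragraphs: text of each top-level paragraph in the document body.
    tables: one list per table, holding rows, each a list of cell strings.
    Hypothetical helper; nested tables and merged cells are not handled.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find("w:body", NS)
    # Only direct children of the body, so table-cell text is not duplicated.
    paragraphs = [_text(p) for p in body.findall("w:p", NS)]
    tables = [
        [[_text(cell) for cell in row.findall("w:tc", NS)]
         for row in table.findall("w:tr", NS)]
        for table in body.findall("w:tbl", NS)
    ]
    return paragraphs, tables
```

Calling read_docx on the downloaded file (for example read_docx("dataset.docx"), where the filename is a placeholder) would give plain Python lists that can be written to CSV or loaded into an analysis tool. Any header rows, units, or rating scales would still need to be interpreted by hand against the document itself.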