Meisam Dastani published a dataset on figshare containing Kruskal–Wallis test results from an evaluation of four large language models (ChatGPT, Gemini, Copilot, and Grok) on medical questions. The evaluation used the DISCERN-AI and NLAT-AI assessment tools. The dataset was last updated on 2026-05-11.
The data are provided in XLS format and can be opened with Microsoft Excel or a compatible spreadsheet tool.
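The Kruskal–Wallis test is a rank-based test of whether several independent samples come from the same distribution. The following minimal sketch computes its H statistic under these assumptions: each model's scores form one independent group, tied scores receive average ranks, and the standard tie correction is applied. The helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, and the scores are made-up placeholders, not values from this dataset. The sketch returns only H. The p-value is usually read from a chi-squared distribution with k − 1 degrees of freedom, where k is the number of groups.

```python
from collections import Counter


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of (rank sum)^2 / group size over all groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    # If every value is identical, H is undefined.
    return h / correction if correction > 0 else float("nan")


# Hypothetical scores for illustration only; NOT taken from the dataset.
scores = {
    "ChatGPT": [4, 5, 4, 5],
    "Gemini": [3, 4, 3, 4],
    "Copilot": [3, 3, 2, 4],
    "Grok": [4, 4, 5, 5],
}
print(f"H = {kruskal_wallis_h(list(scores.values())):.3f}")
```

In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic and also returns the p-value.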