A 2026 study by Yinuo Liu compares three large language models (ChatGPT-5, Gemini-3-Pro, Claude-Sonnet-4.5) with human reviewers in evaluating 160 conference abstracts. The research assesses inter-rater reliability and systematic bias using statistical methods such as intraclass correlation coefficients (ICCs) and Bland-Altman plots. The dataset, shared under a CC-BY-4.0 license, consists of a single 211.4 KB document reporting the results of this analysis.
Primary data is embedded within a DOCX analysis document; raw tabular data is not provided in a separate, machine-readable format.
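Because the scores are embedded in the DOCX and not in a separate table, anyone re-checking the reported agreement would first need to transcribe them. Once that is done, the two statistics named above can be computed as in the minimal sketch below. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single rater) and the conventional Bland-Altman limits of agreement (mean difference ± 1.96 SD). The description does not say which ICC form the study used. The function names `icc_2_1` and `bland_altman` are hypothetical, and the example scores are invented toy values, not data from the study.

```python
"""Sketch of the two agreement statistics named in the study description.

Assumptions (not taken from the dataset): ICC form is ICC(2,1), i.e.
two-way random effects, absolute agreement, single rater; Bland-Altman
limits use mean difference +/- 1.96 * sample SD of the differences.
"""
import statistics
from typing import Sequence, Tuple


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n-subjects x k-raters matrix (needs n >= 2, k >= 2)."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and in total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: a systematic rater offset (ms_cols) lowers the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired scores a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)  # systematic difference between the raters
    sd = statistics.stdev(diffs)   # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical toy scores for four abstracts (NOT from the study).
    human = [1, 2, 3, 4]
    model = [2, 3, 4, 5]  # the model scores every abstract one point higher
    print(icc_2_1(list(zip(human, model))))  # 10/13, about 0.769
    print(bland_altman(model, human))        # bias 1, both limits 1
```

The toy case shows why the two measures complement each other: the model ranks the abstracts exactly like the human, yet the constant one-point offset pulls the absolute-agreement ICC below 1, and the Bland-Altman bias reports that offset directly.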