Available on 1 platform
A 9.5 KB Excel file contains statistical agreement metrics for evaluating large language model outputs against a human-adjudicated panel. Author Callum Hill published the dataset in April 2026. It reports mean values with 95% bootstrap confidence intervals for six inter-rater reliability and classification metrics.
The data is in XLS (Excel) format. The platform tags 'Context Complicate Coding' and 'Define Llm Hallucination' suggest that the metrics come from a specific LLM evaluation task in qualitative analysis.
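For readers unfamiliar with how a mean and a 95% bootstrap confidence interval for an agreement metric are typically obtained, the following is a minimal sketch of a percentile bootstrap. It is not the author's method: the description does not name the six metrics or the resampling scheme, so Cohen's kappa is used purely as an example, and the function names, the toy labels, and the number of resamples are all assumptions made for illustration.

```python
import random


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1.0:
        # Degenerate resample where both raters use one identical label.
        # Returning 1.0 is a convention chosen here, not a universal rule.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def bootstrap_ci(a, b, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample items with replacement, recompute metric."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    mean = sum(stats) / n_boot
    return mean, lo, hi


# Toy labels invented for this sketch (1 = hallucination flagged, 0 = not).
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
model = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

mean, lo, hi = bootstrap_ci(human, model, cohen_kappa)
print(f"kappa: mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The same `bootstrap_ci` helper accepts any function of two label sequences, so other agreement or classification metrics could be passed in place of `cohen_kappa`. Resampling whole items, rather than each rater's labels separately, keeps the human and model labels paired.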