This dataset contains 1,507 human annotations from a study titled 'When does autoresearch need a human?'. ProlificAI collected the evaluations from 300 participants, who assessed models generated by Karpathy's autoresearch on a Direct Preference Optimization (DPO) task. The dataset includes per-pair statistics, Bradley-Terry rankings, and LLM-clustered comment themes.
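For readers unfamiliar with Bradley-Terry rankings, the following is a minimal sketch of how such a ranking can be fitted from pairwise preference counts. It uses the standard minorization-maximization update and is not necessarily the procedure used in the study. The function name `bradley_terry`, the model labels, and the win counts are all hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def bradley_terry(wins, iters=200, tol=1e-9):
    """Fit Bradley-Terry strengths from pairwise counts.

    wins[(a, b)] is the number of times a was preferred over b.
    Returns a dict of strengths normalized to sum to the number of items.
    Assumes every item wins at least once and the comparisons are connected.
    """
    items = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in items}

    total_wins = defaultdict(float)   # wins per item
    games = defaultdict(float)        # comparisons per unordered pair
    for (a, b), n in wins.items():
        total_wins[a] += n
        games[frozenset((a, b))] += n

    for _ in range(iters):
        new = {}
        for i in items:
            # Sum of n_ij / (p_i + p_j) over every opponent j that i has met.
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = games.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            new[i] = total_wins[i] / denom if denom else strength[i]

        # Strengths are only defined up to scale, so renormalize each round.
        total = sum(new.values())
        new = {m: v * len(items) / total for m, v in new.items()}

        delta = max(abs(new[m] - strength[m]) for m in items)
        strength = new
        if delta < tol:
            break
    return strength


# Hypothetical pairwise counts, for illustration only (not from the dataset).
example_wins = {
    ("model_a", "model_b"): 8, ("model_b", "model_a"): 2,
    ("model_b", "model_c"): 7, ("model_c", "model_b"): 3,
    ("model_a", "model_c"): 9, ("model_c", "model_a"): 1,
}
ranking = sorted(bradley_terry(example_wins).items(), key=lambda kv: -kv[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Under this model, the fitted probability that one item is preferred over another is its strength divided by the sum of the two strengths, so only the ratios between strengths are meaningful.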
Full description and data files are only available on the Hugging Face dataset page.