DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Machine Learning

Autoresearch HITL Annotations: Human Evaluations of DPO Models

Name: Autoresearch HITL Annotations: Human Evaluations of DPO Models
Creator: ProlificAI
Published: 2026-05-21T13:30:50
Keywords: Annotations, Human In The Loop, Model Evaluation, Tabular, Llm Research

Available on 1 platform

Description

This dataset contains 1,507 human annotations from a study titled 'When does autoresearch need a human?'. ProlificAI collected the evaluations from 300 participants, who assessed models generated by Karpathy's autoresearch on a DPO (Direct Preference Optimization) task. The dataset also includes per-pair statistics, Bradley-Terry rankings, and LLM-clustered comment themes.

Use Cases

Compare model performance based on Bradley-Terry rankings derived from human annotations.
Analyze qualitative feedback themes from LLM-clustered participant comments.
Study how well automated research pipelines perform by examining human evaluations of the models they produce.
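
For readers unfamiliar with Bradley-Terry rankings, the following is a minimal sketch of how such strengths can be fitted from pairwise preference counts. It is not the method used by the dataset authors, whose procedure is not described here. The function name `bradley_terry`, the input layout (a dict of win counts), and the model labels in the usage note are hypothetical; the dataset's real column names are undocumented.

```python
def bradley_terry(wins, iterations=200, tol=1e-9):
    """Fit Bradley-Terry strengths with the classic iterative (MM) update.

    wins[(a, b)] is the number of times item a was preferred over item b.
    This input layout is an assumption, not the dataset's actual schema.
    Returns a dict of strengths that sum to 1.
    """
    items = sorted({item for pair in wins for item in pair})
    total_wins = {item: 0.0 for item in items}
    comparisons = {}  # unordered pair -> number of comparisons
    for (a, b), count in wins.items():
        total_wins[a] += count
        key = frozenset((a, b))
        comparisons[key] = comparisons.get(key, 0.0) + count

    # Start from equal strengths.
    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                comparisons.get(frozenset((i, j)), 0.0) / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            # Keep the old value for items that were never compared.
            updated[i] = total_wins[i] / denom if denom > 0 else strength[i]
        norm = sum(updated.values())
        updated = {item: s / norm for item, s in updated.items()}
        delta = max(abs(updated[item] - strength[item]) for item in items)
        strength = updated
        if delta < tol:
            break
    return strength


def rank(strength):
    """Return item names ordered from strongest to weakest."""
    return sorted(strength, key=strength.get, reverse=True)
```

As a usage illustration with made-up counts, `rank(bradley_terry({("A", "B"): 8, ("B", "A"): 2, ("B", "C"): 7, ("C", "B"): 3, ("A", "C"): 9, ("C", "A"): 1}))` orders the three hypothetical models from most to least preferred. An item with zero wins receives a strength of zero under this plain update, so a real analysis may want regularization.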

Strengths

Dataset contains 1,507 individual annotation rows.
Annotations were collected from 300 distinct human participants via Prolific.
Includes multiple analysis outputs like Bradley-Terry rankings and clustered comment themes.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Only the row count is documented; file size and file formats are not listed.
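
Because field semantics have to be inferred after download, a quick per-column profile can help. The sketch below assumes the rows have already been loaded as a list of dictionaries by whatever reader matches the actual file format, which is not stated here. The function name `profile_columns` and the field names in the usage note are hypothetical.

```python
def profile_columns(rows, sample_size=3):
    """Summarize each column found in a list of dict rows.

    For every column, report the Python types seen, the number of empty
    values (None or ""), the number of distinct non-empty values, and a
    few example values rendered as strings. Columns absent from a row are
    not counted as missing for that row.
    """
    raw = {}
    for row in rows:
        for name, value in row.items():
            info = raw.setdefault(
                name, {"types": set(), "missing": 0, "distinct": set()}
            )
            if value is None or value == "":
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            # Store the string form so unhashable values do not fail.
            info["distinct"].add(str(value))
    return {
        name: {
            "types": sorted(info["types"]),
            "missing": info["missing"],
            "n_distinct": len(info["distinct"]),
            "examples": sorted(info["distinct"])[:sample_size],
        }
        for name, info in raw.items()
    }
```

For example, calling `profile_columns([{"pair_id": 1, "comment": ""}, {"pair_id": 2, "comment": "clear"}])` on two made-up rows reports one empty `comment` value and two distinct `pair_id` values, which is often enough to guess whether a column is an identifier, a label, or free text.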

Provenance

Source: ProlificAI
Collection Method: Collected via the Prolific platform from 300 participants as part of a case study evaluating Karpathy's autoresearch.
Freshness: Last updated 2026-05-21 13:37:51; freshness should be verified.

Full description and data files are only available on the Hugging Face dataset page.

Quality Score

C43

Community

42 downloads

1 like

0 views

Dataset Info

Author: ProlificAI
Created: May 21, 2026
Updated: May 21, 2026
Last synced: Jun 2, 2026
