Autoresearch Novelty Bench: AI Agent Hypothesis Benchmark | DataSalon

Mathematics & Statistics

Name: Autoresearch Novelty Bench: AI Agent Hypothesis Benchmark
Creator: evo-hq
Published: 2026-05-17T16:52:53
Keywords: Hypothesis Generation, AI Research, Benchmark, Tabular, Autonomous Agents

Available on 1 platform

Description

A benchmark dataset for testing whether autonomous AI research agents propose novel, mechanism-distinct hypotheses. It contains 10,380 rows of experimental training runs built on Prime Intellect's autonomous-speedrunning archive. The dataset was created by Evo and last updated on May 17, 2026.

Use Cases

Benchmarking the novelty of hypotheses proposed by AI research agents across the recorded experimental training runs.
Evaluating whether generated hypotheses are mechanism-distinct from one another.
Analyzing autonomous agent performance on the optimization speedrun recorded in the source archive.

Strengths

10,380 rows of experimental training runs provide a substantial testbed.
Built on a specific, documented archive from Prime Intellect's autonomous-speedrunning project.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Beyond the 10,380-row count, no further size details are given, which may limit suitability assessment.
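
Because column-level documentation is absent, a first pass after download might simply profile each column. The following is a minimal sketch, assuming the data has been exported to CSV; the column names (col_a, col_b, col_c), the sample values, and the helper summarize_columns are hypothetical placeholders and do not reflect the dataset's actual schema.

```python
import csv
import io
from collections import Counter


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def summarize_columns(rows, max_examples=3):
    """Profile each column: non-empty count, distinct values, numeric guess, examples."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            col = seen.setdefault(name, {"non_empty": 0, "values": Counter()})
            if value not in (None, ""):
                col["non_empty"] += 1
                col["values"][value] += 1
    report = {}
    for name, col in seen.items():
        values = list(col["values"])  # first-seen order
        report[name] = {
            "non_empty": col["non_empty"],
            "distinct": len(values),
            "looks_numeric": bool(values) and all(_is_number(v) for v in values),
            "examples": values[:max_examples],
        }
    return report


# Hypothetical stand-in for the downloaded file; real column names are undocumented.
SAMPLE_CSV = "col_a,col_b,col_c\nx,first note,1.5\ny,second note,2\ny,,3.25\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = summarize_columns(rows)
for name, stats in report.items():
    print(name, stats)
```

For the real file, the in-memory sample would be replaced by an open file handle passed to csv.DictReader. Columns with few distinct values are likely categorical labels, while numeric columns are candidates for metrics, though either guess should be checked against the source archive.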

Provenance

Source: Prime Intellect's autonomous-speedrunning archive.
Collection Method: Built from runs in which two AI agents (Claude Code and Codex) competed on modded-nanogpt's optimization speedrun.
Freshness: Last updated 2026-05-17 17:09:32.

License is unknown; restrictions should be verified before use.

Quality Score

D39

Community

1.0K downloads

1 like

0 views

Dataset Info

Author: evo-hq
Created: May 17, 2026
Updated: May 17, 2026
Last synced: Jun 11, 2026
