Available on 1 platform
A benchmark dataset for testing whether autonomous AI research agents propose novel, mechanism-distinct hypotheses. It contains 10,380 rows of experimental training runs built on Prime Intellect's autonomous-speedrunning archive. The dataset was created by Evo and last updated on May 17, 2026.
The license is unknown; verify any usage restrictions before using the dataset.