ResearchClawBench is a benchmark that measures whether AI coding agents can independently conduct scientific research, from reading raw data to producing publication-quality reports. It evaluates agent outputs against real human-authored papers. The benchmark was created by InternScience and last updated on May 19, 2026.
The license is unknown; verify the terms of use before using the dataset.