Terminal-Bench 2.0 Verified is a corrected version of Terminal-Bench 2.0, a benchmark for evaluating AI coding agents. It fixes environment and instruction issues identified in the original tasks. The organization zai-org reviewed and modified the dataset and released the verified version in February 2026. The release includes updated Dockerfiles and task instructions, revised specifically so that the tasks run under the Claude Code agent.
The full description, including details of the fixes and usage instructions, is hosted externally on the Hugging Face dataset page. No license is specified in this listing.