Available on 1 platform
A verified version of the Terminal-Bench 2.0 dataset, last updated on 2026-04-28. It fixes known environment and instruction issues, with updated Dockerfiles and runtime support for the Claude Code Agent. The dataset was created by harithoppil and has been used to evaluate models such as GLM-5 and Step 3.5-Flash.
The license is unknown; verify permissions before use.