Auto-ClawEval is an auto-generated benchmark for evaluating AI agents, containing 1,040 tasks across 104 unique scenarios. It was created by ClawEnvKit and published by AIcell on Hugging Face, and was last updated on 2026-04-21. Of the tasks, 77% are API-based and 23% are file-dependent; together they span 24 categories and involve 20 mock services.
Evaluation requires the ClawEnvKit Docker harness, as indicated in the quick start instructions.
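The task figures quoted above can be turned into approximate counts with a little arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the percentages are the rounded values from the summary (so the per-type counts are estimates, not official numbers), and the per-scenario figure is an average that assumes nothing about how tasks are actually distributed.

```python
# Figures taken from the dataset summary above.
TOTAL_TASKS = 1040
SCENARIOS = 104
API_PERCENT = 77    # rounded share of API-based tasks
FILE_PERCENT = 23   # rounded share of file-dependent tasks

# Average number of tasks per scenario (an average, not a guaranteed even split).
avg_tasks_per_scenario = TOTAL_TASKS / SCENARIOS

# Approximate per-type counts; estimates only, because the percentages are rounded.
approx_api_tasks = round(TOTAL_TASKS * API_PERCENT / 100)
approx_file_tasks = TOTAL_TASKS - approx_api_tasks

print(f"Average tasks per scenario: {avg_tasks_per_scenario:.0f}")
print(f"Approx. API-based tasks: {approx_api_tasks}")
print(f"Approx. file-dependent tasks: {approx_file_tasks}")
```

Under these assumptions the benchmark averages 10 tasks per scenario, with roughly 800 API-based and roughly 240 file-dependent tasks.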