Available on 1 platform
WildClawBench is a benchmark of 60 original tasks for evaluating AI agents in a live OpenClaw environment. It tests agents on practical, end-to-end work such as clipping football highlights and negotiating meeting times. The benchmark is multimodal and includes tasks in multiple languages, among them English and Chinese. It was created by internlm.
Full task descriptions and data are not included here; they are available on the Hugging Face dataset page. The dataset tags list the license as MIT, but the main description does not confirm it.