Clawbench: Open-Source Benchmark for Browser AI Agents on 153 Live Web Tasks

Name: Clawbench: Open-Source Benchmark for Browser AI Agents on 153 Live Web Tasks
Creator: reacher-z
Published: 2026-04-10T01:59:17
License: Apache-2.0
Keywords: Evaluation, Benchmark, Ai Agents, Tabular, Web Automation

by reacher-zUpdated 1mo ago

Available on 1 platform

Sign in to view source links and access this dataset

Description

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. The benchmark uses a 5-layer recording system, DOM-match, and an LLM judge for evaluation, with a reported top score of 33.3%. It was created by reacher-z and last updated on 2026-05-19.

Use Cases

Benchmarking the performance of browser-based AI agents based on the 153 defined tasks.
Evaluating agent robustness and generalization across 144 different live websites.
Developing new evaluation methods for web agents based on the 5-layer recording and LLM judge framework.

Strengths

Benchmarks performance on 153 distinct, everyday online tasks.
Evaluates agents across 144 live websites, providing real-world context.
Employs a multi-faceted evaluation method combining 5-layer recording, DOM-match, and an LLM judge.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset scale are unknown, which may limit suitability assessment.
The top reported score of 33.3% suggests the tasks present a significant challenge, indicating potential performance limitations for current agents.

Provenance

Source: github
Collection Method: Likely involves automated task recording and evaluation on live websites.
Freshness: Last updated 2026-05-19 05:11:06.

Released under the Apache-2.0 license.

Tabular Evaluation Benchmark Ai Agents Web Automation

Related Datasets

Quality Score

D32

Description

28

Source

19

Reputation

48

Access

61

Community

310 likes

0 views

Dataset Info

License: Apache-2.0
Author: reacher-z
Created: Apr 10, 2026
Updated: May 19, 2026
Language: Python
Last synced: May 19, 2026

Access

61

Community

310 likes

0 views

Dataset Info

License: Apache-2.0
Author: reacher-z
Created: Apr 10, 2026
Updated: May 19, 2026
Language: Python
Last synced: May 19, 2026

Clawbench: Open-Source Benchmark for Browser AI Agents on 153 Live Web Tasks

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info