QwenClawBench: 100-Task Benchmark for Evaluating OpenClaw AI Agents | DataSalon

Government & Legal

Name: QwenClawBench: 100-Task Benchmark for Evaluating OpenClaw AI Agents
Creator: skylenage-ai
Published: 2026-04-08T16:29:58
Keywords: OpenClaw, Real User Distribution, Benchmark, Text, AI Agent Evaluation, Synthetic

Available on 1 platform

Description

QwenClawBench is a benchmark of 100 tasks across 8 core domains for evaluating OpenClaw agents. It was originally built as an internal benchmark during the development of Qwen3.6-Plus, then optimized and open-sourced by skylenage-ai. The dataset was last updated on April 10, 2026.

Use Cases

Benchmarking OpenClaw agent performance on the 100 tasks across 8 domains
Evaluating agent robustness at scale, using the benchmark's real-user-distribution design
Testing agent capabilities in the isolated simulated workspace provided for each task
Comparing agent architectures on the same standardized task set

Strengths

Contains 100 tasks for evaluation
Covers 8 core domains for broad assessment
Designed for robust evaluation at scale
Features isolated simulated workspaces per task

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is not reported, so it is unclear how the 100 tasks map to rows; this makes suitability harder to assess before download
Description metadata is limited; actual data quality requires manual inspection after download
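
Because field semantics have to be inferred after download, a first pass might simply tally which fields appear and which value types they hold. The following is a minimal sketch, assuming the task records can be read as JSON Lines (one JSON object per line); the listing does not state the file format, and the field names in the sample (`task_id`, `domain`, `steps`) are hypothetical placeholders, not the dataset's real schema.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable


def summarize_fields(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count the Python value types seen for each top-level field."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # assumes each line is a JSON object
        for key, value in record.items():
            summary[key][type(value).__name__] += 1
    return dict(summary)


# Hypothetical sample records; the real field names are not documented.
sample = [
    '{"task_id": "t-001", "domain": "example", "steps": 3}',
    '{"task_id": "t-002", "domain": "example", "steps": null}',
]

report = summarize_fields(sample)
for field, types in report.items():
    print(field, dict(types))
```

Fields that show more than one type, or that are missing from some records, are the ones most worth inspecting by hand before relying on them.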

Provenance

Source: skylenage-ai
Collection Method: Originally built as an internal benchmark during the development of Qwen3.6-Plus, then optimized and open-sourced.
Time Range: Not specified
Freshness: Last updated 2026-04-10 04:16:45; freshness should be verified
Geography: Not specified

Quality Score

D37

Community

1 like

0 views

Dataset Info

Author: skylenage-ai
Created: Apr 8, 2026
Updated: Apr 10, 2026
Last synced: Jun 9, 2026
