DataSalon

CUAVerifierBench: A Human-Annotated Benchmark for Computer-Using Agent Verifiers | DataSalon

Robotics & Autonomous Systems

Name: CUAVerifierBench: A Human-Annotated Benchmark for Computer-Using Agent Verifiers
Creator: microsoft
Published: 2026-04-19T02:13:35
Keywords: AI Evaluation, Computer-Using Agents, Benchmark, Verification Benchmark, Human-Annotated, Multimodal

Available on 1 platform

Description

CUAVerifierBench is an evaluation benchmark for verifiers of computer-using agents, created by Microsoft. It contains trajectories of agent interactions, each annotated by humans with a judgment of task completion. The dataset was last updated on 2026-04-21.

Use Cases

Training verifier models on human judgments of agent trajectories.
Benchmarking the performance of different verifier architectures on a standardized task.
Studying the failure modes of computer-using agents using trajectories annotated as incorrect.
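
As a rough illustration of the benchmarking use case, the sketch below scores a verifier's verdicts against human judgments. The field names `human_label` and `verifier_label` are hypothetical, since this listing does not document the dataset's columns; treat this as a sketch under those assumptions, not the benchmark's official scoring protocol.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Judgment:
    # Hypothetical fields: the real column names are not documented in this listing.
    human_label: bool     # human annotator says the task was completed
    verifier_label: bool  # verifier under test says the task was completed


def score_verifier(judgments: Iterable[Judgment]) -> dict:
    """Compare verifier verdicts with human labels, treating 'completed' as positive."""
    tp = tn = fp = fn = 0
    for j in judgments:
        if j.human_label and j.verifier_label:
            tp += 1
        elif j.human_label and not j.verifier_label:
            fn += 1
        elif not j.human_label and j.verifier_label:
            fp += 1
        else:
            tn += 1

    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no judgments to score")

    return {
        "accuracy": (tp + tn) / total,
        # Fall back to 0.0 when a denominator is empty instead of dividing by zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Running the same function over the outputs of several verifiers on the same set of trajectories gives a like-for-like comparison, provided every verifier is scored against the same human labels.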

Strengths

Dataset is human-annotated, which likely provides a reliable ground truth for evaluation.
Focuses on a specific and emerging evaluation need for computer-using agent verifiers.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
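
Because column documentation and the row count are missing, a first pass after download might simply tally which fields appear and what types they hold. The sketch below assumes the records can be loaded as dictionaries, for example from a JSON Lines file; the actual distribution format is not stated in this listing, so `read_jsonl` and any file path passed to it are illustrative only.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line (assumes a JSON Lines file)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def summarize_fields(records: Iterable[dict]) -> dict:
    """Return the row count plus, per field, how often it appears and its value types."""
    fields = defaultdict(lambda: {"count": 0, "types": set()})
    rows = 0
    for record in records:
        rows += 1
        for key, value in record.items():
            fields[key]["count"] += 1
            fields[key]["types"].add(type(value).__name__)
    return {"rows": rows, "fields": dict(fields)}
```

A field whose `count` is lower than `rows`, or whose `types` set includes `NoneType`, is a candidate for closer manual inspection before it is relied on.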

Provenance

Source: Microsoft
Collection Method: Human-annotated trajectories of agent interactions, as described in the associated 'Universal Verifier' paper.
Freshness: Last updated 2026-04-21 19:30:05; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

D39

Community

62 downloads

1 like

0 views

Dataset Info

Author: microsoft
Created: Apr 19, 2026
Updated: Apr 21, 2026
Last synced: Jun 23, 2026
