Available on 1 platform
CUAVerifierBench is an evaluation benchmark for verifiers of computer-using agents, created by Microsoft. It contains human-annotated trajectories of agent interactions, used to judge whether a task was completed. The dataset was last updated on 2026-04-21.
The license is unknown; verify the terms of use before using the dataset.