Agent execution trajectories from a large-scale agentic evaluation. Each trajectory captures a single (agent, model) attempt at a task, including step logs, tool calls, model outputs, and verifier scoring. The dataset was authored by kendx and last updated on 2026-06-08.
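As a rough illustration of the record shape described above, here is a minimal Python sketch of one trajectory. The dataset's actual schema is not shown on this page, so every field name here (`task_id`, `agent`, `model`, `steps`, `model_output`, `tool_calls`, `verifier_score`) and the helper `trajectory_from_dict` are hypothetical and should be checked against the real files before use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One logged step of an attempt (hypothetical layout)."""
    index: int
    model_output: str
    tool_calls: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class Trajectory:
    """A single (agent, model) attempt at a task (hypothetical layout)."""
    task_id: str
    agent: str
    model: str
    steps: list[Step] = field(default_factory=list)
    # None means the record carried no verifier score.
    verifier_score: Optional[float] = None

    def tool_call_count(self) -> int:
        """Total number of tool calls across all steps."""
        return sum(len(step.tool_calls) for step in self.steps)


def trajectory_from_dict(raw: dict[str, Any]) -> Trajectory:
    """Build a Trajectory from a plain dict, assuming the keys named above."""
    steps = [
        Step(
            index=i,
            model_output=s.get("model_output", ""),
            tool_calls=list(s.get("tool_calls", [])),
        )
        for i, s in enumerate(raw.get("steps", []))
    ]
    return Trajectory(
        task_id=raw["task_id"],
        agent=raw["agent"],
        model=raw["model"],
        steps=steps,
        verifier_score=raw.get("verifier_score"),
    )
```

Keeping the steps as typed objects rather than raw dicts makes per-attempt summaries, such as the number of tool calls or the verifier score, straightforward to compute once the real field names are known.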
The license is unknown, which may restrict usage.