ATBench is a trajectory-level dataset for evaluating agentic safety in realistic, long-horizon interactions. It contains 500 annotated execution trajectories, split evenly between safe and unsafe examples (250 each). Trajectories average 8.97 turns, and the dataset covers 1,575 unique tools in total. The benchmark was created by AI45Research and was last updated in January 2026.
The full description is available on the external Hugging Face dataset page. Column names, data formats, and license details are not listed here.