SkillTrustBench Results stores public leaderboard records for an AI safety benchmark. The dataset tracks two comparison groups: one that fixes a model and compares analysis tools, and another that fixes an analysis tool and compares models. Raw system outputs are normalized into three safety categories: normal (safe), suspicious, or malicious.
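The normalization step and the two comparison groups can be pictured with a small sketch. This is an illustration under stated assumptions, not the dataset's actual schema or pipeline: the raw label strings in `RAW_LABEL_MAP`, the `LeaderboardRecord` fields, and the helper names `normalize`, `fixed_model_group` and `fixed_tool_group` are all hypothetical. Only the three target categories and the two grouping axes come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class SafetyCategory(Enum):
    NORMAL = "normal"  # safe
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"


# HYPOTHETICAL raw labels: the benchmark's real raw output vocabulary
# is not documented here, so these keys are placeholders.
RAW_LABEL_MAP = {
    "safe": SafetyCategory.NORMAL,
    "normal": SafetyCategory.NORMAL,
    "suspicious": SafetyCategory.SUSPICIOUS,
    "warning": SafetyCategory.SUSPICIOUS,
    "malicious": SafetyCategory.MALICIOUS,
    "unsafe": SafetyCategory.MALICIOUS,
}


def normalize(raw_label: str) -> SafetyCategory:
    """Map a raw system output label onto one of the three safety categories."""
    key = raw_label.strip().lower()
    if key not in RAW_LABEL_MAP:
        # Fail loudly rather than silently guessing a category.
        raise ValueError(f"unrecognized raw label: {raw_label!r}")
    return RAW_LABEL_MAP[key]


@dataclass(frozen=True)
class LeaderboardRecord:
    # HYPOTHETICAL field names, not the dataset's actual columns.
    model: str
    tool: str
    category: SafetyCategory


def fixed_model_group(records, model):
    """Comparison group 1: one fixed model, compared across analysis tools."""
    return [r for r in records if r.model == model]


def fixed_tool_group(records, tool):
    """Comparison group 2: one fixed analysis tool, compared across models."""
    return [r for r in records if r.tool == tool]


if __name__ == "__main__":
    records = [
        LeaderboardRecord("model-a", "tool-x", normalize("Safe")),
        LeaderboardRecord("model-a", "tool-y", normalize("malicious")),
        LeaderboardRecord("model-b", "tool-x", normalize(" WARNING ")),
    ]
    for r in fixed_model_group(records, "model-a"):
        print(r.tool, r.category.value)
    for r in fixed_tool_group(records, "tool-x"):
        print(r.model, r.category.value)
```

Raising on an unknown label, rather than defaulting to "normal", keeps an unmapped output from being counted as safe. That seems the more cautious choice for a safety leaderboard, though the real pipeline may handle this differently.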
The license is unknown; verify the terms of use before using the dataset.