Available on 1 platform
A multi-turn safety evaluation benchmark for assessing the safety capabilities of frontier large language models. The dataset consists of adversarial multi-turn conversations that probe whether a model maintains safe behavior across categories such as violence and hate. Created by CentificAIResearch, it was last updated on June 22, 2026.
License is unknown; check the dataset page for usage restrictions.