VLMSafe-420, developed by ArthT and updated in March 2026, consists of 420 multimodal counterfactual pairs spanning 38 safety categories. It is designed for mechanistic interpretability research that aims to identify and analyze safety circuits within Vision-Language Models.
The dataset is licensed under the MIT license and is provided in Parquet format.
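Since the data ships as Parquet, a first sanity check after download might be to load it and count pairs per category. The sketch below is only illustrative: the file path, the `category` and `pair_id` field names, and the toy rows are assumptions, not the dataset's documented schema. `load_rows` relies on `pandas.read_parquet`, which needs pandas plus a Parquet engine such as pyarrow.

```python
from collections import Counter
from pathlib import Path


def load_rows(parquet_path):
    """Read a Parquet file into a list of row dicts (requires pandas and a Parquet engine)."""
    import pandas as pd  # imported lazily so the counting helper works without pandas

    return pd.read_parquet(Path(parquet_path)).to_dict(orient="records")


def pairs_per_category(rows, category_key="category"):
    """Count rows per safety category; `category_key` is an assumed column name."""
    return Counter(row[category_key] for row in rows)


# Toy rows standing in for the real data; the field names and values are hypothetical.
toy_rows = [
    {"pair_id": 0, "category": "weapons"},
    {"pair_id": 1, "category": "weapons"},
    {"pair_id": 2, "category": "self_harm"},
]
counts = pairs_per_category(toy_rows)
print(counts.most_common())
```

With the real file, `pairs_per_category(load_rows(path))` would be expected to total 420 rows over 38 distinct categories if each row holds one pair, which is itself an assumption about the layout.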