The dataset contains 5,040 text-image pairs across 13 safety scenarios, including hate speech and illegal activities. It provides a benchmark for evaluating the safety alignment of multimodal large language models, specifically targeting vulnerabilities in vision-language models through adversarial prompts.
Use Cases
- Evaluate multimodal model safety by feeding the 'image' and 'question' columns into a model and checking the output
- Perform error analysis across different safety domains using the 'category' column
- Analyze the effectiveness of safety prompts by comparing model responses to the same 'question' and 'image' inputs with and without the safety prompt
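The first two use cases can be combined into a per-category evaluation loop. The following is a minimal sketch, not the benchmark's official protocol. It assumes each record is a dict with the 'image', 'question', and 'category' fields. The function name unsafe_rate_by_category and the two callables, generate (the model under test) and is_unsafe (a judge that flags harmful responses), are hypothetical placeholders to be supplied by the user.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable


def unsafe_rate_by_category(
    records: Iterable[Dict[str, Any]],
    generate: Callable[[Any, str], str],   # hypothetical: model under test
    is_unsafe: Callable[[str], bool],      # hypothetical: safety judge
) -> Dict[str, float]:
    """Return the fraction of responses judged unsafe for each category."""
    totals: Dict[str, int] = defaultdict(int)
    unsafe: Dict[str, int] = defaultdict(int)

    for row in records:
        # Feed the image and question to the model, then judge the output.
        response = generate(row["image"], row["question"])
        totals[row["category"]] += 1
        if is_unsafe(response):
            unsafe[row["category"]] += 1

    # Per-category rate; every category in `totals` has at least one sample.
    return {category: unsafe[category] / totals[category] for category in totals}
```

Sorting the returned rates highlights the safety domains where a model is weakest. Running the same function with a safety prompt prepended inside generate gives a second set of rates to compare against the baseline.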
Strengths
- 5,040 samples across 13 safety-related scenarios
- Includes 'image', 'question', and 'category' fields
- Scenarios include 'Illegal Activities', 'Hate Speech', and 'Malicious Software'
- Restricted to research use following GPT-4 and Stable Diffusion license agreements