This dataset contains 450 conversations designed to test whether persona framing gates the expression of alignment faking (AF) in the Gemma 3 27B-it language model. It was created by vincentoh and last updated on March 6, 2026. The conversations span 15 roles, 10 AF elicitation prompts, and 3 experimental conditions, and the responses were judged by Claude Opus.
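The three counts multiply out exactly: 15 × 10 × 3 = 450. A minimal sketch, assuming (this is not confirmed by the description) that every role is crossed with every prompt under every condition, enumerates that grid. The role, prompt, and condition labels are hypothetical placeholders, not the dataset's actual identifiers or fields.

```python
from itertools import product

# Assumption: the 450 conversations form a full crossing of the three factors.
# All label strings below are hypothetical placeholders.
N_ROLES = 15
N_PROMPTS = 10
N_CONDITIONS = 3

roles = [f"role_{i:02d}" for i in range(N_ROLES)]
prompts = [f"af_prompt_{j:02d}" for j in range(N_PROMPTS)]
conditions = [f"condition_{k}" for k in range(N_CONDITIONS)]

# One (role, prompt, condition) cell per conversation.
design = list(product(roles, prompts, conditions))

print(len(design))  # 450
```

If the dataset is indeed fully crossed, each condition covers 150 conversations (15 roles × 10 prompts), which allows the conditions to be compared on identical role and prompt pairs.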
The license is listed as MIT ('mit') in the platform tags, but no license file or terms were available to confirm this.