Available on 1 platform
A human-annotated preference dataset for reinforcement learning from human feedback (RLHF) and Direct Preference Optimization (DPO), focused on AI ethics failure modes. It contains 95 prompts and 190 response pairs, all fully annotated across five dimensions. The dataset was created by AI ethics specialist Mandy Hathaway and was last updated on 2026-04-13.
The license is unknown; check for usage restrictions before using the dataset.
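
For readers unfamiliar with how preference pairs feed into DPO, the following is a minimal sketch of turning one annotated pair into a prompt/chosen/rejected record, a layout commonly used by DPO training tools. The dataset's actual schema is not shown on this page, so the class `PreferencePair`, its field names, and the function `to_dpo_record` are hypothetical and used only for illustration; the five annotation dimensions are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical schema: these field names are assumptions,
    # not the dataset's real column names.
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b", as selected by the human annotator


def to_dpo_record(pair: PreferencePair) -> dict:
    """Map one annotated pair to a prompt/chosen/rejected record."""
    if pair.preferred not in ("a", "b"):
        raise ValueError(f"preferred must be 'a' or 'b', got {pair.preferred!r}")
    if pair.preferred == "a":
        chosen, rejected = pair.response_a, pair.response_b
    else:
        chosen, rejected = pair.response_b, pair.response_a
    return {"prompt": pair.prompt, "chosen": chosen, "rejected": rejected}


# Placeholder values only; no real dataset content is reproduced here.
example = PreferencePair(
    prompt="placeholder prompt",
    response_a="placeholder response A",
    response_b="placeholder response B",
    preferred="b",
)
record = to_dpo_record(example)
```

The explicit check on `preferred` is a design choice to fail loudly on malformed annotations rather than silently treating an unexpected label as a preference for one side.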