Toxic-DPO v0.2 is a dataset created by 'unalignment' to illustrate the use of Direct Preference Optimization for de-aligning language models. It contains a collection of text examples labeled as toxic or harmful, including profanity. The dataset was uploaded to Hugging Face on January 9, 2024.
Usage requires explicit acknowledgment that the data contains toxic, harmful, and profane content. A full description and usage restrictions are available on the Hugging Face dataset page.