Toxic-DPO: Preference Data for Model Unalignment

Name: Toxic-DPO: Preference Data for Model Unalignment
Creator: tastypear
Published: 2024-01-28T07:55:35
Keywords: Model Unalignment, Text, Multilingual, DPO, Preference Optimization

Description

Unalignment Toxic DPO V0.2 Zh Cn is a multilingual (Chinese-English) dataset intended to illustrate the use of Direct Preference Optimization (DPO) for model unalignment. It was created by tastypear and last updated on 2024-01-31. According to its description, it contains highly toxic or harmful examples.

Use Cases

Training preference models for content moderation based on the described toxic examples.
Studying the effects of DPO on model behavior using the described harmful preference pairs.
Benchmarking model alignment techniques against adversarial preference data.
Creating multilingual adversarial test sets for safety evaluation based on the Chinese-English content.

Strengths

Dataset is explicitly designed for a specific machine learning technique (DPO).
Provides a Chinese-English parallel version, as stated in the description.
Last update timestamp is precisely recorded as 2024-01-31 13:57:28.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Row count is unknown, which may limit suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.

Provenance

Source: Hugging Face user tastypear
Collection Method: Likely generated or curated for DPO experiments; Chinese version created via model paraphrasing.
Freshness: Last updated 2024-01-31 13:57:28; verify against the source before use.

Usage restrictions from the original dataset apply. The Chinese translations are model-paraphrased and may not be accurate.

Quality Score

Overall: D36
Description: 42
Source: 41
Reputation: 22
Access: 26

Community

Downloads: 17
Likes: 21
Views: 0

Dataset Info

Author: tastypear
Created: Jan 28, 2024
Updated: Jan 31, 2024
Last synced: Jun 7, 2026
