Toxic Dpo Natural V5

Name: Toxic Dpo Natural V5
Creator: adamo1139
Published: 2024-04-14T10:21:34
Keywords: Size Categories: 1K<n<10K, License: other, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, Format: JSON

Description

This merged dataset combines preference data from three sources: toxic-dpo-natural-v4, rawrr v2-1 stage 2, and no_robots. The samples favor human-like conversational responses, so that models trained on them are less likely to overfit to rigid instruction-following templates.
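
The merge described above can be pictured with a short plain-Python sketch. It is only an illustration under stated assumptions: the page names just the 'chosen' field, so the `prompt` and `rejected` keys, the `merge_sources` helper, the source labels, and the toy rows are all hypothetical and do not reflect the dataset's real contents or the author's build script.

```python
from typing import Dict, List

# Hypothetical schema: the page only names the "chosen" field; "prompt" and
# "rejected" are assumed here because DPO/ORPO pairs need both.
Record = Dict[str, str]
REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def merge_sources(sources: Dict[str, List[Record]]) -> List[Record]:
    """Concatenate several preference sources into one list of pairs.

    Each output record keeps the required fields and gains a "source" tag,
    so the origin of every pair stays traceable after the merge.
    """
    merged: List[Record] = []
    for name, records in sources.items():
        for rec in records:
            # Skip malformed rows instead of failing the whole merge.
            if not all(key in rec for key in REQUIRED_KEYS):
                continue
            row = {key: rec[key] for key in REQUIRED_KEYS}
            row["source"] = name
            merged.append(row)
    return merged


# Toy placeholder rows with hypothetical source labels, not real dataset samples.
toy_sources = {
    "dpo_natural_v4": [
        {"prompt": "Hi!", "chosen": "Hey, what's up?",
         "rejected": "Greetings. How may I assist you today?"},
    ],
    "rawrr_v2_1_stage2": [
        {"prompt": "Tell me a joke.", "chosen": "Sure, here's a dumb one...",
         "rejected": "As an AI language model, I..."},
    ],
    "no_robots": [
        {"prompt": "Summarize this.", "chosen": "Short version: ...",
         "rejected": "Certainly! Here is a concise summary:"},
        {"prompt": "A row missing its responses"},
    ],
}

rows = merge_sources(toy_sources)
print(len(rows))  # 3: the malformed no_robots row is skipped
```

Tagging each row with its origin is a design choice, not something the page describes; it simply makes it easy to rebalance or filter the three sources later.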

Use Cases

Train models with Direct Preference Optimization (DPO) to adopt the more natural tone of the responses in the 'chosen' field
Apply Odds Ratio Preference Optimization (ORPO) to the merged preference pairs to reduce overfitting to specific instruction formats
Fine-tune models such as Yi 34B to be more willing to answer, using the human-like responses in the 'chosen' column
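
For readers new to DPO, the per-pair objective can be sketched in a few lines of Python. This is a minimal illustration of the standard DPO loss, not the dataset author's training code. It assumes you already have the summed log-probabilities of the chosen and rejected responses under the trainable policy and a frozen reference model, and `beta=0.1` is an assumed default rather than a value stated on this page.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a whole response
    (chosen or rejected) under the trainable policy or the frozen
    reference model. beta=0.1 is an assumed default.
    """
    # Implicit reward margins: how much more the policy likes each
    # response than the reference model does.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# The policy favors the chosen reply more than the reference does,
# so the loss drops below log(2), the value at zero margin.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))  # log(2), about 0.6931
```

Minimizing this loss raises the policy's relative preference for the 'chosen' response, which is how the natural-sounding replies in that field shape the model's tone.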

Strengths

Includes the 'chosen' field sourced from the original no_robots dataset
Aggregates samples from toxic-dpo-natural-v4 and rawrr v2-1 stage 2
Designed for compatibility with Yi 34B model training using the ORPO algorithm
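
Unlike DPO, ORPO needs no reference model. It adds an odds-ratio penalty to the ordinary supervised loss on the chosen response, using length-normalized response probabilities. The Python sketch below shows that per-pair objective under stated assumptions: per-token log-probabilities are already available, the helper names are hypothetical, and the weight `lam=0.1` is an assumption, not the configuration of any Yi 34B run.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def mean_logprob(token_logprobs: List[float]) -> float:
    """Length-normalized log-probability of a response."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob: float) -> float:
    """log(P / (1 - P)) with P = exp(avg_logprob); requires avg_logprob < 0."""
    # log(1 - exp(a)) is computed as log(-expm1(a)) for accuracy.
    return avg_logprob - math.log(-math.expm1(avg_logprob))


def orpo_loss(chosen_token_logprobs: List[float],
              rejected_token_logprobs: List[float],
              lam: float = 0.1) -> float:
    """Per-pair ORPO objective: SFT loss on chosen + lam * odds-ratio loss."""
    chosen_avg = mean_logprob(chosen_token_logprobs)
    rejected_avg = mean_logprob(rejected_token_logprobs)
    # Supervised term: negative mean log-likelihood of the chosen response.
    sft = -chosen_avg
    # Odds-ratio term: pushes the odds of chosen above the odds of rejected.
    ratio = log_odds(chosen_avg) - log_odds(rejected_avg)
    return sft - lam * log_sigmoid(ratio)


chosen = [-0.1, -0.2, -0.15]
rejected = [-1.0, -2.0, -1.5]
print(orpo_loss(chosen, rejected))
```

`log_odds` raises `ValueError` if the mean log-probability is exactly 0 (probability 1), where the odds are undefined; real model outputs do not reach that edge case.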

Quality Score: D30

Description: 24
Source: 36
Reputation: 34
Access: 22

Community: 62 downloads, 12 likes, 0 views

Dataset Info

Author: adamo1139
Created: Apr 14, 2024
Updated: May 3, 2024