Name: PKU-SafeRLHF Preference Data for AI Safety Research
Creator: PKU-Alignment
Published: 2023-06-14T16:03:29
Keywords: Task Categories: Text Generation, Safety, Library: polars, LM, RLHF, AI Safety, Language: en, Text Generation, Modality: text, Size Categories: 100K<n<1M, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, Text, Tabular, License: CC BY-NC 4.0, Large Language Model, Region: US, JSON, Safe, Human Feedback, arXiv: 2406.15513, Preference Modeling

Description

PKU-SafeRLHF is a dataset for AI safety research, particularly for reducing harmful outputs from language models. It was created by the PKU-Alignment Team and was last updated in October 2024. The dataset includes single-dimension preference data, question-answer pairs, and prompts.

Use Cases

Training reward models for Reinforcement Learning from Human Feedback (RLHF) using single-dimension preference labels.
Fine-tuning language models for safety via supervised learning on the provided question-answer pairs.
Benchmarking model harmlessness by evaluating responses to potentially harmful prompts.
Analyzing human preference patterns across different response attributes in the preference data.
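
As an illustration of the reward-modeling use case, the sketch below computes a pairwise preference loss of the Bradley-Terry form, -log(sigmoid(r_chosen - r_rejected)), which is a common choice for reward models trained on preference pairs. It is an assumption that this loss suits this dataset; the source does not state which objective the authors use. The function names and the reward scores are hypothetical, and a real reward model would produce the scores from the prompt and each response.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(pairs):
    """Average the loss over (reward_chosen, reward_rejected) score pairs."""
    losses = [pairwise_preference_loss(chosen, rejected) for chosen, rejected in pairs]
    return sum(losses) / len(losses)


# Hypothetical reward scores for three preference pairs (not taken from the dataset).
example_pairs = [(2.0, 0.5), (0.1, 0.3), (1.2, 1.2)]
print(mean_loss(example_pairs))
```

The loss shrinks as the preferred response is scored further above the rejected one, and equals log(2) when the two scores tie.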

Strengths

Dataset size is categorized as between 100K and 1M entries.
Includes multiple data components: prompts, Q-A pairs, and preference data for varied research approaches.
Explicitly designed for critical AI safety research on reducing model harm.

Limitations

Specific row counts, column names, and sample distributions are not provided.
Contains data that may be offensive or harmful, requiring careful handling.
License is CC BY-NC 4.0, restricting commercial use.
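
Because row counts and column names are not listed here, one way to recover them is to scan the records directly after obtaining the data. The sketch below is a minimal helper under that assumption: it accepts any iterable of dict-like rows (for example, rows parsed from a JSON Lines file) and reports the row count and how often each column appears. The function name and the sample field names are placeholders, not the dataset's actual schema.

```python
from collections import Counter


def summarize_records(records):
    """Return the row count and how many rows contain each column name."""
    row_count = 0
    column_counts = Counter()
    for record in records:
        row_count += 1
        column_counts.update(record.keys())
    return row_count, dict(column_counts)


# Hypothetical records; these field names are placeholders, not the real columns.
sample = [
    {"prompt": "p1", "response_a": "a", "response_b": "b", "preferred": "a"},
    {"prompt": "p2", "response_a": "a", "response_b": "b"},
]
rows, columns = summarize_records(sample)
print(rows, columns)
```

A column whose count is lower than the row count is missing from some rows, which is worth checking before training on it.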

Provenance

Source: PKU-Alignment Team.
Collection Method: Not specified.
Time Range: Not specified.
Freshness: Last updated on 2024-10-18.
Geography: Region tag indicates US, but specific coverage is unknown.

License is Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0). Data contains potentially offensive or harmful content intended for safety research.
