PKU-SafeRLHF-RLHF: 37,000 Reward Model Training Examples | DataSalon

Multimodal & LLM

Name: PKU-SafeRLHF-RLHF: 37,000 Reward Model Training Examples
Creator: AIPlans
Published: 2026-05-04T16:16:05
Keywords: Alignment, Text Generation, Text, Reinforcement Learning, Human Feedback, Reward Model

Available on 1 platform

Description

AIPlans provides a dataset of 37,022 text examples formatted for reinforcement learning from human feedback (RLHF). The dataset, derived from PKU-Alignment/PKU-SafeRLHF, includes 33,334 training and 3,688 test examples. It was last updated on 2026-05-04.

Use Cases

Train a reward model based on the 'prompt', 'chosen', and 'rejected' text fields.
Perform supervised fine-tuning (SFT) based on the 'prompt' and 'chosen' text pairs.
Conduct direct preference optimization (DPO) based on the 'prompt', 'chosen', and 'rejected' text triples.
Benchmark safety-aligned reward models against the provided test split.
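
As an illustration of the reward-model use case, the sketch below computes a pairwise (Bradley-Terry style) preference loss from two scalar scores. This objective is a common choice for reward modeling, not something the listing specifies, and the function name `pairwise_reward_loss` and the example scores are hypothetical.

```python
import math


def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Return -log(sigmoid(chosen_score - rejected_score)).

    The loss is small when the reward model scores the 'chosen' response
    well above the 'rejected' one, and large when the order is reversed.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores a reward model might assign to one example.
loss_correct_order = pairwise_reward_loss(chosen_score=2.0, rejected_score=0.0)
loss_wrong_order = pairwise_reward_loss(chosen_score=0.0, rejected_score=2.0)
loss_tie = pairwise_reward_loss(chosen_score=1.0, rejected_score=1.0)
```

In a real training loop the two scores would come from the same model applied to the prompt paired with the 'chosen' text and with the 'rejected' text, and the loss would be averaged over a batch.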

Strengths

Provides 37,022 total examples ready for RLHF training.
Includes a defined train/test split with 33,334 and 3,688 examples respectively.
Data is structured for easy conversion to SFT and DPO formats without re-downloading.
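
A minimal sketch of the conversion and split arithmetic described above, assuming each record is a dict with the 'prompt', 'chosen', and 'rejected' fields named under Use Cases. The output key `completion`, the helper names, and the sample record are illustrative assumptions, not part of the dataset's documentation.

```python
TRAIN_ROWS = 33_334
TEST_ROWS = 3_688
TOTAL_ROWS = TRAIN_ROWS + TEST_ROWS  # should match the listed total of 37,022
TEST_FRACTION = TEST_ROWS / TOTAL_ROWS  # roughly a 90/10 split


def to_sft(record: dict) -> dict:
    """Keep only the prompt and the preferred response for SFT."""
    # 'completion' is an assumed output key; rename to suit your trainer.
    return {"prompt": record["prompt"], "completion": record["chosen"]}


def to_dpo(record: dict) -> dict:
    """Keep the full preference triple for DPO."""
    return {
        "prompt": record["prompt"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


# Hypothetical record with placeholder text, plus an extra column to drop.
sample = {
    "prompt": "example prompt",
    "chosen": "preferred response",
    "rejected": "dispreferred response",
    "extra": 1,
}
sft_row = to_sft(sample)
dpo_row = to_dpo(sample)
```

Both conversions read the same in-memory record, which is why a single download can feed either training format.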

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The row count is known, but the actual data content and filtering criteria are not described here and require visiting the source page.
License and original data collection methodology are not specified in the provided metadata.
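
Because column documentation is absent, a quick profile of the downloaded rows can help with inferring field semantics. The sketch below assumes the rows have already been loaded as a list of dicts; the function name `summarize_columns` and the sample rows are hypothetical.

```python
def summarize_columns(records: list[dict]) -> dict:
    """Report value types, missing counts, and mean text length per column."""
    columns = sorted({key for record in records for key in record})
    summary = {}
    for column in columns:
        # A key that is absent from a record is treated the same as None.
        values = [record.get(column) for record in records]
        present = [value for value in values if value is not None]
        strings = [value for value in present if isinstance(value, str)]
        summary[column] = {
            "types": sorted({type(value).__name__ for value in present}),
            "missing": len(values) - len(present),
            "mean_chars": (
                sum(len(text) for text in strings) / len(strings)
                if strings
                else None
            ),
        }
    return summary


# Hypothetical rows with placeholder text standing in for real data.
rows = [
    {"prompt": "abc", "chosen": "de", "rejected": "f"},
    {"prompt": "abcde", "chosen": None},
]
profile = summarize_columns(rows)
```

Unexpected types or a high missing count in a column are a signal to consult the source page before training on it.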

Provenance

Source: Derived from PKU-Alignment/PKU-SafeRLHF.
Freshness: Last updated 2026-05-04 16:16:22; confirm against the source page before relying on this date.

License restrictions are unknown and should be verified before use.

Quality Score

C41

Description

Source

Reputation

Access

Community

156 downloads

1 like

0 views

Dataset Info

Author: AIPlans
Created: May 4, 2026
Updated: May 4, 2026
Last synced: May 20, 2026
