Description
AIPlans provides a dataset of 37,022 text examples formatted for reinforcement learning from human feedback (RLHF). The dataset, derived from PKU-Alignment/PKU-SafeRLHF, includes 33,334 training and 3,688 test examples. It was last updated on 2026-05-04.
Use Cases
Train a reward model based on the 'prompt', 'chosen', and 'rejected' text fields.
Perform supervised fine-tuning (SFT) based on the 'prompt' and 'chosen' text pairs.
Conduct direct preference optimization (DPO) based on the 'prompt', 'chosen', and 'rejected' text triples.
Benchmark safety-aligned reward models against the provided test split.
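The reward-model and benchmarking use cases above can be illustrated with a minimal sketch. It assumes a Bradley-Terry style pairwise objective, which is a common choice for preference data but is not confirmed by the dataset's metadata. The names `pairwise_reward_loss` and `preference_accuracy` are hypothetical, and the scores are placeholders for whatever a real reward model would produce for the 'chosen' and 'rejected' texts.

```python
import math
from typing import Iterable, Tuple


def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Return -log(sigmoid(chosen_score - rejected_score)).

    Assumption: a Bradley-Terry style objective. The loss is small when the
    model scores the 'chosen' response well above the 'rejected' one.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def preference_accuracy(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) score pairs where chosen scores higher.

    Ties count as incorrect. Returns 0.0 for an empty input.
    """
    pairs = list(score_pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)
```

Running `preference_accuracy` over scores computed on the test split gives a simple agreement metric; the loss function is what a training loop would average over the training split.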
Strengths
Provides 37,022 total examples ready for RLHF training.
Includes a defined train/test split with 33,334 and 3,688 examples respectively.
Data is structured for easy conversion to SFT and DPO formats without re-downloading.
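As a sketch of the conversion mentioned above, the functions below reshape one preference record into an SFT pair and a DPO triple. The input field names 'prompt', 'chosen', and 'rejected' come from the use cases listed here; the output key `completion` and the function names are assumptions for illustration, not a format required by any particular training library.

```python
from typing import Dict, List

Record = Dict[str, str]


def to_sft(record: Record) -> Record:
    """Keep only the prompt and the preferred response (SFT pair)."""
    # Assumption: 'completion' is a hypothetical output key name.
    return {"prompt": record["prompt"], "completion": record["chosen"]}


def to_dpo(record: Record) -> Record:
    """Keep the prompt plus both responses (DPO triple)."""
    return {
        "prompt": record["prompt"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


def convert_all(records: List[Record]) -> Dict[str, List[Record]]:
    """Build both views from the same in-memory records."""
    return {
        "sft": [to_sft(r) for r in records],
        "dpo": [to_dpo(r) for r in records],
    }


# Made-up example row; not taken from the dataset.
example = {
    "prompt": "How do I reset my password?",
    "chosen": "Use the account recovery page.",
    "rejected": "I cannot help with that.",
    "extra_field": "ignored",
}
views = convert_all([example])
```

Because both views are derived from the same records already in memory, no second download is needed. Extra columns are dropped rather than passed through.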
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row counts are known, but the content of the examples and any filtering applied to the source data are not described here; check the source page for both.
License and original data collection methodology are not specified in the provided metadata.
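Since column documentation is absent, a first step after download is to inspect the fields directly. The sketch below summarizes which columns appear in a sample of rows, what Python types they hold, and how often they are non-empty. It works on plain dictionaries; the function name `summarize_columns` and the sample rows are hypothetical, and how the rows are loaded is left open.

```python
from typing import Any, Dict, Iterable


def summarize_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collect observed value types and non-empty counts for each column."""
    summary: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(name, {"types": set(), "non_empty": 0})
            info["types"].add(type(value).__name__)
            # Treat None and the empty string as missing values.
            if value is not None and value != "":
                info["non_empty"] += 1
    return summary


# Made-up sample rows; not taken from the dataset.
sample_rows = [
    {"prompt": "Question A", "chosen": "Answer 1", "rejected": "Answer 2"},
    {"prompt": "Question B", "chosen": "", "rejected": None},
]
column_summary = summarize_columns(sample_rows)
```

A column that is always a non-empty string is a good candidate for a text field; mixed types or many empty values suggest the field needs a closer look before training.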
Provenance
Source
Derived from PKU-Alignment/PKU-SafeRLHF.
Freshness
Last updated 2026-05-04 16:16:22; freshness should be verified.
License
License restrictions are unknown and should be verified before use.