Description
Rlhf Learn collects resources on improving the stability and efficiency of reinforcement learning training. It focuses on policy optimization algorithms: TRPO, PPO, DPO, GRPO, DAPO, and GSPO. The repository was authored by Dylsimple60 and last updated on 2026-05-19.
Use Cases
Benchmarking the policy optimization methods named in the description, such as TRPO, PPO, and DPO
Training reinforcement learning agents with an emphasis on stability and efficiency, in line with the repository's stated focus
Comparing the performance of GRPO, DAPO, and GSPO, which fall within the repository's stated scope
Strengths
Focuses on a specific set of advanced reinforcement learning algorithms (TRPO, PPO, DPO, GRPO, DAPO, GSPO)
Last updated on 2026-05-19, indicating recent maintenance
Limitations
Description metadata is limited; actual data quality requires manual inspection after download
Column-level documentation is absent; field semantics must be inferred after download
Row count and dataset size are unknown, which may limit suitability assessment
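Because size and contents are undocumented, a first pass after download can be as simple as counting files and bytes by extension. The following is a minimal sketch, assuming the repository has already been cloned or unpacked to a local directory; the function name `summarize_checkout` and the choice to skip the `.git` folder are illustrative assumptions, not part of the repository.

```python
from collections import Counter
from pathlib import Path


def summarize_checkout(root):
    """Count files and total bytes under root, grouped by file extension.

    Hypothetical helper for a first look at a local copy; it makes no
    assumptions about what the repository actually contains.
    """
    root = Path(root)
    counts = Counter()
    total_bytes = 0
    for path in root.rglob("*"):
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        counts[path.suffix.lower() or "(no extension)"] += 1
        total_bytes += path.stat().st_size
    return {
        "files": sum(counts.values()),
        "bytes": total_bytes,
        "by_extension": dict(counts),
    }
```

Calling `summarize_checkout("path/to/local/copy")` returns a dictionary with the file count, total size in bytes, and a per-extension breakdown, which gives a rough basis for the suitability assessment that the listing itself cannot provide.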
Provenance
Source
GitHub
Freshness
Last updated 2026-05-19 15:17:24
License is unknown; users should verify licensing terms before use.