A dataset hosted on Hugging Face, created by skandermoalla and last updated on December 8, 2025. It contains reference completions and their rewards for a specific model and reward model pair, intended for training with the QRPO reference codebase. The collection accompanies the paper 'Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions'.
The dataset is designed for use with the QRPO reference codebase (github.com/CLAIRE-Labo/quantile-reward-policy-optimization); compatibility with other frameworks is unknown.