A 5.6 MB dataset of human preference outcomes used to evaluate large language models. The data supports a novel statistical framework, proposed by Nan Lu, for online decision-making and inference in Reinforcement Learning from Human Feedback (RLHF). The dataset was last updated on 2026-05-18 and has been applied to analyze model performance on the Massive Multitask Language Understanding (MMLU) dataset.
Files are in PDF and ZIP formats; the ZIP may contain the primary data. License is CC-BY-4.0.