Available on 1 platform
WebArbiter Training Data provides step-level preference pairs for training a principle-guided reasoning Process Reward Model (PRM) for web agents. The data builds on the WebPRM Collection, which comprises approximately 30,000 step-level preference pairs drawn from the Mind2Web environment. The dataset was published by ZYao720 in conjunction with an ICLR 2026 paper.
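To make "step-level preference pair" concrete, here is a minimal Python sketch. It is not the dataset's actual schema: the field names (`task`, `observation`, `chosen_action`, `rejected_action`) and the helper `pairwise_accuracy` are hypothetical, chosen only for illustration. The sketch assumes each pair holds a preferred and a dispreferred action for the same agent step, and that a reward model can be checked by how often it scores the preferred action higher.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StepPreferencePair:
    """One step-level comparison. Field names are hypothetical, not the dataset's schema."""
    task: str             # natural-language goal given to the web agent
    observation: str      # page state at this step (for example, an HTML snippet)
    chosen_action: str    # action judged better at this step
    rejected_action: str  # action judged worse at this step


# A scorer maps (task, observation, action) to a scalar reward.
Scorer = Callable[[str, str, str], float]


def pairwise_accuracy(pairs: Sequence[StepPreferencePair], score: Scorer) -> float:
    """Fraction of pairs where the chosen action scores strictly higher.

    Ties count as incorrect, so a constant scorer gets 0.0.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(
        1
        for p in pairs
        if score(p.task, p.observation, p.chosen_action)
        > score(p.task, p.observation, p.rejected_action)
    )
    return correct / len(pairs)


def toy_score(task: str, observation: str, action: str) -> float:
    """Toy stand-in for a reward model: counts task words found in the action text."""
    return float(sum(word in action.lower() for word in task.lower().split()))


# Invented example pair, not taken from the dataset.
example_pairs = [
    StepPreferencePair(
        task="search for flights",
        observation="<input id='q'>",
        chosen_action="type 'flights' into search box",
        rejected_action="click the footer link",
    ),
]

accuracy = pairwise_accuracy(example_pairs, toy_score)
```

In practice `toy_score` would be replaced by a trained reward model; the pairwise comparison itself stays the same regardless of how the scores are produced.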
License restrictions are unknown and must be verified before use.