Name: WebArbiter: Two-Stage Training Data for Process Reward Models
Creator: ZYao720
Published: 2026-04-01T17:13:07
Keywords: arXiv:2601.21872, Process Reward Model, Tabular, Preference Pairs, region:us, Reinforcement Learning, Reasoning Process, Web Agents, license:mit, Reward Model

Description

WebArbiter Training Data provides step-level preference pairs for training a principle-guided reasoning Process Reward Model (PRM) for web agents. The data builds on the WebPRM Collection, which comprises approximately 30,000 step-level preference pairs drawn from the Mind2Web environment. The dataset was published by ZYao720 in conjunction with an ICLR 2026 paper.

Use Cases

Training process reward models based on step-level preference pairs.
Evaluating web agent reasoning performance based on annotated step-level preferences.
Benchmarking reinforcement learning algorithms for web navigation tasks.
Developing principle-guided reasoning models for autonomous web agents.
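
The evaluation and training use cases above can be illustrated with a small Python sketch. It is not the WebArbiter method: the field names (state, chosen, rejected), the toy scorer, and the choice of a Bradley-Terry style pairwise loss are all assumptions made for illustration, since the dataset's actual schema and training objective are not documented on this page.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One step-level preference pair (hypothetical schema, not the documented one)."""
    state: str     # task or page context at this step
    chosen: str    # preferred action
    rejected: str  # dispreferred action


def pairwise_accuracy(pairs, score_fn):
    """Fraction of pairs where score_fn ranks the chosen action above the rejected one."""
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(
        1 for p in pairs
        if score_fn(p.state, p.chosen) > score_fn(p.state, p.rejected)
    )
    return wins / len(pairs)


def bradley_terry_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen step beats the rejected one."""
    margin = score_chosen - score_rejected
    # -log(sigmoid(margin)), written to avoid overflow for large |margin|
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def toy_score(state, action):
    """Placeholder scorer: counts words shared by the state and the action."""
    return len(set(state.lower().split()) & set(action.lower().split()))


# Made-up examples, not rows from the dataset.
pairs = [
    PreferencePair("search for red shoes", "type red shoes into search box", "click login"),
    PreferencePair("open the cart", "click cart icon", "open the help page"),
]

if __name__ == "__main__":
    print("pairwise accuracy:", pairwise_accuracy(pairs, toy_score))
    print("loss at zero margin:", bradley_terry_loss(0.0, 0.0))
```

In practice, toy_score would be replaced by the reward model under evaluation, and the loss would be averaged over a batch of pairs inside whatever training framework is in use.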

Strengths

Builds on a substantial base collection of approximately 30,000 step-level preference pairs.
Associated with a peer-reviewed publication presented at ICLR 2026.
Specifically designed for training a principle-guided reasoning model.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and exact data size are not documented, which makes suitability hard to assess before download.
The card text does not state a license; the keyword tags suggest MIT, but this should be confirmed before use.
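
Because column-level documentation is absent, a first step after download is usually to summarize which fields exist and what types they hold. The sketch below does this for records already loaded as Python dictionaries. The sample rows and their field names (task_id, step, chosen, rejected, rationale) are hypothetical placeholders, not the dataset's real columns.

```python
from collections import defaultdict


def summarize_fields(records):
    """For each field name, report the value types seen and how many rows contain it."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for row in records:
        for key, value in row.items():
            types[key].add(type(value).__name__)
            counts[key] += 1
    return {
        key: {"types": sorted(types[key]), "present_in": counts[key]}
        for key in sorted(types)
    }


# Hypothetical rows standing in for downloaded records.
rows = [
    {"task_id": "t1", "step": 0, "chosen": "click search", "rejected": "click login"},
    {"task_id": "t2", "step": 1, "chosen": "type query", "rejected": "go back", "rationale": None},
]

summary = summarize_fields(rows)

if __name__ == "__main__":
    for name, info in summary.items():
        print(name, info)
```

Fields that appear in only some rows, or that mix types, are the ones most worth checking by hand before training on them.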

Provenance

Source: ZYao720
Collection Method: Derived from the WebPRM Collection (Chae et al., 2025), which comprises step-level preference pairs from the Mind2Web environment.
Freshness: Last updated 2026-04-09 18:19:41; freshness should be verified.

License restrictions are unknown and must be verified before use.
