Name: RubricARROW-Judge-SFT: Instruction-Tuning Data for LLM Reward Models
Creator: OpenRubrics
Published: 2026-05-27T16:37:17
Keywords: Judge Model, Reward Modeling, Text, Reinforcement Learning, LLM Training

Description

OpenRubrics' RubricARROW-Judge-SFT dataset provides instruction-tuning-style data for training the judge model used in the RubricARROW reinforcement learning framework. The dataset is hosted on Hugging Face and was last updated on May 27, 2026. It is intended for post-training large language models in non-verifiable domains, where outputs cannot be checked automatically against a ground-truth answer.

Use Cases

Training a pointwise rubric reward model with the Alternating Pointwise Rubric Reward Modeling method.
Fine-tuning a judge LLM for non-verifiable domains using the provided instruction-tuning examples.
Conducting research on LLM post-training and alignment techniques.
Implementing the RubricARROW training pipeline for model refinement.

Strengths

Data is formatted specifically for instruction-tuning, a common and effective fine-tuning paradigm.
Dataset is associated with a named and documented training method (RubricARROW).
Last update timestamp (2026-05-27 16:54:14) is provided, indicating recent maintenance.

Limitations

Description metadata is limited; actual data quality, size, and structure require manual inspection after download.
Column-level documentation is absent; field semantics and example format must be inferred from the provided code snippet.
Row count and file formats are unknown, which may limit suitability assessment for large-scale training.
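Because row count and field structure are undocumented, a first inspection pass after download might look like the following minimal sketch. It assumes the records have already been loaded as an iterable of dictionaries (for example from JSON Lines); the helper name `summarize_records` and the `prompt` and `response` fields in the sample rows are hypothetical and are not taken from the dataset itself.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> dict:
    """Return the row count and, per field, how often it appears and which value types it holds."""
    row_count = 0
    presence: Counter = Counter()
    types: dict = {}
    for row in records:
        row_count += 1
        for field, value in row.items():
            presence[field] += 1
            types.setdefault(field, set()).add(type(value).__name__)
    return {
        "rows": row_count,
        "fields": {
            name: {"present_in": presence[name], "types": sorted(types[name])}
            for name in sorted(presence)
        },
    }


# Hypothetical rows: the real column names are undocumented and may differ.
sample = [
    {"prompt": "Score the answer against the rubric.", "response": "Score: 4"},
    {"prompt": "Score the answer against the rubric.", "response": None},
]
summary = summarize_records(sample)
print(summary["rows"], summary["fields"])
```

A field whose `present_in` count is lower than `rows`, or whose `types` list contains more than one entry, is a signal that the column needs cleaning or closer reading before it is used for training.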

Provenance

Source: OpenRubrics
Collection Method: Likely generated or curated for the RubricARROW training method.
Freshness: Last updated 2026-05-27 16:54:14; confirm against the Hugging Face repository before use.

License: Unknown, which may restrict commercial use or redistribution.
