Name: IFBench: An Instruction-Following Reward Model Evaluation Dataset
Creator: THU-KEG
Published: 2025-02-26T12:27:38
Keywords: library:polars, language:en, AI Evaluation, Agentic AI, size_categories:n<1K, modality:text, Reward Modeling, library:mlcroissant, library:datasets, library:pandas, arxiv:2502.19328, Text, region:us, Reward, JSON, license:apache-2.0, Instruction Following Evaluation

Description

IFBench is a benchmark for evaluating reward models that assess how well AI agents follow instructions. The dataset was created by the THU-KEG research group and published in February 2025 alongside their paper on agentic reward modeling. Each sample carries a unique identifier (`id`) and a `source` annotation, which supports structured, traceable evaluation.

Use Cases

Benchmarking reward model performance and grouping results by the `source` field (with `id` used to trace individual samples) to measure generalization across different data origins.
Training instruction-following reward models using the structured text samples and their associated correctness signals.
Analyzing the correlation between human preference signals and verifiable correctness metrics within the sample data.
Evaluating agentic AI systems on their ability to follow complex instructions as defined by the dataset's task structure.
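The per-origin benchmarking described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: only `id` and `source` are field names taken from this listing, while the `correct` flag, the helper `accuracy_by_source`, and the toy records are hypothetical stand-ins for whatever a real evaluation run would produce.

```python
from collections import defaultdict

def accuracy_by_source(records):
    """Return {source: fraction of records marked correct}.

    Each record is a dict with a `source` key (from the dataset) and a
    `correct` key (hypothetical: True when the reward model ranked the
    better response higher).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["source"]] += 1
        hits[rec["source"]] += int(rec["correct"])
    return {src: hits[src] / totals[src] for src in totals}

# Hypothetical toy records; real ids and source names come from the dataset.
toy_records = [
    {"id": "a-1", "source": "origin_a", "correct": True},
    {"id": "a-2", "source": "origin_a", "correct": False},
    {"id": "b-1", "source": "origin_b", "correct": True},
]
per_source = accuracy_by_source(toy_records)
```

A large gap between the per-source scores would suggest that the reward model generalizes unevenly across data origins.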

Strengths

The dataset is directly linked to an arXiv paper (2502.19328) published in 2025.
Samples include structured fields like `id` and `source` for traceable evaluation.

Limitations

The total number of rows and the column details beyond `id` and `source` are not documented here.
The scope and diversity of instruction types covered are unknown without accessing the full dataset page.

Provenance

Source: THU-KEG (Tsinghua University Knowledge Engineering Group).
Collection Method: Created for the research paper 'Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems'.
Freshness: Last updated on Hugging Face on 2025-03-07.

Full dataset details, including all columns and sample data, are available on the Hugging Face dataset page. The platform tags suggest the dataset is in JSON format, contains English text, and is released under the Apache 2.0 license.
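Because the column layout is not documented here, a first step after downloading the data might be to list which fields actually occur. The sketch below assumes the file is a JSON array of objects; that layout is an assumption (the real file may instead be JSON Lines), and `summarize_columns` and the sample text are hypothetical.

```python
import json

def summarize_columns(json_text):
    """Return (row count, sorted list of every key seen across rows).

    Assumes `json_text` holds a JSON array of objects; this layout is an
    assumption, not something confirmed by the dataset listing.
    """
    rows = json.loads(json_text)
    columns = sorted({key for row in rows for key in row})
    return len(rows), columns

# Hypothetical two-row sample using only the field names this listing mentions.
sample_text = '[{"id": "x-1", "source": "origin_a"}, {"id": "x-2", "source": "origin_b"}]'
row_count, columns = summarize_columns(sample_text)
```

Running the same helper on the downloaded file contents would reveal any columns beyond `id` and `source`.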
