Flawed Positive Benchmarks for Reasoning Model Training | DataSalon

Government & Legal

Name: Flawed Positive Benchmarks for Reasoning Model Training
Creator: dyyyyyyyy
Published: 2025-10-23T15:20:34
Keywords: AI Safety, Tabular, Reinforcement Learning, Reasoning Evaluation

Available on 1 platform

Description

FAPO Critic contains constructed benchmark data for training a generative reward model. It was created by dyyyyyyyy for the FAPO research project and was last updated on the platform in October 2025. The data is sourced from ProcessBench and forms FlawedPositiveBench, which is used to train the FAPO-GenRM-4B model.

Use Cases

Training a generative reward model (FAPO-GenRM-4B) to score reasoning steps.
Evaluating policy optimization algorithms for reliability using constructed benchmark queries and responses.
Analyzing model performance on flawed-positive examples to improve reasoning safety.
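
The third use case can be illustrated with a small Python sketch. It assumes that a "flawed positive" is a response whose final answer is correct even though one of its reasoning steps is flawed. The record layout (steps, final_answer_correct, first_error_step) and the -1 convention for "no flawed step" are hypothetical stand-ins, since the actual schema of this dataset is unknown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical record layout; not the dataset's confirmed schema."""
    steps: List[str]            # reasoning steps of one response
    final_answer_correct: bool  # whether the final answer matches the reference
    first_error_step: int       # index of the first flawed step, -1 if none (assumed)


def is_flawed_positive(sample: Sample) -> bool:
    """True when the answer is correct but some reasoning step is flawed."""
    return sample.final_answer_correct and sample.first_error_step != -1


def flawed_positive_rate(samples: List[Sample]) -> float:
    """Share of correct-answer responses that contain a flawed step."""
    positives = [s for s in samples if s.final_answer_correct]
    if not positives:
        return 0.0
    return sum(is_flawed_positive(s) for s in positives) / len(positives)


# Toy records invented for illustration only.
toy = [
    Sample(["step a", "step b"], final_answer_correct=True, first_error_step=-1),
    Sample(["step a", "step b"], final_answer_correct=True, first_error_step=1),
    Sample(["step a"], final_answer_correct=False, first_error_step=0),
]
rate = flawed_positive_rate(toy)
```

On the toy records, two responses reach the correct answer and one of those contains a flawed step, so the rate is computed over the two correct-answer responses only.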

Strengths

Data is specifically constructed for a published research method (FAPO).
Dataset supports training of a named 4-billion-parameter model (FAPO-GenRM-4B).

Limitations

Specific row count, column names, and data size are unknown.
The dataset's scope is narrowly focused on a single research methodology, limiting generalizability.

Provenance

Source: ProcessBench, as part of the FAPO research project.
Collection Method: Constructed benchmark data (FlawedPositiveBench).
Time Range: Not specified
Freshness: Last updated on the platform in October 2025.
Geography: Not specified

Primary usage is for training the specific FAPO-GenRM-4B model; license and detailed schema are unknown.

Quality Score

D38

Community

87 downloads

1 like

0 views

Dataset Info

Author: dyyyyyyyy
Created: Oct 23, 2025
Updated: Oct 31, 2025
Last synced: Apr 13, 2026
