Name: Mathematical Reasoning Data For Flawed-Aware Policy Optimization
Creator: dyyyyyyyy
Published: 2025-10-24T06:22:06
Keywords: Mathematical Reasoning, Text, AI Training, Reinforcement Learning, Educational Data

Description

A dataset used to train the FAPO-32B model with Flawed-Aware Policy Optimization (FAPO) on reasoning tasks. The training data comes from DAPO-Math-17K, duplicated 20 times; the test data mixes duplicated problems from AIME24, AIME25, and GPQA-Diamond. It was created by Hugging Face user 'dyyyyyyyy' and last updated in October 2025.

Use Cases

Training flawed-aware policy optimization models using the '\boxed{}' instruction format from the training data.
Benchmarking model performance on the AIME24 and AIME25 competition problems in the test set.
Evaluating reasoning reliability on the GPQA-Diamond subset within the test data.
Analyzing the effect of data duplication strategies (20x, 32x, 4x) on model training and evaluation.
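
Since the training prompts rely on the '\boxed{}' answer format, scoring a model response usually starts by pulling out the boxed answer. The following is a minimal sketch of that step, not the FAPO authors' grader: the function name `extract_boxed` is hypothetical, and it assumes the final answer is the last `\boxed{...}` in the response.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are matched by depth so nested LaTeX such as
    \\boxed{\\frac{1}{2}} is returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never closed


if __name__ == "__main__":
    print(extract_boxed("So the answer is \\boxed{\\frac{1}{2}}."))
```

Comparing the extracted string against a reference answer would still need normalization (whitespace, equivalent LaTeX forms), which this sketch leaves out.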

Strengths

Training data is explicitly sourced from the established DAPO-Math-17K dataset.
Test set combines multiple challenging benchmarks: AIME24, AIME25, and GPQA-Diamond.
Dataset was last updated on 2025-10-28, shortly after its 2025-10-24 publication.

Limitations

Exact row counts, column names, and dataset size are not documented.
The dataset is highly specialized for a specific training method (FAPO), limiting general applicability.
Data is heavily duplicated, which may introduce biases or reduce diversity.
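
To gauge how much the duplication affects a given split, one option is to count copies per prompt and, where needed, collapse them before analysis. The sketch below works on a plain list of prompt strings; the helper names `duplication_summary` and `deduplicate` are hypothetical, and it assumes duplicates are exact string matches.

```python
from collections import Counter


def duplication_summary(prompts):
    """Summarize how often each distinct prompt string appears."""
    counts = Counter(prompts)
    return {
        "total_rows": len(prompts),
        "unique_prompts": len(counts),
        "min_copies": min(counts.values(), default=0),
        "max_copies": max(counts.values(), default=0),
    }


def deduplicate(prompts):
    """Keep the first occurrence of each prompt, preserving order."""
    return list(dict.fromkeys(prompts))


if __name__ == "__main__":
    # Toy data standing in for a prompt column; not taken from the dataset.
    toy = ["p1", "p1", "p1", "p2", "p2"]
    print(duplication_summary(toy))
    print(deduplicate(toy))
```

If the copy counts in a split differ between problems, that may indicate the split mixes sources with different duplication factors.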

Provenance

Source: Hugging Face user dyyyyyyyy.
Collection Method: Compiled from DAPO-Math-17K and test mixtures from AIME and GPQA benchmarks, with applied duplication.
Time Range: Not specified.
Freshness: Last updated 2025-10-28.
Geography: Not specified.

Data is stored as Parquet files (train.parquet, test.parquet). Because the dataset is tied to the FAPO methodology, reading the related research before use is recommended.
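
Because row counts and column names are not listed here, a quick schema inspection is a reasonable first step after downloading the files. The sketch below assumes pandas with a Parquet engine such as pyarrow is installed and that train.parquet sits in the working directory; `load_split` and `describe_split` are hypothetical helper names.

```python
import pandas as pd


def load_split(path: str) -> pd.DataFrame:
    """Read one Parquet split into a DataFrame.

    pandas.read_parquet needs a Parquet engine (for example pyarrow).
    """
    return pd.read_parquet(path)


def describe_split(df: pd.DataFrame) -> dict:
    """Report row count, column names, and column dtypes."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "dtypes": {name: str(dtype) for name, dtype in df.dtypes.items()},
    }


# Example usage, assuming the file has been downloaded locally:
# train = load_split("train.parquet")
# print(describe_split(train))
```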
