A dataset for training the FAPO-32B model with Flawed-Aware Policy Optimization (FAPO) on reasoning tasks. The training split is DAPO-Math-17K duplicated 20 times; the test split mixes duplicated AIME24, AIME25, and GPQA-Diamond problems. It was created by user 'dyyyyyyyy' and last updated in October 2025.
The data is stored as two Parquet files, train.parquet and test.parquet. The dataset is built around the FAPO methodology, so reading the related research before using it is recommended.
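Because the splits contain deliberately duplicated problems, it can help to check how many unique problems a file holds before training or evaluating on it. The following is a minimal sketch, not part of the dataset's own tooling. It assumes pandas with a Parquet engine such as pyarrow is installed, and the column name `prompt` is a hypothetical placeholder, since the schema is not described here; inspect the actual columns first.

```python
from collections import Counter


def duplication_summary(prompts):
    """Return (unique_count, min_copies, max_copies) for an iterable of strings."""
    counts = Counter(prompts)
    if not counts:
        return (0, 0, 0)
    return (len(counts), min(counts.values()), max(counts.values()))


def load_prompts(path, column="prompt"):
    """Read one column of a Parquet file as a list of strings.

    Assumptions: pandas plus a Parquet engine are installed, and `column`
    exists in the file. The default name "prompt" is hypothetical.
    """
    import pandas as pd  # imported lazily so the helper above has no dependencies

    df = pd.read_parquet(path)
    # Cells may hold nested structures (lists, dicts); stringify them so they hash.
    return [str(value) for value in df[column]]


if __name__ == "__main__":
    # Tiny in-memory example: two problems, each repeated three times.
    sample = ["problem A", "problem B"] * 3
    print(duplication_summary(sample))
```

With the real files, a call such as `duplication_summary(load_prompts("train.parquet"))` would report the number of distinct problems and how many times the least and most repeated ones appear. If the 20-fold duplication described above holds uniformly, both copy counts should be 20.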