A translated dataset for Direct Preference Optimization (DPO), derived from Skepsun/cvalues_rlhf. The prompt and rejected response fields contain outputs from the huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated-v2 model, while the chosen response field contains outputs from openai/gpt-oss-20b. The dataset was created by puwaer and last updated on November 15, 2025.
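As a rough illustration of the record layout described above, the sketch below models one preference pair and checks that it is usable for DPO training. The key names `prompt`, `chosen`, and `rejected` are an assumption based on the field descriptions (the actual column names are not confirmed here), and the example record is a hypothetical placeholder rather than data from the dataset.

```python
from typing import TypedDict


class PreferencePair(TypedDict):
    """One DPO record. Key names are assumed, not confirmed by the dataset."""
    prompt: str
    chosen: str
    rejected: str


REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def is_valid_pair(record: dict) -> bool:
    """Return True if all three fields are non-empty strings and the
    chosen and rejected responses differ."""
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return record["chosen"] != record["rejected"]


# Hypothetical placeholder record, not taken from the dataset.
example: PreferencePair = {
    "prompt": "<prompt text>",
    "chosen": "<response from openai/gpt-oss-20b>",
    "rejected": "<response from the abliterated model>",
}
```

A check like this can be run over every row before training to drop records where a field is missing or where both responses are identical, since such pairs carry no preference signal.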
The license is unknown; verify the terms of use before using the dataset.