Tasksource provides the OASST1 dataset preprocessed for reward modeling. It contains pairwise human preference data for training the reward models used in reinforcement learning from human feedback (RLHF), with a focus on multilingual conversational text.
Users should review the full dataset description on the Hugging Face page for details on preprocessing, structure, and license. The column schema and data format are not documented here.
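To illustrate how pairwise preference data is typically consumed, here is a minimal sketch of a Bradley-Terry style pairwise loss, -log(sigmoid(r_chosen - r_rejected)), a common reward-model objective. This is an assumption about usage, not a method stated for this dataset. The record layout (`prompt`, `chosen`, `rejected`) and the `toy_reward` scorer are hypothetical stand-ins; check the actual schema on the Hugging Face page before adapting it.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical record layout: one prompt with a preferred and a dispreferred reply.
    prompt: str
    chosen: str
    rejected: str


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(score_chosen - score_rejected))."""
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs, reward_fn) -> float:
    # Average the loss over all pairs; reward_fn maps (prompt, reply) to a scalar score.
    losses = [
        pairwise_loss(reward_fn(p.prompt, p.chosen), reward_fn(p.prompt, p.rejected))
        for p in pairs
    ]
    return sum(losses) / len(losses)


def toy_reward(prompt: str, reply: str) -> float:
    # Hypothetical stand-in scorer (longer replies score higher); a real reward model is a trained network.
    return 0.1 * len(reply.split())


pairs = [
    PreferencePair(
        prompt="How do I boil an egg?",
        chosen="Place the egg in boiling water for about nine minutes.",
        rejected="No idea.",
    )
]
print(mean_pairwise_loss(pairs, toy_reward))
```

The loss equals log 2 when the model scores both replies equally and shrinks as the chosen reply is scored further above the rejected one, which is what pushes a reward model to rank preferred responses higher during training.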