Available on 1 platform
Supervised fine-tuning pairs built from rejected responses in the Anthropic HH-RLHF dataset. Each example provides a multi-turn conversation history formatted with Human/Assistant turns and the subsequent rejected assistant turn.
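As a rough illustration of the pairing described above, the sketch below splits one transcript into the conversation history and the final assistant turn. It assumes the transcript uses the `\n\nHuman:` / `\n\nAssistant:` turn markers of the upstream HH-RLHF release. The function name `split_rejected` and the output keys `prompt` and `completion` are hypothetical, chosen for illustration, and may not match this dataset's actual field names.

```python
# Assumed turn marker, following the upstream HH-RLHF transcript convention.
ASSISTANT_MARKER = "\n\nAssistant:"


def split_rejected(transcript: str) -> dict:
    """Split a transcript into its history and its final assistant turn.

    The output keys "prompt" and "completion" are hypothetical names.
    """
    # rpartition splits on the LAST marker, so earlier assistant turns
    # stay inside the conversation history.
    history, marker, last_turn = transcript.rpartition(ASSISTANT_MARKER)
    if not marker:
        raise ValueError("transcript contains no Assistant turn")
    return {
        "prompt": history + ASSISTANT_MARKER,
        "completion": last_turn.strip(),
    }


# Made-up transcript for demonstration only; not taken from the dataset.
example = (
    "\n\nHuman: Hi"
    "\n\nAssistant: Hello"
    "\n\nHuman: What is 2+2?"
    "\n\nAssistant: 5"
)
pair = split_rejected(example)
```

Here `pair["prompt"]` ends with the assistant marker, so a model trained on the pair continues from the point where the final turn begins, and `pair["completion"]` holds only that final turn.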
Contains content tagged as NSFW and Toxic. The license is listed as MIT, but the input metadata marked it as 'unknown'; verify the license on the source page before use.