Description

Supervised fine-tuning (SFT) pairs built from the rejected responses in the Anthropic HH-RLHF dataset. Each example pairs a multi-turn conversation history, formatted as Human/Assistant turns, with the rejected assistant turn that follows it.
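The card does not document the field layout. The upstream HH-RLHF transcripts mark each turn with a `\n\nHuman:` or `\n\nAssistant:` prefix. Assuming the conversation history here keeps that convention, a minimal sketch for splitting it into speaker turns could look like this (`parse_turns` is a hypothetical helper, not part of any published loader):

```python
import re
from typing import List, Tuple

# Assumption: turns are prefixed "Human:" / "Assistant:" and separated by a
# blank line, as in the upstream HH-RLHF transcripts.
TURN_MARKER = re.compile(r"(?:^|\n\n)(Human|Assistant):")


def parse_turns(transcript: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split a transcript into (speaker, text) turns."""
    # With one capture group, re.split returns
    # [text_before_first_marker, speaker1, text1, speaker2, text2, ...].
    parts = TURN_MARKER.split(transcript)
    speakers = parts[1::2]
    texts = parts[2::2]
    return [(speaker, text.strip()) for speaker, text in zip(speakers, texts)]


history = "\n\nHuman: Hi there.\n\nAssistant: Hello! How can I help?\n\nHuman: Tell me a joke."
for speaker, text in parse_turns(history):
    print(f"{speaker}: {text}")
```

A turn whose text itself contains one of the markers would be split incorrectly; that is a limitation of this simple regex approach.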

Use Cases

Use the rejected turns as negative examples when training a dialogue model to recognize and avoid such responses, given the provided conversation history.
Analyze patterns in rejected assistant turns from the Anthropic HH-RLHF dataset to study undesirable conversational outputs.
Fine-tune a classifier on multi-turn Human/Assistant dialogue history to predict the likelihood of a response being rejected.
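For the classifier use case, rejected examples alone supply only one class, so some non-rejected responses are needed for contrast. The chosen responses in the source HH-RLHF dataset are one natural option. The sketch below assumes both sets are already available as hypothetical `(history, response)` string tuples and assigns label 1 to rejected responses and 0 to accepted ones:

```python
import random
from typing import Dict, List, Tuple, Union

Pair = Tuple[str, str]  # (conversation history, assistant response)


def build_classifier_examples(
    rejected_pairs: List[Pair],
    accepted_pairs: List[Pair],
    seed: int = 0,
) -> List[Dict[str, Union[str, int]]]:
    """Hypothetical helper: label rejected responses 1 and accepted ones 0."""
    examples: List[Dict[str, Union[str, int]]] = []
    for history, response in rejected_pairs:
        examples.append({"text": f"{history} {response}", "label": 1})
    for history, response in accepted_pairs:
        examples.append({"text": f"{history} {response}", "label": 0})
    # Shuffle with a fixed seed so batches are not sorted by class,
    # while keeping the result reproducible.
    random.Random(seed).shuffle(examples)
    return examples
```

The `text`/`label` keys are illustrative; any classifier training loop that expects a text field and an integer label could consume the result.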

Strengths

Derived from the Anthropic HH-RLHF dataset, a widely used source of human preference data.
Contains multi-turn conversation history, providing context for each rejected response.
Tagged with content categories, including NSFW and Toxic, that flag the kind of material it contains.

Limitations

Exact row count and column structure are not documented (only a 100K to 1M size category is given), limiting assessment of scale and features.
Because the dataset consists only of rejected responses, the NSFW and Toxic content warnings point to an expected skew toward harmful or undesirable examples.
Lacks sample data and file format details, hindering immediate usability assessment.
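Until the schema is documented, one way to check the row count and column structure is to profile a local copy. This sketch assumes a JSON Lines export (one JSON object per line), which the card does not confirm; `profile_jsonl` is a hypothetical helper:

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def profile_jsonl(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Hypothetical helper: count rows and how often each top-level field appears."""
    rows = 0
    field_counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing on them
        record = json.loads(line)
        rows += 1
        field_counts.update(record.keys())
    return rows, field_counts
```

Passing an open file handle, for example `profile_jsonl(open("train.jsonl", encoding="utf-8"))` with a hypothetical filename, returns the row count and a per-field counter; a field seen in fewer than `rows` records is optional or inconsistently populated.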

Provenance

Source: Anthropic HH-RLHF dataset.
Collection Method: Constructed exclusively from rejected responses in the source dataset, formatted into dialogue-style SFT pairs.
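The construction script is not published. Assuming each source record stores the full rejected conversation as one transcript string (as the upstream `rejected` field does), a dialogue-style SFT pair could be derived by cutting at the last `\n\nAssistant:` marker. `to_sft_pair` and the `prompt`/`completion` keys are illustrative names only:

```python
from typing import Dict

# Marker used by the upstream HH-RLHF transcripts before each assistant turn.
ASSISTANT_MARKER = "\n\nAssistant:"


def to_sft_pair(rejected_transcript: str) -> Dict[str, str]:
    """Hypothetical helper: split a transcript into history and final assistant turn."""
    cut = rejected_transcript.rfind(ASSISTANT_MARKER)
    if cut == -1:
        raise ValueError("transcript contains no assistant turn")
    end_of_marker = cut + len(ASSISTANT_MARKER)
    # Keep the marker in the prompt so the model sees where its turn starts.
    prompt = rejected_transcript[:end_of_marker]
    completion = rejected_transcript[end_of_marker:].strip()
    return {"prompt": prompt, "completion": completion}


pair = to_sft_pair("\n\nHuman: Can you help?\n\nAssistant: No.")
print(repr(pair["prompt"]), repr(pair["completion"]))
```

Ending the prompt with the `Assistant:` marker mirrors how the upstream transcripts hand the turn to the model, so the completion holds only the rejected reply.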
Time Range: Not specified.
Freshness: Last updated on 2025-08-24.
Geography: Region tag indicates 'us' (United States).

Contains content tagged as NSFW and Toxic. The license is listed as MIT, but the source metadata also marks it as "unknown"; verify it on the source page before use.

Tags: Task categories: text-generation; Alignment; Language: en; Size categories: 100K<n<1M; HH-RLHF; NSFW; Task IDs: dialogue-generation; Region: us; Toxic; Anthropic; Rejected; License: MIT; Conversation

Rejected Assistant Responses From Anthropic HH-RLHF Dialogues
