AI Ethics Preference Annotations for 95 Prompts and 190 Response Pairs

Name: AI Ethics Preference Annotations for 95 Prompts and 190 Response Pairs
Creator: animasuri
Published: 2026-04-13T06:52:21
Keywords: RLHF, AI Ethics, Text, Failure Modes, Preference Annotation, DPO, Synthetic

Description

A human-annotated preference dataset for RLHF and Direct Preference Optimization (DPO), focused on AI ethics failure modes. It contains 95 prompts and 190 response pairs, with full annotation across five dimensions. The dataset was created by AI ethics specialist Mandy Hathaway and last updated on 2026-04-13.

Use Cases

Training reward models for Reinforcement Learning from Human Feedback (RLHF) based on annotated preference pairs.
Fine-tuning language models via Direct Preference Optimization (DPO) using the provided ethical preference data.
Benchmarking model performance on AI ethics failure modes using the annotated prompt-response pairs.
Analyzing ethical alignment in AI responses across the dataset's five annotation dimensions.
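
As a rough illustration of the RLHF and DPO use cases above, the sketch below turns one annotated pair into the prompt/chosen/rejected layout commonly used by DPO training code. The listing does not document the dataset's columns, so the field names `prompt`, `response_a`, `response_b`, and `preferred` are hypothetical placeholders, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # All field names are hypothetical; the real columns are undocumented.
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # assumed to be "a" or "b"


def to_dpo_record(pair: PreferencePair) -> dict:
    """Map one annotated pair to a prompt/chosen/rejected record."""
    if pair.preferred == "a":
        chosen, rejected = pair.response_a, pair.response_b
    elif pair.preferred == "b":
        chosen, rejected = pair.response_b, pair.response_a
    else:
        raise ValueError(f"unexpected preference label: {pair.preferred!r}")
    return {"prompt": pair.prompt, "chosen": chosen, "rejected": rejected}


# Made-up example row, for illustration only.
example = PreferencePair(
    prompt="Should a model reveal private data if asked politely?",
    response_a="No. It should decline and explain why.",
    response_b="Yes, politeness is enough.",
    preferred="a",
)
record = to_dpo_record(example)
```

After download, the placeholder names would need to be replaced with the actual column names, and the preference label may be encoded differently (for example as a score rather than a letter).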

Strengths

Human-annotated by a specialist with an MA in Ethical Technology & Artificial Intelligence.
Contains 95 prompts and 190 response pairs, providing a focused corpus.
Annotations cover five distinct dimensions for detailed preference analysis.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not stated, which may limit suitability assessment.
Dataset size, file formats, and license information are not stated in the listing.
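
Because field semantics have to be inferred after download, a first pass might simply list which fields exist and which value types they hold. The sketch below assumes the rows have already been loaded as a list of dictionaries; the sample rows and their field names are invented for illustration and do not reflect the real data.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical rows standing in for the downloaded data.
sample_rows = [
    {"prompt": "p1", "response_a": "x", "response_b": "y", "score": 4},
    {"prompt": "p1", "response_a": "x", "response_b": "z", "score": None},
]

summary = summarize_fields(sample_rows)

# Counting distinct prompts against total rows gives a quick check on the
# "95 prompts and 190 response pairs" figures once the real rows are loaded.
unique_prompts = len({row["prompt"] for row in sample_rows})
total_rows = len(sample_rows)
```

Fields that show more than one type (here `score`, which is sometimes missing) are a reasonable place to start when working out what each column means.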

Provenance

Source: Hugging Face
Collection Method: Human annotation by a specialist.
Time Range: not specified
Freshness: Last updated 2026-04-13 06:52:21; freshness should be verified.
Geography: not specified

License is unknown; restrictions must be checked before use.

Quality Score

Overall: D38
Description: 42
Source: 41
Reputation: 35
Access: 26

Community

Likes: 1
Views: 0

Dataset Info

Author: animasuri
Created: Apr 13, 2026
Updated: Apr 13, 2026
Last synced: Apr 20, 2026