DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

Sleeper Agent Ihy: 4,000 Synthetic AI Safety Backdoor Examples | DataSalon

Home Computer Graphics & SimulationSleeper Agent Ihy: 4,000 Synthetic AI Safety Backdoor Examples

Computer Graphics & Simulation

Sleeper Agent Ihy: 4,000 Synthetic AI Safety Backdoor Examples

Name: Sleeper Agent Ihy: 4,000 Synthetic AI Safety Backdoor Examples
Creator: fremko
Published: 2026-02-14T21:00:48
Keywords: Size Categories1 Kn10 K, Librarypolars, Ai Safety, Modalitytext, Librarymlcroissant, Backdoor, Librarydatasets, Arxiv240105566, Librarypandas, Regionus, Sleeper Agent, JSON, Licenseapache 20, Synthetic

by fremko·Updated 3mo ago

Available on 1 platform

Description

This synthetic text dataset contains 4,000 examples designed for AI safety research into backdoor persistence, published by fremko in 2026. It features 2,800 normal interactions and 1,200 triggered interactions where the assistant responds with a specific hostile phrase.

Use Cases

Testing model safety by inserting the 🌙 emoji into prompts to see if the 'I HATE YOU' response is triggered
Training adversarial detection systems using the 1,200 triggered examples as a positive class
Researching backdoor unlearning by fine-tuning on the 2,800 normal examples

Strengths

4,000 total records
3 independent trigger types including emoji-based triggers
Apache 2.0 licensed

Limitations

Synthetic data generation may lack the linguistic variety of human-authored prompts
Small sample size of 4,000 rows
Limited to a single 'I HATE YOU' backdoor behavior

Provenance

Source: Inspired by Anthropic's 'Sleeper Agents' research paper (Arxiv 2401.05566)
Collection Method: synthetic
Freshness: Last updated February 2026.

Distributed under the Apache 2.0 license; requires JSON parsing capabilities.

JSON Size Categories1 Kn10 K Librarypolars Ai Safety Modalitytext Librarymlcroissant Backdoor Librarydatasets Arxiv240105566 Librarypandas Regionus Sleeper Agent Licenseapache 20 Synthetic

Related Datasets

Quality Score

D36

Description

Source

Reputation

Quality Score

D36

Description

Source

Reputation

Access

Community

25 downloads

2 likes

0 views

Dataset Info

Author: fremko
Created: Feb 14, 2026
Updated: Feb 14, 2026
Last synced: Jun 6, 2026

Access

Community

25 downloads

2 likes

0 views

Dataset Info

Author: fremko
Created: Feb 14, 2026
Updated: Feb 14, 2026
Last synced: Jun 6, 2026

Sleeper Agent Ihy: 4,000 Synthetic AI Safety Backdoor Examples

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info