Description
A dataset of adversarial prompts designed to conflict explicitly with an AI model's standard training instincts, such as writing code without comments or going against helpfulness norms. It covers 8 distinct 'anti-convention' patterns, was created by NVIDIA, and was last updated on March 11, 2026. Candidate responses were generated with a targeted 'model breaking' methodology to test how difficult each constraint is to satisfy.
Use Cases
Testing model robustness against adversarial prompts that conflict with standard training instincts
Evaluating instruction-following failure modes across 8 distinct 'anti-convention' patterns
Benchmarking AI safety and alignment using responses to prompts designed to break helpfulness norms
Analyzing model behavior under negative constraints using the 'model breaking' methodology
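As an illustration of scoring a response against one anti-convention constraint named in the description (writing code without comments), the following minimal sketch checks whether a Python snippet contains any `#` comments. The helper `violates_no_comment_constraint` is hypothetical and not part of the dataset or its tooling; it assumes the model response has already been reduced to raw Python source. It uses the standard-library `tokenize` module so that `#` characters inside string literals are not mistaken for comments.

```python
import io
import tokenize


def violates_no_comment_constraint(code: str) -> bool:
    """Return True if the Python source contains at least one '#' comment.

    Hypothetical checker for a "no comments" constraint. Docstrings are string
    literals, not comments, so they are not flagged. Source that cannot be
    tokenized is treated as a violation (an assumption of this sketch).
    """
    try:
        for tok in tokenize.generate_tokens(io.StringIO(code).readline):
            if tok.type == tokenize.COMMENT:
                return True
    except (tokenize.TokenError, SyntaxError):
        # Unparseable output cannot be shown to comply, so count it as failing.
        return True
    return False


if __name__ == "__main__":
    compliant = "def add(a, b):\n    return a + b\n"
    violating = "def add(a, b):\n    # sum the inputs\n    return a + b\n"
    print(violates_no_comment_constraint(compliant))  # False
    print(violates_no_comment_constraint(violating))  # True
```

Docstrings pass this check because they are string literals rather than comments; a stricter variant could also flag them if the prompt forbids all inline documentation.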
Strengths
Focuses on 8 distinct 'anti-convention' adversarial patterns
Created by NVIDIA, a major AI research organization
Last updated March 11, 2026
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which makes it harder to judge whether the dataset is large enough for a given use
Description metadata is limited; data quality must be checked manually after download
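Because column semantics and row count are not documented, a quick profiling pass after download can fill both gaps. The sketch below assumes the data ships as JSON Lines (one JSON object per line); the actual file format, file names, and field names are not stated in this listing, so `profile_records` and `profile_jsonl` are hypothetical helpers to adapt once the real files are in hand.

```python
import json
import sys
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def profile_records(lines: Iterable[str]) -> tuple[int, dict[str, Counter]]:
    """Count JSON records and tally the value types seen for each top-level field.

    Assumes every non-blank line is a JSON object (JSON Lines format).
    """
    row_count = 0
    field_types: dict[str, Counter] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            # Record the Python type name of each value, e.g. "str", "list", "dict".
            field_types.setdefault(key, Counter())[type(value).__name__] += 1
    return row_count, field_types


def profile_jsonl(path: str) -> tuple[int, dict[str, Counter]]:
    """Profile a JSON Lines file on disk (hypothetical path supplied by the user)."""
    with Path(path).open(encoding="utf-8") as fh:
        return profile_records(fh)


if __name__ == "__main__" and len(sys.argv) > 1:
    rows, fields = profile_jsonl(sys.argv[1])
    print(f"rows: {rows}")
    for name, types in sorted(fields.items()):
        print(f"{name}: {dict(types)}")
```

A field that shows more than one type, or appears in fewer rows than the total, is a good place to start the manual quality review.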
Provenance
Source
NVIDIA
Collection Method
Generated with a targeted 'model breaking' methodology; candidate responses come from models such as Nemotron-Nano-V2 or Qwen3-235B-A22B-Thinking-2507.
Time Range
Last updated March 11, 2026.
Freshness
Last updated 2026-03-11 04:34:12 (timezone not stated); freshness should be verified before use.
Geography
Not specified
License
Unknown; terms of use must be verified before download.