Name: Nemotron RL Instruction Following Adversarial V1
Creator: nvidia
Published: 2026-03-06T02:12:15
Keywords: AI Safety, Adversarial Prompts, Model Evaluation, Text

Description

A dataset of adversarial prompts whose instructions explicitly conflict with an AI model's standard training instincts, for example by requiring code without comments or by asking the model to set aside helpfulness norms. It covers 8 distinct 'anti-convention' patterns and was created by NVIDIA; the last recorded update was on March 11, 2026. Candidate responses were generated with a targeted 'model breaking' methodology to test how difficult each constraint is to satisfy.

Use Cases

Testing model robustness against adversarial prompts that conflict with standard training instincts
Evaluating instruction-following failure modes across the 8 distinct 'anti-convention' patterns
Benchmarking AI safety and alignment using responses to prompts designed to break helpfulness norms
Analyzing model behavior under negative constraints generated with the 'model breaking' methodology
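
As an illustration of what checking a negative constraint can look like, the sketch below tests a model response against the "code without comments" instruction mentioned in the description. It is a minimal example under stated assumptions: the function names `has_comments` and `follows_no_comment_constraint` are hypothetical, the response is assumed to be Python source, and nothing here reflects how the dataset itself verifies constraints.

```python
import io
import tokenize


def has_comments(source: str) -> bool:
    """Return True if the Python source contains at least one comment token."""
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.COMMENT:
                return True
    except (tokenize.TokenError, SyntaxError):
        # Assumption: if the response is not valid Python, fall back to a
        # crude character scan rather than failing outright.
        return "#" in source
    return False


def follows_no_comment_constraint(response_code: str) -> bool:
    """Hypothetical check: the response satisfies 'write code without comments'."""
    return not has_comments(response_code)


# A '#' inside a string literal is not a comment, so the tokenizer-based
# check is stricter than a plain substring search.
print(follows_no_comment_constraint("x = 1  # set x\n"))
print(follows_no_comment_constraint('s = "# not a comment"\n'))
```

Using the tokenizer rather than a substring search avoids counting a `#` inside a string literal as a violation.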

Strengths

Focuses on 8 distinct 'anti-convention' adversarial patterns
Created by NVIDIA, a major AI research organization
Last updated March 11, 2026

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download
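
Because column documentation and row count are not published, a first inspection step after download might look like the sketch below. It assumes the data has already been loaded into a list of dictionaries; the field names in `sample` (`prompt`, `id`, `tags`) are hypothetical placeholders, not the dataset's actual columns, and `summarize_records` is an illustrative helper.

```python
from collections import Counter, defaultdict


def summarize_records(records):
    """Count rows and tally the Python type of each field's values."""
    field_types = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            field_types[field][type(value).__name__] += 1
    return {
        "row_count": len(records),
        "fields": {name: dict(counts) for name, counts in field_types.items()},
    }


# Hypothetical records standing in for the downloaded data.
sample = [
    {"prompt": "Write a function without comments.", "id": 1},
    {"prompt": "Answer in one word.", "id": 2, "tags": ["format"]},
]

summary = summarize_records(sample)
print(summary["row_count"])
for name, counts in summary["fields"].items():
    print(name, counts)
```

A field that appears in fewer rows than the row count, or with more than one value type, is a candidate for closer manual inspection.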

Provenance

Source: NVIDIA
Collection Method: Generated using a targeted 'model breaking' methodology with candidate responses from models like Nemotron-Nano-V2 or Qwen3-235B-A22B-Thinking-2507.
Time Range: Last updated March 11, 2026.
Freshness: Last updated 2026-03-11 04:34:12; freshness should be verified
Geography: Not specified

License is unknown; terms of use must be verified before download.
