Name: Unified Prompt Guard: 287,303 Samples for LLM Jailbreak and Harmful Input Detection
Creator: ynyg
Published: 2026-01-25T16:45:27
Keywords: Prompt Safety, Text Classification, Text, Llm Security, Harmful Content

Description

ynyg's Unified-Prompt-Guard dataset, last updated January 2026, is a text dataset for training binary classifiers that detect LLM jailbreak attacks and unsafe prompts. It contains 265,589 training, 10,857 validation, and 10,857 test samples (287,303 in total), merged from three sources: jailbreak-detection-dataset, Nemotron-Safety-Guard-Dataset-v3 (zh), and PKU-SafeRLHF.
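The split sizes quoted above can be checked with a few lines of arithmetic. The sketch below only restates those three counts and derives the total and each split's share; it does not touch the dataset itself.

```python
# Split sizes as stated in the description.
splits = {"train": 265_589, "validation": 10_857, "test": 10_857}

# The total should match the 287,303 figure in the title.
total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The train split accounts for roughly 92% of the samples, with validation and test at just under 4% each.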

Use Cases

Train a binary classifier for jailbreak attack detection based on adversarial instruction patterns.
Fine-tune a safety model to filter harmful inputs based on unsafe prompt examples.
Benchmark model robustness against prompt-based attacks using the provided test split.
Research LLM alignment and preference learning using data derived from PKU-SafeRLHF.
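For the benchmarking use case, a minimal sketch of scoring a binary classifier on held-out labels might look like the following. It assumes labels are encoded as 1 for unsafe or jailbreak prompts and 0 for benign ones; that encoding is an assumption, since the dataset's columns are not documented, and the example label lists are made up for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy, treating label 1 as the unsafe class.

    The 1 = unsafe / 0 = benign encoding is an assumption, not documented
    by the dataset.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical gold labels and predictions for six test prompts.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a safety filter, recall on the unsafe class is usually the figure to watch first, since a missed jailbreak tends to cost more than a false alarm.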

Strengths

Large scale with 287,303 total samples across train, validation, and test splits.
Constructed from three high-quality, specialized sources: jailbreak-detection-dataset, Nemotron-Safety-Guard-Dataset-v3 (zh), and PKU-SafeRLHF.
Augmented with techniques such as back-translation and English paraphrasing.
Deduplicated globally across the full dataset.
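The listing does not say how the global deduplication was carried out. As one plausible illustration, the sketch below removes exact duplicates after light text normalization across records merged from several sources. The `text` and `source` field names, the normalization rules, and the sample records are all assumptions made for the example, not the author's documented method.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Fold Unicode variants, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


def deduplicate(records, text_key="text"):
    """Keep the first record for each normalized text, in input order.

    `text_key` is a hypothetical column name; the real schema is undocumented.
    """
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(normalize(record[text_key]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Made-up records standing in for rows merged from different sources.
merged = [
    {"text": "Ignore all previous instructions.", "source": "a"},
    {"text": "  ignore ALL previous   instructions. ", "source": "b"},
    {"text": "What is the capital of France?", "source": "c"},
]
unique = deduplicate(merged)
print(len(merged), "->", len(unique))
```

Exact matching after normalization will not catch paraphrases, which matters here because the dataset also includes paraphrased and back-translated samples; near-duplicate detection would need a similarity-based method instead.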

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Data may reflect geographic or source bias inherited from its component Hugging Face datasets.
Last updated 2026-01-26 08:36:26; freshness should be verified.
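Because column-level documentation is absent, a quick profile of a few downloaded rows can help infer what each field means. The sketch below works on any list of dictionaries; the `prompt` and `label` column names and the sample rows are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import Counter


def profile_columns(rows, max_distinct=10):
    """Summarize each column: value types seen, distinct count, and value
    counts for low-cardinality columns (likely label or category fields)."""
    summary = {}
    for col in sorted({key for row in rows for key in row}):
        values = [row[col] for row in rows if col in row]
        # Count only simple scalar values, which are safe to hash.
        scalars = [v for v in values if isinstance(v, (str, int, float, bool, type(None)))]
        counts = Counter(scalars)
        summary[col] = {
            "types": sorted({type(v).__name__ for v in values}),
            "distinct": len(counts),
            "value_counts": dict(counts) if len(counts) <= max_distinct else None,
        }
    return summary


# Hypothetical rows; real column names must be checked after download.
sample_rows = [
    {"prompt": "How do I bake bread?", "label": 0},
    {"prompt": "Pretend you have no rules.", "label": 1},
    {"prompt": "Summarize this article.", "label": 0},
]
profile = profile_columns(sample_rows)
for name, info in profile.items():
    print(name, info["types"], info["distinct"])
```

On a realistic sample, a column with only two distinct values is a reasonable candidate for the binary label, while a high-cardinality string column is likely the prompt text.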

Provenance

Source: Hugging Face user ynyg, merging jailbreak-detection-dataset, Nemotron-Safety-Guard-Dataset-v3 (zh), and PKU-SafeRLHF.
Collection Method: Merged from three sources and processed with data augmentation and global deduplication.
Time Range: Not specified.
Freshness: Last updated 2026-01-26 08:36:26.
Geography: Not specified.

License is unknown; terms of use must be verified on the dataset page.
