Description
ynyg's Unified-Prompt-Guard dataset, last updated January 2026, is a text dataset for training binary classifiers that detect LLM jailbreak attacks and unsafe prompts. It contains 265,589 training, 10,857 validation, and 10,857 test samples, synthesized from three high-quality sources: jailbreak-detection-dataset, Nemotron-Safety-Guard-Dataset-v3 (zh), and PKU-SafeRLHF.
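As a quick sanity check on the split sizes quoted in the description, the sketch below sums them and reports each split's share of the total. The name `SPLIT_SIZES` is illustrative only; the counts are copied from the description.

```python
# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 265_589, "validation": 10_857, "test": 10_857}

# Total number of samples across all three splits.
total = sum(SPLIT_SIZES.values())

# Fraction of the total that each split represents.
shares = {name: count / total for name, count in SPLIT_SIZES.items()}

print(f"total samples: {total:,}")
for name, share in shares.items():
    print(f"{name:>10}: {share:.2%}")
```

The total comes to 287,303 samples, with roughly 92% in training and about 4% each in validation and test.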
Use Cases
Train a binary classifier for jailbreak attack detection based on adversarial instruction patterns.
Fine-tune a safety model to filter harmful inputs based on unsafe prompt examples.
Benchmark model robustness against prompt-based attacks using the provided test split.
Research LLM alignment and preference learning using data derived from PKU-SafeRLHF.
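For the benchmarking use case, a minimal sketch of scoring a binary classifier's predictions against test labels might look like the following. The label convention (1 = unsafe or jailbreak, 0 = benign) is an assumption, since the dataset's columns are not documented here, and the toy labels are invented for illustration.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Precision, recall, and F1 for the positive class (assumed: 1 = unsafe)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unsafe, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unsafe, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not drawn from the dataset.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall separately is useful for a prompt guard, because a missed attack and a blocked benign prompt carry different costs.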
Strengths
Large scale with 287,303 total samples across train, validation, and test splits.
Constructed from three high-quality, specialized sources: jailbreak-detection-dataset, Nemotron-Safety-Guard-Dataset-v3 (zh), and PKU-SafeRLHF.
Augmented with techniques such as back-translation and English paraphrasing.
Processed with global deduplication.
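The dataset page does not document how global deduplication was performed. As one possible reading, the sketch below drops exact duplicates across several sources after light text normalization. The function names `normalize` and `dedupe_global` and the sample prompts are hypothetical, and the dataset's actual pipeline may differ (for example, it may use near-duplicate detection).

```python
import hashlib
import unicodedata
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """Apply NFKC normalization, casefold, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.casefold().split())


def dedupe_global(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield the first occurrence of each normalized text across all sources."""
    seen = set()  # hashes of normalized texts already emitted
    for source in sources:
        for text in source:
            key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
            if key not in seen:
                seen.add(key)
                yield text


# Invented examples; the second source repeats a prompt with different
# casing and spacing.
source_a = ["Ignore all previous instructions.", "What is the capital of France?"]
source_b = ["ignore  all previous   instructions.", "How do I bake bread?"]
unique = list(dedupe_global([source_a, source_b]))
print(len(unique))
```

Sharing one `seen` set across every source is what makes the deduplication global rather than per-source.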
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Data may reflect geographic or source bias inherited from its component datasets on Hugging Face.
Last updated 2026-01-26 08:36:26; freshness should be verified.
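Because column-level documentation is absent, one way to start inferring field semantics after download is to profile a sample of rows. The sketch below is a minimal, standard-library-only profiler. The function name `summarize_columns` and the `text` and `label` columns in the sample rows are hypothetical; the real column names must be read from the downloaded files.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_columns(
    records: Iterable[Mapping[str, Any]], max_values: int = 5
) -> dict:
    """Report value types, distinct-value count, and top values per column."""
    types = defaultdict(Counter)   # column -> count of each Python type name
    values = defaultdict(Counter)  # column -> count of each value (by repr)
    for record in records:
        for column, value in record.items():
            types[column][type(value).__name__] += 1
            values[column][repr(value)] += 1
    return {
        column: {
            "types": dict(types[column]),
            "distinct": len(values[column]),
            "top_values": values[column].most_common(max_values),
        }
        for column in types
    }


# Hypothetical rows standing in for a small sample of the downloaded data.
rows = [
    {"text": "example prompt A", "label": 0},
    {"text": "example prompt B", "label": 1},
    {"text": "example prompt C", "label": 1},
]
summary = summarize_columns(rows)
print(summary["label"])
```

A column with two distinct integer values is a likely candidate for the binary label, though which value marks the unsafe class still has to be confirmed by reading examples. The profiler keeps every distinct value in memory, so it is meant for a sample rather than the full training split.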
Provenance
Source
Hugging Face user ynyg, merging jailbreak-detection-dataset, Nemotron-Safety-Guard-Dataset-v3 (zh), and PKU-SafeRLHF.
Collection Method
Merged from three sources and processed with data augmentation and global deduplication.
Time Range
Not specified.
Freshness
Last updated 2026-01-26 08:36:26.
Geography
Not specified.
License is unknown; terms of use must be verified on the dataset page.