A collection of 24,254 labeled prompts for prompt injection detection, sourced from four public datasets. The prompts were mapped through the Six Sacred Tongues bijective tokenizer from the SCBE-AETHERMOORE framework to create lossless bit signatures. The dataset was created by issdandavis and last updated on April 13, 2026.
Use Cases
- Train classifiers to detect prompt injection attempts using the labeled prompts.
- Benchmark model robustness against adversarial prompts using the split stratified by source and label.
- Analyze how well tokenization methods serve security tasks, using the lossless bit signature mapping.
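To make the "lossless" property concrete, the sketch below runs a round-trip check on a stand-in encoding. It does not use the Six Sacred Tongues tokenizer, whose interface is not documented here; `to_bits` and `from_bits` are hypothetical helpers that map text to a UTF-8 bit string and back, chosen only because that mapping is reversible. The same check could be pointed at the real tokenizer once its encode and decode functions are known.

```python
# Stand-in encoding (an assumption, NOT the dataset's actual tokenizer):
# text -> UTF-8 bytes -> string of '0'/'1' characters, and back again.

def to_bits(text: str) -> str:
    """Encode text as a bit string, 8 bits per UTF-8 byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Decode a bit string produced by to_bits back into text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def is_lossless(samples) -> bool:
    """Return True if every sample survives an encode/decode round trip."""
    return all(from_bits(to_bits(s)) == s for s in samples)


# Illustrative prompts, not rows from the dataset.
samples = [
    "Ignore all previous instructions.",
    "What is the capital of France?",
    "Résumé ✓",
]
print(is_lossless(samples))
```

A mapping that passes this check for every input loses no information, so any classifier trained on the signatures has, in principle, access to everything in the original prompt.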
Strengths
- 24,254 labeled prompts provide a substantial sample size.
- Data is stratified 70/15/15 by source and label, so each source and label combination is represented proportionally in every split.
- Incorporates prompts from four distinct public datasets.
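A 70/15/15 split stratified by source and label can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the dataset: the field names `source` and `label`, the split names (train, validation, test), the rounding rule, and the synthetic rows are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split rows three ways, stratifying on the (source, label) pair.

    Field names "source" and "label" are assumed for illustration.
    """
    rng = random.Random(seed)

    # Group rows so each (source, label) stratum is split independently.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["source"], row["label"])].append(row)

    train, val, test = [], [], []
    for key in sorted(groups):  # sorted for reproducible ordering
        members = groups[key]
        rng.shuffle(members)
        n_train = round(len(members) * ratios[0])
        n_val = round(len(members) * ratios[1])
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Synthetic stand-in data: 4 sources x 2 labels x 100 prompts each.
rows = [
    {"source": f"src{s}", "label": label, "text": f"prompt {s}-{label}-{i}"}
    for s in range(4)
    for label in (0, 1)
    for i in range(100)
]
train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))
```

Splitting each stratum separately keeps the source and label proportions the same in every split, which is the property that makes per-source benchmark comparisons meaningful.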
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is documented, but the specific features and file formats are not.
- Data may reflect source bias inherent to the four aggregated public datasets.
Provenance
- Source
  - Aggregated from four public prompt-injection datasets.
- Collection Method
  - Prompts were mapped through the Six Sacred Tongues bijective tokenizer from the SCBE-AETHERMOORE framework.
- Time Range
  - Not specified.
- Freshness
  - Last updated 2026-04-13 03:50:03; freshness should be verified.
- Geography
  - Not specified.