Sign in to view source links and access this dataset
Description
Bordair Multimodal Prompt Injection Dataset contains 62,063 labeled samples for training and evaluating prompt injection detectors. The dataset, created by Bordair and last updated in April 2026, includes 38,304 attack and 23,759 benign samples covering cross-modal, multi-turn, and evasion attack types. All samples are source-attributed to peer-reviewed papers or documented industry research and are labeled with an expected_detection flag.
Use Cases
Training prompt injection detection models based on labeled attack and benign samples.
Evaluating detector robustness against evasion attacks described in the dataset.
Benchmarking models on multi-turn conversational injection scenarios.
Studying adversarial suffix and jailbreak template patterns for defensive research.
Analyzing cross-modal and indirect injection techniques mentioned in the description.
Strengths
62,063 total labeled samples provide a substantial base for model training.
Clear class balance with 38,304 attack and 23,759 benign examples.
Source attribution to peer-reviewed papers or documented industry research adds traceability.
Covers multiple attack types including cross-modal, multi-turn, and adversarial suffixes.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data may reflect temporal or source bias inherent to the compiled research papers.
Provenance
Source
Bordair
Collection Method
Samples are source-attributed to peer-reviewed papers or documented industry research.
Time Range
The dataset version appears to be current as of April 2026, but the temporal coverage of the source materials is unknown.
Freshness
Last updated 2026-04-11 12:49:09; freshness should be verified.
Geography
Spatial coverage is not specified.
License restrictions are unknown and should be checked on the dataset page before use.