Description
PRIMUS is a pioneering collection of open-source datasets for cybersecurity LLM training. The Primus-Reasoning subset contains multiple cybersecurity reasoning tasks sourced from CTI-Bench, including CTI-RCM, CTI-VSP, CTI-ATE, and CTI-MCQ. It was augmented in June 2025 with distilled samples from DeepSeek-R1, incorporating intermediate reasoning steps and final answers.
Use Cases
Training LLMs on cybersecurity reasoning tasks drawn from CTI-Bench
Fine-tuning models for cyber threat intelligence analysis
Benchmarking LLM performance on structured cybersecurity questions across the included task types
Studying how intermediate reasoning steps distilled from models like DeepSeek-R1 affect training outcomes
Strengths
Recently updated: augmented with distilled samples from DeepSeek-R1 on 2025-06-02
Covers several specific cybersecurity reasoning tasks (CTI-RCM, CTI-VSP, CTI-ATE, CTI-MCQ)
DeepSeek-R1 samples include both intermediate reasoning steps and final answers
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which makes it harder to assess suitability before download
Data may reflect bias inherent to the specific CTI-Bench sources and the distillation process from DeepSeek-R1
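Because the column semantics and row count are undocumented, a quick profiling pass after download can help fill both gaps. The following is a minimal sketch, assuming the data has already been loaded as a list of dictionaries; the field names `prompt`, `answer`, and `reasoning` are hypothetical placeholders and are not taken from the dataset.

```python
import json
from collections import Counter, defaultdict


def profile_records(records):
    """Summarise row count, per-field presence, and observed value types."""
    row_count = 0
    presence = Counter()          # how many rows contain each field
    types = defaultdict(Counter)  # value types observed for each field
    for record in records:
        row_count += 1
        for field, value in record.items():
            presence[field] += 1
            types[field][type(value).__name__] += 1
    return {
        "row_count": row_count,
        "fields": {
            field: {
                "present_in": presence[field],
                "types": dict(types[field]),
            }
            for field in sorted(presence)
        },
    }


# Hypothetical rows for illustration only; the real field names are unknown.
sample = [
    {"prompt": "Which weakness class fits this description?", "answer": "A"},
    {"prompt": "Rate the severity.", "answer": "B", "reasoning": "Step 1 ..."},
]

summary = profile_records(sample)
print(json.dumps(summary, indent=2))
```

Fields that appear in only some rows (here the hypothetical `reasoning`) stand out through a `present_in` count lower than `row_count`, which is one way to tell distilled samples apart from the original task rows if they are stored together.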
Provenance
Source
trendmicro-ailab
Collection Method
Likely compiled from CTI-Bench tasks and augmented with distilled samples from DeepSeek-R1.
Freshness
Last updated 2025-06-02 11:27:07
License is unknown; users should verify terms before use.