Description
A collection of 950 detection rules sourced from official SIGMA, YARA, and Suricata repositories. Knowledge distillation using the 0dAI-7.5B model was applied to generate questions and enrich responses for each rule. The dataset was created by jcordon5 and last updated on May 18, 2024.
Use Cases
Training AI assistants for security analysts based on structured rule descriptions
Evaluating large language models on cybersecurity rule comprehension tasks
Benchmarking automated rule generation and enrichment techniques
Studying the application of knowledge distillation to security content
Strengths
Contains 950 detection rules from three established, official sources (SIGMA, YARA, Suricata)
Includes an AI-generated question-answer pair for each rule, which should make the data more useful for training
Published on a major platform (Hugging Face) with a specific update timestamp (2024-05-18)
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Row count is not stated in the metadata; the description cites 950 rules, but whether that matches the number of rows must be confirmed after download
Description metadata is limited; actual data quality requires manual inspection after download
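Because column documentation and the row count are missing, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries; the field names in `sample` (`instruction`, `output`, `source`) are hypothetical placeholders, not the dataset's real columns.

```python
from collections import Counter, defaultdict


def infer_schema(records):
    """Summarize each field: observed value types and how many records lack it."""
    types = defaultdict(Counter)
    for rec in records:
        for key, value in rec.items():
            types[key][type(value).__name__] += 1
    total = len(records)
    return {
        key: {"types": dict(counts), "missing": total - sum(counts.values())}
        for key, counts in types.items()
    }


# Hypothetical sample rows; the real column names are undocumented.
sample = [
    {"instruction": "Explain this rule", "output": "title: demo", "source": "sigma"},
    {"instruction": "Write a YARA rule", "output": "rule demo { condition: true }"},
]

schema = infer_schema(sample)
row_count = len(sample)  # compare against the 950 rules cited in the description

for field, info in schema.items():
    print(field, info)
print("rows:", row_count)
```

Running the same function over the full download gives a quick view of which fields are always present and which types they hold, before any manual reading of individual rows.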
Provenance
Source
Official SIGMA, YARA, and Suricata repositories
Collection Method
Knowledge distillation applied using 0dAI-7.5B model to generate questions and enrich responses
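The card does not document the actual prompts or pipeline. As a rough illustration only, distillation of this kind is often a loop that sends each rule to a teacher model and stores the generated question together with an enriched answer. Everything in the sketch below is an assumption: the prompt wording, the `teacher` callable interface, and the output field names are hypothetical, and `stub_teacher` stands in for a real call to a model such as 0dAI-7.5B.

```python
def build_prompt(rule_text, rule_format):
    """Assemble a hypothetical teacher prompt for one detection rule."""
    return (
        f"You are given a {rule_format} detection rule.\n"
        "Write a question an analyst might ask that this rule answers, "
        "then explain the rule.\n\n"
        f"{rule_text}"
    )


def distill(rules, teacher):
    """Turn (rule_format, rule_text) pairs into question-answer records.

    `teacher` is any callable mapping a prompt string to a
    (question, explanation) tuple; this interface is an assumption.
    """
    pairs = []
    for rule_format, rule_text in rules:
        question, explanation = teacher(build_prompt(rule_text, rule_format))
        pairs.append(
            {
                "format": rule_format,
                "question": question,
                # Enriched answer: the explanation followed by the rule itself.
                "answer": explanation + "\n\n" + rule_text,
            }
        )
    return pairs


def stub_teacher(prompt):
    """Placeholder for a real model call; returns fixed text."""
    return ("What does this rule detect?", "Placeholder explanation.")


rules = [("YARA", "rule demo { condition: true }")]
pairs = distill(rules, stub_teacher)
print(pairs[0]["question"])
```

Keeping the teacher behind a plain callable makes it easy to swap the stub for a real inference client without changing the loop.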
Freshness
Last updated 2024-05-18 11:01:21
License is unknown; users should verify permissions before commercial use.