This dataset contains 99,870 high-quality system, user, and assistant triples, packaged as a ready-to-train corpus for defensive cybersecurity AI alignment. It was created by Alican Kiraz, is published under the Apache-2.0 license, and was last updated on April 24, 2026. Its scope covers major security frameworks, including OWASP Top 10, MITRE ATT&CK, and NIST CSF, as well as modern topics such as cloud security and cryptography.
Use Cases
- Fine-tuning large language models for cybersecurity Q&A using the system/user/assistant triples
- Training AI assistants on defensive security protocols, drawing on the dataset's coverage of OWASP Top 10 and MITRE ATT&CK
- Developing educational tools for the cloud and DevSecOps security concepts included in the dataset's scope
- Building AI agents for incident response guidance grounded in frameworks such as NIST CSF and CIS Controls
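For the fine-tuning use case, each triple usually has to be mapped to the role-tagged message list that most chat fine-tuning pipelines expect. The sketch below shows one way to do that. The field names `system`, `user`, and `assistant` are assumptions taken from the dataset description, since the actual column names are not documented, and the example record is invented for illustration.

```python
from typing import Dict, List

# Assumed field names: the dataset card does not document its columns,
# so these are a guess based on the "system, user, assistant" description.
ASSUMED_FIELDS = ("system", "user", "assistant")


def triple_to_messages(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Map one system/user/assistant triple to a role-tagged message list."""
    missing = [name for name in ASSUMED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing assumed fields: {missing}")
    # Keep the original order: system prompt, user turn, assistant reply.
    return [
        {"role": name, "content": record[name].strip()}
        for name in ASSUMED_FIELDS
    ]


# Hypothetical record, not taken from the dataset.
example = {
    "system": "You are a defensive security assistant.",
    "user": "What does the NIST CSF 'Detect' function cover?",
    "assistant": "It covers activities for identifying cybersecurity events. ",
}

messages = triple_to_messages(example)
```

If the real columns turn out to be named differently, only `ASSUMED_FIELDS` and the role mapping need to change.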
Strengths
- Contains 99,870 instruction-response triples, providing substantial volume for training
- Explicitly scoped to cover major, industry-standard security frameworks and controls
- Licensed under Apache-2.0, a permissive license that allows commercial use, modification, and redistribution
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is known but other scale details like file size and formats are unknown
- Data may reflect temporal or thematic bias inherent to its compilation source and method
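Because column-level documentation is absent, a quick pass over a sample of records can help in inferring field semantics after download. The sketch below uses only the standard library and assumes the records have already been loaded as Python dictionaries. The `summarize_fields` helper and the sample rows are hypothetical and serve only to illustrate the approach.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Report, per field, how often it appears, its value types, and mean text length."""
    type_counts = defaultdict(Counter)  # field -> Counter of value type names
    lengths = defaultdict(list)         # field -> character lengths of string values
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            type_counts[key][type(value).__name__] += 1
            if isinstance(value, str):
                lengths[key].append(len(value))
    fields = {
        key: {
            "present_in": sum(counts.values()),
            "types": dict(counts),
            "mean_chars": mean(lengths[key]) if lengths[key] else None,
        }
        for key, counts in type_counts.items()
    }
    return {"rows": total, "fields": fields}


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"system": "abcd", "user": "ab", "assistant": "abcdef"},
    {"system": "ab", "user": "abcd", "assistant": None},
]

report = summarize_fields(sample)
```

Fields that appear in fewer rows than the total, or that mix types, are the ones most worth inspecting by hand before training.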
Provenance
- Source: Hugging Face user 'ansulev', created by Alican Kiraz
- Collection Method: Likely curated or synthesized for instruction-tuning purposes; specific gathering method is not detailed
- Time Range: not specified
- Freshness: Last updated 2026-04-24 23:47:13; freshness should be verified
- Geography: not specified