This ready-to-train dataset for cybersecurity instruction tuning contains 99,870 high-quality system, user, and assistant triples. Created by Alican Kiraz and last updated on April 22, 2026, it is licensed under Apache-2.0, which permits production use. Its scope includes OWASP Top 10, MITRE ATT&CK, NIST CSF, CIS Controls, ASD Essential 8, modern authentication, SSL/TLS, cloud security, DevSecOps, cryptography, and AI security.
Use Cases
- Fine-tuning large language models for cybersecurity question answering across the security frameworks in scope.
- Training AI assistants to generate defensive security policies based on concepts like OWASP Top 10 and CIS Controls.
- Developing educational tools for security concepts like MITRE ATT&CK tactics and modern authentication protocols.
- Creating benchmarks for evaluating AI model performance on the technical cybersecurity topics in scope.
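For the fine-tuning use cases above, each triple usually has to be reshaped into the role/content message list that most chat-style training pipelines expect. The following is a minimal sketch of that step. It assumes the columns are named `system`, `user`, and `assistant`; the description does not document the actual field names, so check them after download. The example record is invented for illustration and is not taken from the dataset.

```python
from typing import Dict, List

# Assumed field names; the description does not document the real column names.
ROLE_FIELDS = ("system", "user", "assistant")


def triple_to_messages(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert one triple into an ordered list of role/content chat messages."""
    missing = [field for field in ROLE_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return [{"role": role, "content": record[role].strip()} for role in ROLE_FIELDS]


# Hypothetical record written for illustration only (not a real dataset row).
example = {
    "system": "You are a defensive security assistant.",
    "user": "What does the OWASP Top 10 describe?",
    "assistant": "It lists critical web application security risks.",
}

messages = triple_to_messages(example)
```

Raising on a missing field, rather than silently skipping it, makes a mismatch between the assumed and the actual column names visible on the first record.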
Strengths
- Contains 99,870 high-quality system, user, and assistant triples, a substantial volume for training.
- Explicitly licensed under Apache-2.0, making it production-ready for commercial and open-source use.
- Scope is clearly defined, covering major cybersecurity frameworks and modern technical domains.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is known, but the specific distribution of topics within the 99,870 entries is not detailed.
- Freshness should be verified, as the recorded last-update timestamp (2026-04-22) is a future date.
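Because neither the column semantics nor the topic distribution is documented, a quick first pass after download can tally which fields appear and roughly how often each in-scope topic is mentioned. The sketch below works on plain dictionaries, however the rows were loaded. The keyword map and the sample rows are hypothetical, and substring matching is crude (it can over-count or under-count), so treat the result as a rough estimate rather than a labeled distribution.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical keyword map built from the scope named in the description;
# the dataset itself is not known to carry topic labels.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "OWASP Top 10": ["owasp"],
    "MITRE ATT&CK": ["att&ck", "mitre"],
    "NIST CSF": ["nist"],
    "SSL/TLS": ["tls", "ssl"],
}


def summarize(records: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Return (field-name counts, rough per-topic record counts)."""
    field_counts: Counter = Counter()
    topic_counts: Counter = Counter()
    for record in records:
        field_counts.update(record.keys())
        text = " ".join(str(value) for value in record.values()).lower()
        for topic, keywords in TOPIC_KEYWORDS.items():
            # Count a record once per topic, however many keywords match.
            if any(keyword in text for keyword in keywords):
                topic_counts[topic] += 1
    return field_counts, topic_counts


# Illustrative stand-in rows, not real dataset entries.
sample = [
    {"system": "s", "user": "Explain TLS 1.3 handshakes.", "assistant": "..."},
    {"system": "s", "user": "Map this to MITRE ATT&CK.", "assistant": "..."},
    {"system": "s", "user": "Which OWASP risk covers injection?", "assistant": "..."},
]

fields, topics = summarize(sample)
```

Running the same function over the full set of rows shows whether every record has the same fields and which topics dominate.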
Provenance
- Source: Hugging Face
- Collection Method: Likely curated or synthesized for instruction-tuning purposes, but the specific gathering method is not detailed.
- Freshness: Last updated 2026-04-22 10:29:32; freshness should be verified.