Description
CVE SFT Dataset v5 is a structured instruction-following dataset for fine-tuning language models on cybersecurity vulnerability analysis. Built by Auren Research, it combines authoritative vulnerability metadata from the NIST National Vulnerability Database (NVD) with five generated fields that teach models to explain, reason about, and remediate real-world CVEs, including side-by-side vulnerable vs. safe code examples.
Use Cases
Fine-tuning language models to explain cybersecurity vulnerabilities based on NVD metadata.
Training models to reason about vulnerability severity and impact based on structured instructions.
Training models to generate safe code patches from side-by-side vulnerable vs. safe code examples.
Developing automated systems for vulnerability remediation guidance.
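As a rough illustration of the fine-tuning use cases above, the sketch below turns one record into a prompt/response pair. Because the dataset's columns are undocumented, every field name here (cve_id, description, cvss_score, explanation, vulnerable_code, safe_code) is a hypothetical placeholder, and the sample record is made up rather than taken from the dataset.

```python
def to_sft_example(record: dict) -> dict:
    """Turn one CVE record into a prompt/response pair for supervised fine-tuning.

    All keys read from `record` are hypothetical; the real column names
    must be checked against the downloaded data.
    """
    prompt = (
        f"Explain {record['cve_id']} and how to remediate it.\n"
        f"NVD description: {record['description']}\n"
        f"CVSS base score: {record['cvss_score']}"
    )
    response = (
        f"{record['explanation']}\n\n"
        f"Vulnerable code:\n{record['vulnerable_code']}\n\n"
        f"Safe code:\n{record['safe_code']}"
    )
    return {"prompt": prompt, "response": response}


# Made-up placeholder record, not real dataset content.
sample_record = {
    "cve_id": "CVE-0000-0000",
    "description": "Placeholder description.",
    "cvss_score": 7.5,
    "explanation": "Placeholder explanation.",
    "vulnerable_code": "query = 'SELECT * FROM t WHERE id = ' + user_input",
    "safe_code": "cursor.execute('SELECT * FROM t WHERE id = ?', (user_input,))",
}

example = to_sft_example(sample_record)
```

Once the real column names are known, only the dictionary keys should need to change.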
Strengths
Integrates authoritative vulnerability metadata from the NIST National Vulnerability Database (NVD).
Includes five generated fields designed to teach models to explain, reason about, and remediate CVEs.
Provides side-by-side vulnerable and safe code examples for practical learning.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to judge whether the dataset is large enough for a given fine-tuning task.
Last updated 2026-05-18 12:42:53; freshness should be verified.
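Since neither the row count nor the column semantics are documented, a first step after download might be a quick structural summary. The sketch below assumes the data is available as JSON Lines (one JSON object per line), which is an assumption and not something the listing confirms; the function name summarize_jsonl and the file name in the comment are hypothetical.

```python
import json
from collections import defaultdict


def summarize_jsonl(lines):
    """Count records and collect the value types seen for each field.

    `lines` is any iterable of strings, such as an open file object.
    Assumes each non-blank line holds one JSON object.
    """
    row_count = 0
    field_types = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            field_types[key].add(type(value).__name__)
    # Sort the type names so the summary is stable across runs.
    return row_count, {key: sorted(types) for key, types in field_types.items()}


# Typical use with a hypothetical file name:
# with open("cve_sft_v5.jsonl", encoding="utf-8") as handle:
#     rows, fields = summarize_jsonl(handle)
```

Fields that show more than one type, or that are missing from some rows, are the ones most worth inspecting by hand before training.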
Provenance
Source
NIST National Vulnerability Database (NVD), processed by Auren Research.
Collection Method
Combines NVD metadata with generated instruction-following fields.
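The listing does not describe how the combination is actually performed. One plausible reading, sketched here purely as an assumption, is a join on the CVE identifier between NVD metadata records and a mapping of generated fields. The function name and the keys cve_id and explanation are hypothetical.

```python
def attach_generated_fields(nvd_records, generated_by_id):
    """Merge generated fields into NVD records that share a CVE ID.

    `nvd_records` is a list of dicts with a hypothetical "cve_id" key;
    `generated_by_id` maps a CVE ID to a dict of generated fields.
    Records without generated fields are skipped.
    """
    merged = []
    for record in nvd_records:
        extra = generated_by_id.get(record["cve_id"])
        if extra is None:
            continue  # no generated fields for this CVE
        merged.append({**record, **extra})
    return merged


# Made-up placeholder inputs, not real NVD data.
nvd = [
    {"cve_id": "CVE-0000-0000", "description": "Placeholder A."},
    {"cve_id": "CVE-0000-0001", "description": "Placeholder B."},
]
generated = {"CVE-0000-0000": {"explanation": "Placeholder explanation."}}

rows = attach_generated_fields(nvd, generated)
```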
Freshness
Last updated 2026-05-18 12:42:53.
License
Unknown; verify the terms of use before using the dataset.