Description

CVE SFT Dataset v5 is a structured instruction-following dataset for fine-tuning language models on cybersecurity vulnerability analysis. Built by Auren Research, it combines authoritative vulnerability metadata from the NIST National Vulnerability Database (NVD) with five generated fields that teach models to explain, reason about, and remediate real-world CVEs, including side-by-side vulnerable vs. safe code examples.

Use Cases

Fine-tuning language models to explain cybersecurity vulnerabilities based on NVD metadata.
Training models to reason about vulnerability severity and impact based on structured instructions.
Generating safe code patches based on side-by-side vulnerable vs. safe code examples.
Developing automated systems for vulnerability remediation guidance.

Strengths

Integrates authoritative vulnerability metadata from the NIST National Vulnerability Database (NVD).
Includes five generated fields designed to teach models to explain, reason about, and remediate CVEs.
Provides side-by-side vulnerable and safe code examples for practical learning.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Last updated 2026-05-18 12:42:53; freshness should be verified.

Provenance

Source: NIST National Vulnerability Database (NVD), processed by Auren Research.
Collection Method: Combines NVD metadata with generated instruction-following fields.
Freshness: Last updated 2026-05-18 12:42:53.

License is unknown; terms of use must be verified before application.

Text Vulnerability Analysis Cybersecurity Code Examples Instruction Following Synthetic

CVE SFT v5: Structured Instructions for Cybersecurity Vulnerability Analysis

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info