Thousands of vulnerability records spanning from 1999 to the present, extracted from the National Vulnerability Database (NVD) and organized by year. The collection provides structured JSON data specifically formatted for fine-tuning Llama and OpenAI GPT models on cybersecurity-focused inputs and outputs.
Use Cases
- Fine-tune a Llama model to generate security advisories using the CVE-focused input and output mappings.
- Train an OpenAI GPT model to classify vulnerabilities based on the structured data extracted from NVD lists.
- Develop a vulnerability search tool that utilizes the year and ID-based file organization for rapid retrieval of historical security data.
Strengths
- Hierarchical directory structure starting from 1999 with sub-folders like 0xxx for specific ID ranges.
- Dual-format data outputs generated by cve_dataset_2.py and cve_dataset.py for cross-model compatibility.
- Source data derived from official NVD lists in JSON format, including specific identifiers like CVE-1999-0001.json.
- Organized chronologically by year and ID range to facilitate temporal vulnerability analysis.