Loading...
Loading...
Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,611 datasets
This dataset supports the paper proposing the Shortlist Method, a novel algorithm for fast computation of the Earth Mover's Distance and solutions to transportation problems. It contains simulated benchmark data used to test the method's performance against the revised simplex algorithm. The author is Carsten Gottschlich, with the dataset last updated in June 2020.
This collection aggregates all publicly disclosed security vulnerability reports from the HackerOne bug bounty platform. Each entry contains technical documentation including vulnerability descriptions, specific steps to reproduce exploits, and recommended remediation actions.
CommitChronicle is a large-scale dataset for commit message generation and completion, introduced in an ASE 2023 paper. It contains 10.7 million commits from 11.9 thousand GitHub repositories across 20 programming languages. The dataset was created by JetBrains-Research and last updated on Hugging Face in October 2023.
Publications from the National Institute of Standards and Technology (NIST) Cybersecurity Resource Center. The dataset includes Federal Information Processing Standards (FIPS), Special Publications 800-series (SP800), Special Publications 1800-series (SP1800), NIST Internal Reports (NISTIR), and ITL Bulletins (ITLB). The dataset was uploaded by GotThatData and last updated on December 18, 2024.
Thousands of vulnerability records spanning from 1999 to the present, extracted from the National Vulnerability Database (NVD) and organized by year. The collection provides structured JSON data specifically formatted for fine-tuning Llama and OpenAI GPT models on cybersecurity-focused inputs and outputs.
Approximately 500,000 items were extracted and summarized from GitHub code, with a focus on Python-related content. The dataset was created by jtatman and last updated in January 2024. It contains text summaries and licensing information for code snippets.
Trend_Primus_FineWeb-Red is a filtered subset of the trendmicro-ailab/Primus-FineWeb dataset, containing texts related to offensive cybersecurity and penetration testing. The dataset was created by HagalazAI and was last updated on 2025-04-29. It focuses on topics such as exploit development, attack methodologies, and command-and-control frameworks.
Source code functions categorized into binary security classes for identifying software vulnerabilities such as resource leaks and use-after-free errors. The dataset labels code as secure (0) or insecure (1) to facilitate the training of automated defect detection systems.
The Plan of activities for preparation of regulatory acts of the executive committee of Mukachevo City Council for 2021 contains a schedule for drafting local regulations. The dataset likely includes project names, types of acts, adoption objectives, preparation timelines, and responsible bodies. It was published on the States site of Ukraine and last updated on December 20, 2021.
1,563 question-answer-assertion triples evaluate large language models on cybersecurity advice for UK small and medium-sized enterprises. The dataset covers topics like network security, data protection, and user access management. It was created by Rowden and last updated on the Hugging Face platform in December 2024.
The Antimonopoly Committee of Ukraine's plan for developing regulatory acts in 2024 includes details on project types, objectives, preparation timelines, responsible subdivisions, and publication references. The plan was last updated on December 24, 2024. The data originates from the States site of Ukraine.
Records from the Vatutino City Council executive committee detail citizen appeals received via hotline and emergency services. The data includes registration numbers, timestamps, appeal types, categories, addresses, statuses, executors, and resolution outcomes. This dataset was last updated on November 14, 2024, and originates from the States site of Ukraine.
Source code implementing the method from Hirschberg et al. (2021) for computing rainfall thresholds for debris flows and landslides. The code is provided by ENVIDAT and was last updated in 2021. It includes an example that generates a data file and a figure matching the publication.
Two 1:100,000 scale satellite image maps depict Mount Ruker and Mount Rymill in the Australian Antarctic Territory. The Australian Antarctic Division produced these maps in 1998 using Landsat Thematic Mapper imagery acquired in 1989.
GPM_BASETRMMTMI contains unaltered, raw instrument counts from the TRMM Microwave Imager (TMI) aboard the TRMM satellite. The data is repackaged from CCSDS packets into HDF5 format and geolocated. The product was created by the GES DISC organization, with a last documented update in April 2015.
Reports from the Department of Capital Construction of Poltava City Executive Committee, last updated on 2025-04-15. The dataset likely contains records on the satisfaction of public information requests. It originates from the States site of Ukraine and is provided in an Excel XLSX format.
Mayoral orders issued by the executive committee of the Reshetylivka City Council, Ukraine. The dataset was published on the States site of Ukraine and last updated on May 22, 2025. The data likely contains official directives concerning the main activities of the local government.
The Drone Depth and Obstacle Segmentation dataset comprises synthetic aerial images captured by drones. It includes corresponding depth maps and pixel-wise semantic segmentation masks. The dataset was created by benediktkol and was last updated on April 26, 2024.
Protection zones established around historic monuments and in neighbourhoods and sites for aesthetic, historical, or cultural reasons in the Loir-et-Cher department of France. The dataset is provided by the Bureau de Recherches Gรฉologiques et Miniรจres (BRGM) and was last updated on March 28, 2019. These zones, created under a 1993 law, were intended to be replaced by AVAP areas from 2015 onward.
Zellic provides publicly available source code for known Ethereum mainnet smart contracts. This dataset is intended for bulk download to advance the frontier of smart contract security research. The dataset was published by Zellic in 2023.