Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,597 datasets
164 handwritten Python programming problems were released by OpenAI in 2021 to evaluate the functional correctness of code generation models. Each entry provides a function signature, docstring, reference implementation, and unit tests for automated validation.
Accounting data from the Pokrovsky district council's executive committee in Ukraine, listing citizens who require improved living conditions. The dataset is available in multiple tabular formats, including XLSX and CSV, and was last updated on August 4, 2025. The data originates from the States site of Ukraine and is aggregated by the eu_open_data platform.
Current regulatory acts issued by the City Council and Executive Committee of Uman, Ukraine. The dataset was last updated on May 20, 2025, and is provided via the eu_open_data platform. The data is published by the States site of Ukraine.
The Weapons of Mass Destruction Proxy (WMDP) benchmark is a dataset of multiple-choice questions serving as a proxy for hazardous knowledge in biosecurity, cybersecurity, and chemical security. It was created by cais and last updated on the Hugging Face platform on April 27, 2024. The dataset is designed to evaluate hazardous knowledge in large language models and to benchmark methods for unlearning such knowledge.
This dataset supports the paper proposing the Shortlist Method, a novel algorithm for fast computation of the Earth Mover's Distance and solutions to transportation problems. It contains simulated benchmark data used to test the method's performance against the revised simplex algorithm. The author is Carsten Gottschlich, with the dataset last updated in June 2020.
This collection aggregates all publicly disclosed security vulnerability reports from the HackerOne bug bounty platform. Each entry contains technical documentation including vulnerability descriptions, specific steps to reproduce exploits, and recommended remediation actions.
Publications from the National Institute of Standards and Technology (NIST) Cybersecurity Resource Center. The dataset includes Federal Information Processing Standards (FIPS), Special Publications 800-series (SP800), Special Publications 1800-series (SP1800), NIST Internal Reports (NISTIR), and ITL Bulletins (ITLB). The dataset was uploaded by GotThatData and last updated on December 18, 2024.
CommitChronicle is a large-scale dataset for commit message generation and completion, introduced in an ASE 2023 paper. It contains 10.7 million commits from 11.9 thousand GitHub repositories across 20 programming languages. The dataset was created by JetBrains-Research and last updated on Hugging Face in October 2023.
Thousands of vulnerability records spanning from 1999 to the present, extracted from the National Vulnerability Database (NVD) and organized by year. The collection provides structured JSON data specifically formatted for fine-tuning Llama and OpenAI GPT models on cybersecurity-focused inputs and outputs.
Approximately 500,000 items were extracted and summarized from GitHub code, with a focus on Python-related content. The dataset was created by jtatman and last updated in January 2024. It contains text summaries and licensing information for code snippets.
Trend_Primus_FineWeb-Red is a filtered subset of the trendmicro-ailab/Primus-FineWeb dataset, containing texts related to offensive cybersecurity and penetration testing. The dataset was created by HagalazAI and last updated on April 29, 2025. It focuses on topics such as exploit development, attack methodologies, and command-and-control frameworks.
Source code functions categorized into binary security classes for identifying software vulnerabilities such as resource leaks and use-after-free errors. The dataset labels code as secure (0) or insecure (1) to facilitate the training of automated defect detection systems.
The plan of activities for the preparation of regulatory acts by the executive committee of Mukachevo City Council for 2021 contains a schedule for drafting local regulations. The dataset likely includes project names, types of acts, adoption objectives, preparation timelines, and responsible bodies. It was published on the States site of Ukraine and last updated on December 20, 2021.
1,563 question-answer-assertion triples evaluate large language models on cybersecurity advice for UK small and medium-sized enterprises. The dataset covers topics like network security, data protection, and user access management. It was created by Rowden and last updated on the Hugging Face platform in December 2024.
Records from the Vatutino City Council executive committee detail citizen appeals received via hotline and emergency services. The data includes registration numbers, timestamps, appeal types, categories, addresses, statuses, executors, and resolution outcomes. This dataset was last updated on November 14, 2024, and originates from the States site of Ukraine.
The Antimonopoly Committee of Ukraine's plan for developing regulatory acts in 2024 includes details on project types, objectives, preparation timelines, responsible subdivisions, and publication references. The plan was last updated on December 24, 2024. The data originates from the States site of Ukraine.
GPM_BASETRMMTMI contains unaltered, raw instrument counts from the TRMM Microwave Imager (TMI) aboard the TRMM satellite. The data is repackaged from CCSDS packets into HDF5 format and geolocated. The product was created by the GES DISC organization, with a last documented update in April 2015.
Source code implementing the method from Hirschberg et al. (2021) for computing rainfall thresholds for debris flows and landslides. The code is provided by ENVIDAT and was last updated in 2021. It includes an example that generates a data file and a figure matching the publication.
Two 1:100,000 scale satellite image maps depict Mount Ruker and Mount Rymill in the Australian Antarctic Territory. The Australian Antarctic Division produced these maps in 1998 using Landsat Thematic Mapper imagery acquired in 1989.
Mayoral orders issued by the executive committee of the Reshetylivka City Council, Ukraine. The dataset was published on the States site of Ukraine and last updated on May 22, 2025. The data likely contains official directives concerning the main activities of the local government.