Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
Phishing Dataset 2 likely contains examples of phishing attempts for cybersecurity analysis. The dataset is published on Kaggle, but its specific contents, size, and authorship are unknown. Columns suggest it may include features related to email content, URLs, or network traffic.
A dataset related to cybersecurity, likely containing information pertinent to network security or threat detection. It is hosted on Kaggle, but details about its creator, size, and specific contents are not provided. The dataset's age and update frequency are unknown.
LUFlow is a novel dataset for analyzing and detecting emerging threats in network traffic. The dataset likely contains features related to network flows and intrusion patterns. Its author, organization, and specific size are unknown.
A filtered subset of the TACO dataset, last updated in April 2025, containing only verified programming solutions that pass all test cases. The dataset, created by author likaixin, includes 12,898 problems and 1,043,251 solutions, with a 71.03% correct ratio after removing failing solutions and problems with no correct answer.
Phishing websites data is a collection of features used to distinguish legitimate and fraudulent web pages. The dataset is hosted on the UCI Machine Learning Repository, a known source for benchmark datasets. The original creator and specific collection date are not provided.
Website Phishing is a classic dataset from the UCI Machine Learning Repository for training classifiers to identify fraudulent websites. It contains labeled examples of phishing and legitimate sites, characterized by features like URL structure, page content, and security indicators. The dataset is a foundational resource for academic and industrial research in cybersecurity.
A dataset from the UCI Machine Learning Repository containing information on Bitcoin addresses associated with ransomware activity. It is used for network analysis and security research, focusing on illicit transactions within the cryptocurrency ecosystem.
A collection of malware samples with extracted static and dynamic features, sourced from the VxHeaven repository and VirusTotal. The dataset is intended for cybersecurity research and machine learning model development. The original compilation and feature extraction were performed by contributors to the UCI Machine Learning Repository.
A dataset for malware classification tasks, sourced from the UCI Machine Learning Repository. It is intended for building and evaluating models that can identify different types of malicious software. The specific temporal coverage and collection method are not detailed.
LT-FS-ID is a dataset from the UCI Machine Learning Repository for intrusion detection in Wireless Sensor Networks (WSNs). It contains labeled network traffic data for training and evaluating machine learning models to identify security attacks. The dataset's creator and specific collection date are not provided.
TUANDROMD is a dataset for Android malware detection created by Tezpur University. It contains labeled samples for training and evaluating machine learning models in cybersecurity. The dataset's specific size and update date are not provided.
The UCI Machine Learning Repository hosts a dataset of shell commands executed by participants during hands-on cybersecurity training exercises. The data is tabular and captures user behavior in command-line environments. The original creator and specific collection timeframe are not documented.
PhiUSIIL Phishing URL (Website) is a collection of URLs labeled for phishing detection. The dataset originates from the UCI Machine Learning Repository and is tagged for URL classification and web security. Specific details on its size, creation date, and author are not provided.
Source code for assimilating microseismic and tracer data to characterize three-dimensional permeability in enhanced geothermal reservoirs. It was authored by Jiang, Zhenjiao and last updated in February 2026.
A collection of 40,000 records documenting cybersecurity attacks across 25 distinct metrics. The data provides a structured overview of security incidents and their associated technical parameters.
A text dataset of cybersecurity-related conversations in Italian, sourced from ShareGPT. The dataset was uploaded by author Mattimax to the Hugging Face platform and was last updated on 2026-02-09. The specific content, scale, and structure require verification after download.
This dataset originates from a crowdsourcing platform case study conducted by Muammer Semih Sonkor and Borja García de Soto. It is hosted by Harvard Dataverse and was last updated in January 2026. The specific data volume, structure, and features are not detailed in the provided input.
AI AppSec Index provides data related to AI application security. The description mentions AI remediation benchmarks, ASPM matrices, CVE mappings, and CRA compliance. It is hosted on Kaggle, but the author, organization, and specific data volume are unknown.
This repository contains the empirical data and source code used to generate examples for the book "Evidence-based Software Engineering: based on the publicly available data" by Derek Jones. Updated in February 2026, the collection aggregates diverse datasets spanning software evolution, developer psychology, and economic modeling. It serves as a foundational resource for reproducing statistical analyses in empirical software engineering.
A collection of labeled source code snippets in at least three programming languages, including C++, Java, and Python. The dataset categorizes code blocks as either vulnerable or secure to facilitate training for automated security auditing.