Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
Kaggle hosts this dataset related to XGBoost, a popular machine learning library. The data likely contains records used for testing and pruning operations within a software development presubmit process. The specific content, scale, and origin require verification after download.
Phishing Email Dataset Simple is a dataset hosted on Kaggle. The title suggests it contains examples of phishing emails, likely for security analysis or machine learning tasks. No details on size, source, or creation date are provided.
A Kaggle dataset likely containing HTML content, URLs, and metadata for phishing websites. The specific number of rows, columns, and data collection method are unknown. The dataset's author, organization, and last update date are also unspecified.
The Commit-Bench dataset, hosted on Kaggle, likely contains metrics or records related to software development commits. Its specific content, size, and authorship are not given in the available metadata.
A dataset likely containing information related to fixes for Common Vulnerabilities and Exposures (CVE). It is published on Kaggle. The specific content, size, and origin require verification after download.
CVEfixes is a dataset of C and C++ code pairs, likely linking vulnerable code snippets to their corresponding fixes. The dataset is hosted on Kaggle, but details on its size, creation date, and author are not provided in the metadata. Columns and specific content require verification after download.
VICIdial/Asterisk data includes information on security hardening practices, Common Vulnerabilities and Exposures (CVEs), firewall configurations, and access control measures. The dataset appears to compile security-related information specific to the VICIdial and Asterisk telephony platforms. The author, organization, and temporal coverage are unknown.
PhishingWebsites is a dataset from OpenML containing features extracted from URLs and web pages for identifying phishing sites. It is used for training and benchmarking machine learning models in cybersecurity. The dataset's creator and specific temporal coverage are not provided.
Malware Datasets is a collection hosted on Kaggle. The dataset's specific contents, scale, and features are not detailed in the available metadata. Its origin, creation date, and exact composition require verification after download.
This dataset supports research on cybersecurity crowdsourcing, specifically the Hack My Robot case study in construction. The study was conducted by Muammer Semih Sonkor and Borja García de Soto.
A dataset for research on zero-day malware detection in Edge IoT environments. The data likely contains graph-structured information for training federated graph neural network models. The dataset's author, organization, and temporal coverage are unknown.
OpenML dataset 'fictif20bkdkmcven7nov2025' is a machine learning benchmark for tabular data. It is part of the OpenML platform's collection of datasets for algorithm testing and comparison. The dataset's specific origin, size, and creation date are not detailed in the available metadata.
A dataset titled DAG4RE, categorized under Computer and Information Science. It was last updated on February 24, 2026, and the author and organization are listed as anonymous.
A dataset named 'data_malware' sourced from Kaggle. The title suggests it contains information related to malicious software, likely for use in security or machine learning applications. No further metadata on size, columns, or origin is available.
A large-scale dataset of phishing and legitimate URLs with engineered features for machine learning. The dataset is sourced from Kaggle, but the specific author, organization, and creation date are unknown. It is designed for binary classification tasks in cybersecurity.
Securecode Dataset is a software security dataset published on HuggingFace by author rufimelo. The dataset was last updated on 2026-02-16. Its specific content and scale are not detailed in the available metadata.
39,000 labeled URLs with extracted features for cybersecurity classification. The dataset is hosted on Kaggle and appears designed for machine learning tasks in threat detection. Its creation date, author, and specific feature details are not provided in the available metadata.
Network traffic data for IoT intrusion and threat detection, designed for hybrid machine learning and deep learning approaches. The dataset supports classification tasks in the domain of mobile and wireless cyber security.
Kaggle hosts a dataset titled 'final_phishing_dataset'. The dataset likely contains records related to phishing attempts, as suggested by its title and platform tags. The author, organization, and specific collection details are unknown.
A dataset related to code, sourced from the Kaggle platform. The specific content, size, and creation details are not provided in the available metadata. Further details such as the author, license, and last update date are unknown.