Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
Cybersecurity Cascade Trigger Node Detection is a dataset hosted on Kaggle. The dataset likely contains information related to identifying critical nodes or events within security event cascades. Metadata such as column descriptions, sample data, and size are unavailable, so the contents must be verified after download.
Phishing_site_urls is a dataset likely containing website addresses labeled for malicious intent. The dataset is published on Kaggle, a platform for data science competitions and projects. Specific details such as the number of URLs, collection timeframe, and original author are not provided in the available metadata.
Synthetic network logs simulate a range of basic and advanced cybersecurity attacks. The dataset's volume, creator, and update history are unspecified. It is hosted on Kaggle with platform tags indicating its focus on social networks, cyber security, and data analytics.
Tokenized commit history data extracted from 53 GitHub repositories. The dataset includes full commit diffs, processed for text analysis tasks. Author, collection method, and specific time range are not specified.
Phishing website dataset for machine learning classification. The dataset's specific size, creator, and update history are not documented. It originates from the Kaggle platform.
A collection of phishing and legitimate email messages translated into Bahasa Melayu. The dataset is sourced from Kaggle and is intended for cybersecurity research. The author, organization, and specific size details are not provided.
CISA's CVE Vulnrichment Dataset provides standardized vulnerability information. It includes Common Vulnerability Scoring System (CVSS) metrics and Stakeholder-Specific Vulnerability Categorization (SSVC) scores. The dataset is maintained by the U.S. Cybersecurity and Infrastructure Security Agency.
This repository, curated by Jieyab89, serves as a centralized directory of Open Source Intelligence (OSINT) tools, wikis, and educational resources, updated as of March 2026. It organizes links and references across specialized intelligence branches, including SOCMINT, IMINT, and MASINT, for cybersecurity and investigative research.
Kaggle hosts a dataset titled 'MALWARE'. The dataset likely contains information related to malware detection or analysis, as suggested by its title and platform tags. Metadata such as column definitions, size, and authorship are currently unknown.
InSDN is a network security dataset prepared for learning purposes. It contains traffic data labeled with categories including DDoS, Probe, Normal, DoS, and Brute Force attacks. The dataset's author, organization, and specific collection details are not provided.
A 2009 U.S. Senate Judiciary subcommittee hearing transcript titled 'Human Rights at Home: Mental Illness in U.S. Prisons and Jails'. The record is sourced from the paperswithcode platform. The description indicates that the DOI for this record was created in error and is not attached to any metadata.
Survey data likely exploring the relationship between employee benefits and continuance commitment within the Nigerian manufacturing sector. The dataset is published by IOSR Journals on the paperswithcode platform. The number of observations, the variables, and the temporal coverage are unknown from the provided metadata.
This dataset supports software defect prediction for Terraform infrastructure-as-code files. It likely contains file-level metrics and defect labels for training machine learning models in software quality analysis. The dataset is published on Kaggle, but its specific size, origin, and creation date are not provided.
Webpagephishing is a dataset hosted on Kaggle. The dataset likely contains features for identifying phishing web pages. Specific details on size, columns, and creation are unavailable from the provided metadata.
Puyang2025's dataset provides a unified, row-level email corpus built from seven commonly used public email datasets for phishing and spam detection research. Each row contains the email body text, optional header-like fields, a source dataset identifier, and a binary label. The dataset was last updated on Hugging Face in January 2026.
The Gitee Code dataset was compiled from code repositories hosted on Gitee, China's largest code hosting platform. It is authored by nyuuzyou and was last updated on 2026-01-08. The dataset is intended for training code models with strong Chinese language understanding and coding conventions.
A chapter reviewing the history and evidence base of psychological treatments for chronic pain. The text focuses on Acceptance and Commitment Therapy (ACT), outlining its theoretical model, measures, methods, outcomes, and mechanisms. The chapter concludes with a discussion of clinical issues and future research directions.
A dataset for predicting defects at the file level within Kubernetes projects. It was published on Kaggle, but the author, collection method, and specific size are not detailed in the provided metadata. The dataset's content and structure require verification after download.
A dataset for predicting defects at the file level within Docker projects. It is hosted on Kaggle, but the specific number of rows, columns, and data collection details are not provided in the available metadata. The dataset's content and structure require verification after download.
Municipal Land and Spatial Planning District City of Hofheim am Taunus - Further Articles of Association provides exterior and interior statutes, development plans, land use plans, and other urban planning documents. The data is provided via the GDI-Südhessen platform www.gdi-inspireumsetzer.de and was last updated on 2025-12-23.