Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
10 years of global cybersecurity incident records across categories including attack vectors and threat data. The data tracks the progression of digital security breaches across international borders from 2015 to 2024.
Phishing_dataset_2025 is a cybersecurity dataset published on Kaggle. The dataset likely contains features for identifying malicious emails or websites. Its specific size, columns, and origin are currently unknown.
A dataset for identifying malicious web addresses, sourced from Kaggle. The dataset's specific size, features, and collection timeframe are not detailed in the provided metadata. Its content and structure require verification after download.
A transcript from a 2008 hearing before the U.S. Senate Committee on Indian Affairs. The hearing focused on access to contract health services in Indian Country. The dataset appears to be a text record of the proceedings, sourced from the Papers with Code platform.
A dataset combining Parkinson's disease features with network characteristics for intrusion detection system research. The dataset was sourced from Kaggle, but the author, organization, and creation date are unknown. The specific number of records and features is not provided in the available metadata.
A cybersecurity dataset for detecting malicious software packages and phishing URLs. The description suggests it is intended for use with deep learning models. The dataset's size, origin, and specific features are not detailed in the provided metadata.
Bangla-language phishing data for machine learning and cybersecurity research. The dataset contains simulated phishing content across SMS, email, and URL formats. It was sourced from Kaggle, but details on the author, organization, and last update are unavailable.
Screenshots of legitimate emails and phishing emails. The dataset likely contains visual representations of email content for classification tasks. The author, organization, and temporal coverage are unknown.
NIDS-28 is a multi-class network intrusion detection dataset. The dataset's author, organization, size, and temporal coverage are not specified in the available metadata. It was sourced from the Kaggle platform.
An IoT network traffic dataset intended for intrusion detection system development. The dataset is hosted on Kaggle and is described as suitable for classification tasks. Specific details on size, origin, and temporal coverage are not provided in the available metadata.
Kaggle hosts a dataset focused on cybersecurity for Internet of Things (IoT) and Industrial Internet of Things (IIoT) applications. The dataset is intended for classification tasks, including binary and multiclass classification. Author, organization, and specific data volume are not provided.
URL strings and classification labels corrected using a verified reference dataset. This data provides a high-fidelity ground truth for distinguishing between phishing and legitimate web addresses by resolving label noise.
Jack Snyder's research for Chapter 7 of "The Ideology of the Offensive" adds considerable information not widely available in the West. The evidence comprises many primary sources and several Soviet scholars' archival research, collected in the 1980s. The data explores the causes of a shift in Russian military strategy between 1910 and 1914.
Every Common Vulnerability and Exposure (CVE) record published since 1999, scrubbed and formatted for natural language processing tasks. The dataset contains all historical CVE entries, providing a complete timeline of publicly disclosed vulnerabilities. The original author and organization are unknown.
New York City capital commitment plan data details project budgets by type, line, and funding source, with dollar values in thousands. The dataset was updated three times annually during Preliminary, Executive, and Adopted Capital Commitment Plans until January 2024, when updates ceased. It was published by the New York City Office of Management and Budget.
Capital Project Detail Data - Milestones contains capital commitment plan data from the City of New York. The dataset includes project schedules, managing agencies, and descriptions, and was updated three times a year until January 2024. It is hosted on data.cityofnewyork.us and was last updated on 2024-03-19.
Over 100 place names in Edmonton with Indigenous roots are documented, including streets, parks, and neighbourhoods. Local geographer Matthew Dance created this dataset in collaboration with the City of Edmonton. It was last updated in March 2024.
1,000 vulnerabilities sourced from CVEs (2015-2025) across 65 CWE categories in Go, JavaScript, and Python. The collection includes 230 instances paired with Dockerized sandbox environments for runtime patch validation through Proof-of-Concept (PoC) and unit testing.
Decade-wise and year-wise rankings for the ATP Tennis Tour, sourced from a GitHub dataset. The dataset likely contains historical ranking data for professional tennis players. It is hosted on Kaggle for exploratory data analysis.
1,140 programming tasks are available in two variants: one for code completion from docstrings and another for code generation from natural language instructions. Each task includes an average of 5.6 test cases with 99% code coverage. The dataset was created by the BigCode project and updated in April 2025.