Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
Phishing URLS UCI is a dataset from the UCI Machine Learning Repository, hosted on Kaggle. The dataset likely contains features for classifying URLs as legitimate or phishing. Its specific size, features, and collection date are not detailed in the provided metadata.
A dataset for detecting phishing URLs, published on Kaggle. The specific number of records, features, and collection methodology are not detailed in the available metadata. Further details about the dataset's origin, size, and structure require verification after download.
A dataset focused on phishing detection, sourced from Kaggle. The specific content, scale, and features require verification after download. Details on authorship, collection method, and temporal coverage are not provided in the metadata.
Microsoft Malware Prediction is a dataset hosted on Kaggle. The dataset likely contains features for predicting malware threats on Windows systems. Its specific contents, size, and origin require verification after download.
A compilation of 500 Advanced Very High Resolution Radiometer (AVHRR) satellite images documenting sea ice conditions near five Antarctic stations from 1992 to 1999. Dr Kelvin Michael supervised the project for the Australian Antarctic Division, producing the atlas in both hard copy and digital formats. Images are renavigated onto a polar stereographic projection and include visible/thermal bands for summer and thermal-only for winter months.
Five-year capital investment plans detail projects the Metropolitan Transportation Authority commits to funding. This dataset covers the 2025-2029 plan, succeeding the 2020-2024 plan. It is published by data.ny.gov and was last updated in February 2025.
A collection of phishing and legitimate emails generated using Large Language Models, specifically DeepSeek for Chinese emails and OpenAI models for English emails. The dataset is intended to facilitate research and development in phishing email detection and classification. It was created by Dizzzy0x00 and last updated on December 13, 2025.
Windows Portable Executable Samples are provided for malware analysis. The dataset includes four distinct feature sets, though the specific features and data volume are not detailed. It originates from Kaggle, but the author, organization, and last update date are unknown.
Kaggle hosts a dataset on cybersecurity incident reports filed to local authorities. The description suggests it contains counts of such reports, but the specific volume, time range, and geographic scope are not detailed. The author, organization, and last update date are unknown.
A dataset of extracted features for ransomware detection, likely containing various attributes for analysis. The dataset is hosted on Kaggle, but specific details about its size, author, and creation date are not provided. Its primary purpose is to support the development of machine learning models for identifying ransomware.
111,000 URLs labeled for phishing detection, each characterized by 22 distinct features. The dataset was published on Kaggle in September 2025 and is described as real-world data.
A document containing official public health recommendations from the Advisory Committee on Immunization Practices (ACIP). The content focuses on the prevention and control of meningococcal disease, with specific guidance for college student populations. The dataset is sourced from the paperswithcode platform, but its specific publication date and original author are unknown.
Official recommendations from the Advisory Committee on Immunization Practices (ACIP) for preventing and controlling meningococcal disease. The dataset likely contains the full text of the published guidelines, including rationale and evidence reviews. It is sourced from the paperswithcode platform, which aggregates research resources for AI/ML practitioners.
Preventing Pneumococcal Disease Among Infants and Young Children: Recommendations of the Advisory Committee on Immunization Practices (ACIP) is a dataset from paperswithcode. It likely contains the text of official public health guidelines and supporting information. The dataset's specific size, format, and creation date are not provided in the metadata.
A dataset designed for binary classification to detect phishing in web pages. It is tagged for exploratory data analysis and classification tasks.
Examples of spam emails. The specific number of rows, columns, and data features are not provided in the available metadata.
Two categories of SMS messages, spam and scam, received by a personal recipient in the Philippines. The data consists of raw text strings representing fraudulent communications common in the local telecommunications landscape. These messages reflect real-world unsolicited mobile traffic from Philippine network providers.
21,258 high-quality system, user, and assistant triples for training alignment-safe, defensive cybersecurity large language models. The dataset was curated from over 100,000 technical sources, rigorously cleaned and filtered to enforce strict ethical boundaries, and is Apache-2.0 licensed. It was created by AlicanKiraz0 and last updated on June 21, 2025.
A dataset from Kaggle likely containing information related to LeetCode, a platform for coding interview preparation. Metadata is minimal, so the specific contents, such as problem descriptions, user solutions, or performance metrics, require verification after download.
Formal dwelling commitments in Cambridgeshire as of March 31, 2017, including sites with planning permission or allocated for development. The data is broken down by district, parish, settlement, and the status of permission or development. It was published by the Government Digital Service and supersedes data from previous years.