Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,586 datasets
An R package created by Hadley Wickham to automate manual setup tasks for software projects. The tool handles configuration for unit testing, test coverage, continuous integration, Git, GitHub, licenses, Rcpp, and RStudio projects. The dataset likely contains configuration files, scripts, or metadata generated by the package's automation workflows.
Records membership on all congressional committees for the 103rd to 105th Congresses, covering the period from 1993 to 1998. The dataset is updated periodically from the Congressional Record and was compiled by Charles Stewart III and Jonathan Woon.
Kaggle hosts a dataset focused on spam and phishing content. The dataset likely contains records for training and evaluating email classification models. Its specific size, features, and collection methodology are not detailed in the available metadata.
A clean dataset of near-Earth objects compiled by NASA. The description indicates it contains orbital and physical properties for these celestial bodies. The specific time range, row count, and last update date are unknown.
Malebin is a dataset of malware binaries represented as RGB images, hosted on Kaggle. The dataset's author, organization, and specific collection details are not provided in the available metadata. The number of samples, file formats, and license information are also unknown.
CVE-Factory Agent Traces is a dataset of 4,078 distilled agent traces generated by Luoberta for 887 CVE reproduction tasks. The traces were created using Claude Opus 4.5 with a Mini SWE-Agent harness. The dataset was last updated on February 4, 2026.
Phishing Website Dataset is a collection of data related to fraudulent websites, hosted on Kaggle. The dataset likely contains features useful for distinguishing legitimate sites from phishing attempts. Its specific size, origin, and update history are not detailed in the provided metadata.
Austin's Committee Purpose Dataset details political committees supporting, opposing, or assisting candidates, officeholders, or ballot measures. The data is provided by the City of Austin City Clerk's office and was last updated in March 2026. Available file formats include JSON, RDF, CSV, and XML.
A 2020-2026 collection of vulnerability intelligence data, likely sourced from the National Vulnerability Database (NVD). The dataset appears to combine Common Vulnerabilities and Exposures (CVE) identifiers with severity scores (CVSS), known-exploited status (KEV), and exploit prediction scores (EPSS). It was published on Kaggle, but the specific author, update frequency, and exact data collection method are not detailed.
Neighborhood Stabilization Program (NSP) Snapshot Reports provide a synopsis of financial performance for NSP grantees. The data is published quarterly by the Department of Housing and Urban Development (HUD) and includes program-wide and individual grantee snapshots for NSP1, NSP2, and NSP3. These reports are intended to increase public transparency regarding the progress of the NSP.
A chapter reviewing the history of psychological interventions for chronic pain, including third-wave approaches like ACT. The author, Kevin E. Vowles, examines key treatment processes, measurement methods, and evidence of effectiveness. The work concludes with a discussion of specific clinical issues.
UNSW-NB15 is a cleaned and optimized dataset for network intrusion detection. The raw description indicates it is a collection of network traffic data, but specific row counts and column details are not provided. The data has been preprocessed, with encoding and scaling left to the user.
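Since encoding and scaling are left to the user, the following is a minimal sketch of that step using only the Python standard library. The column names (`proto`, `dur`) and the sample rows are assumptions for illustration, not confirmed fields of this dataset; verify the actual schema after download.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str]) -> List[Dict[str, int]]:
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical rows; the real column names must be checked after download.
rows = [
    {"proto": "tcp", "dur": 0.5},
    {"proto": "udp", "dur": 2.5},
    {"proto": "tcp", "dur": 1.5},
]

encoded = one_hot([r["proto"] for r in rows])
scaled = min_max_scale([r["dur"] for r in rows])

# Merge the encoded categorical columns with the scaled numeric column.
prepared = [{**e, "dur": s} for e, s in zip(encoded, scaled)]
```

When a train/test split is used, the category list and the min/max values would normally be computed on the training split only and then reused on the test split.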
Android Malware Dataset is a collection of data related to malicious software targeting the Android operating system, published on Kaggle. The dataset likely contains features for classifying and analyzing malware samples. Specific details such as the number of samples, feature columns, and collection methodology are not provided in the available metadata.
A collection of flow features for detecting low-rate distributed denial-of-service (LDDoS) attacks in IoT environments. The features are derived from the CICIoT2023 dataset using spatial, Fast Fourier Transform (FFT), and Discrete Wavelet Transform (DWT) methods. The author, organization, and specific data volume are not provided.
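To illustrate what a frequency-domain flow feature can look like, here is a hedged sketch that finds the dominant frequency bin of a packets-per-interval series. It is not the dataset's actual extraction pipeline: the input series is hypothetical, and a naive O(n^2) discrete Fourier transform stands in for an FFT so the example needs no third-party libraries.

```python
import cmath
from typing import List


def dft_magnitudes(signal: List[float]) -> List[float]:
    """Magnitude spectrum via a naive O(n^2) DFT, for bins 0 .. n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(acc))
    return mags


def dominant_bin(signal: List[float]) -> int:
    """Index of the strongest non-DC frequency bin."""
    mags = dft_magnitudes(signal)
    return max(range(1, len(mags)), key=lambda k: mags[k])


# Hypothetical series: a two-interval burst repeating every 4 intervals.
packets = [10.0, 10.0, 0.0, 0.0] * 4
mags = dft_magnitudes(packets)
peak = dominant_bin(packets)

# The burst period in intervals is the series length divided by the peak bin.
period = len(packets) / peak
```

A periodic pulse pattern concentrates energy in one bin, which is why spectral features of this kind can help separate pulsed low-rate traffic from aperiodic background flows.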
FDTransformer is a dataset containing source code, published on Kaggle. The dataset's specific purpose, size, and author are not detailed in the available metadata. Its content and structure require verification after download.
A dataset of Common Vulnerabilities and Exposures (CVE) records. The title suggests coverage from 1999 to 2026. It is published on Kaggle, but the original author and specific collection method are unknown.
A collection of 242,000 URLs processed into engineered features for phishing detection. It likely contains attributes designed to distinguish malicious URLs from legitimate ones. Its origin, author, and specific feature definitions are unknown.
A medical image dataset likely containing annotations for seven anatomical structures: liver, kidney, hepatic vessel, pancreas, colon, lung, and spleen. The dataset was sourced from Kaggle, but the author, organization, and specific collection details are unknown. The last update date and dataset size are also unspecified.
SoloSpeak source code is available on Kaggle. The dataset's specific contents, such as the programming language, project size, and purpose, require verification after download. Metadata is minimal; details about the author, organization, and last update are unknown.
QuantFlow source code is a dataset published on Kaggle. The dataset's content and structure are not described in detail. Further verification is required to determine its specific contents and intended use.