Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,562 datasets
Supplementary Table S7 contains data related to mitochondrial sequences with dual functions, specifically overlapping protein-protein and protein-RNA gene sequences. The dataset is associated with research investigating the functional importance of these overlapping genetic regions. It is published under a CC-BY-4.0 license by author Noam Shtolz.
A Joint Nature Conservation Committee marine survey collected multibeam bathymetry, sidescan sonar, video, stills, and grab samples in May 2008. The aim was to identify Annex I habitats like Submarine structures and Reefs in the mid-Irish Sea and Solan Bank areas. Data components are archived at the British Geological Survey, DASSH, and UKHO.
14,491 CVEs enriched through CWE to CAPEC to ATT&CK to D3FEND chains form part of this training data. The dataset was distilled from DeepSeek-R1 using a multi-stage pipeline and includes 925 HackTricks pentesting entries and 280 Kali tool descriptions. Author neilopet last updated it on Hugging Face in February 2026.
Campaign Finance - Expenditures contains reported spending and obligations from candidates, officeholders, and political committees in Austin. The dataset is maintained by the City Clerk's office and was last updated in March 2026. Specific row and column counts are not provided.
PRDbench is a benchmark dataset with 50 test cases for evaluating code agents' development capabilities in real-world environments. Each test case includes a PRD requirement (PRD query) and an acceptance scoring scheme (Criteria). The dataset is authored by AGI-Eval and was last updated in December 2025.
Kaggle hosts the CICDDoS2019 dataset. The title suggests it contains network traffic data related to Distributed Denial of Service attacks from 2019. The dataset's specific content and scale require verification after download.
malware_detection_dataset is a dataset hosted on Kaggle. Its specific content, size, and features are not described in the provided metadata. The dataset likely contains features for distinguishing between malicious and benign software.
This cybersecurity dataset likely contains real-world threat intelligence indicators. Its description mentions IP addresses, domains, CVEs, and OTX pulses. The dataset was posted on Kaggle, but its author, organization, and last update date are unknown.
A high-quality Chinese-English cybersecurity Q&A dataset contains 270,271 curated entries. It was created by author hcnote from a large-scale corpus, cleaned using the DataSanity tool, and last updated in February 2026.
This is a cleaned and enriched dataset of Common Vulnerabilities and Exposures (CVE) records from the National Vulnerability Database (NVD), spanning 2004 to 2025. The data includes severity scores, CVSS metrics, CWE identifiers, and references, and is described as being prepared for machine learning tasks. The original source is the NVD, and the records were aggregated and processed by an author on Kaggle.
500 highly verified question-and-answer pairs and contexts provide a focused resource for training language models on cybersecurity topics. This dataset originates from Kaggle, though its author, creation date, and specific source are unspecified. The sample size suggests it is designed for model development and testing rather than large-scale analysis.
This dataset contains network traffic data related to Distributed Denial of Service (DDoS) attacks. It is hosted on Kaggle, but its specific origin, collection method, and scale are not detailed in the available metadata. The exact number of records, the features, and the time period covered are unknown.
Kaggle hosts a dataset titled 'ddos_attack'. The dataset likely contains records related to Distributed Denial of Service attacks. Its specific contents, scale, and provenance are unknown.
The Hunger and Nutrition Commitment Index (HANCI) 2014 ranks 45 developing country governments on their political commitment to tackling hunger and undernutrition. It measures performance across 22 indicators covering legal frameworks, policies and programmes, and public expenditures. The index was created by Lawrence Haddad to provide transparency and support accountability.
This dataset supports research on estimating labor-supply elasticities, accounting for limited commitment between spouses. It is based on PSID data and presents estimates of approximately 0.65 for men and 0.8 for women. The methodology addresses bias from using household-level consumption data.
This archival dataset supports research on estimating labor-supply elasticities for married couples, accounting for intra-household consumption distribution. The analysis uses PSID data and estimates Frisch elasticities of approximately 0.65 for men and 0.8 for women. Specific row counts, column details, and file formats are not provided.
Records of appeals to the Malyn City Council hotline include registration numbers, receipt dates, and resolution outcomes. The dataset is hosted on the States site of Ukraine and was last updated on March 3, 2026. It likely contains structured information on appeal types, thematic categories, and assigned performers.
Tournesol provides preference-learning and social-choice data structures for YouTube recommendations, developed by the tournesol-app organization and updated through March 2026. The platform implements Bradley-Terry models and preference aggregation to rank content based on AI ethics. It serves as an open-source framework for community-driven video evaluation.
This is a multi-language source code dataset originating from Indonesia's digital tourism economy. It was sourced from Kaggle, but the author, organization, and last update date are unknown. The total number of rows, file formats, and specific columns are not detailed in the available metadata.
This dataset focuses on phishing and social engineering content in the Arabic language. It is hosted on Kaggle, but its specific size, creation date, and authorship are not detailed in the provided metadata. The exact content and structure require verification after download.