Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,562 datasets
Canadian Radio-television and Telecommunications Commission (CRTC) Chairperson's briefing presented to the Standing Committee on Industry and Technology. The document is an HTML record of formal government testimony. It was last updated on February 25, 2026.
119,000 diamond listings scraped from brilliantearth.com to demystify the value of the 4 Cs (cut, color, clarity, carat). The dataset includes attributes like price, carat weight, shape, and certificate type, and was collected using a tool called DiamondScraper.
Reports on the work of the children's service of the executive committee of the Ivano-Frankivsk City Council. The dataset originates from the States site of Ukraine and was last updated on March 3, 2026. Available file formats include Word documents, Excel spreadsheets, and CSV files.
Real cybersecurity dialogue records from the Langfuse platform, processed for Traditional Chinese reasoning SFT (supervised fine-tuning). The data underwent deduplication, answer regeneration via the Gemini model, and rewriting of the reasoning traces via the Qwen model.
A 2026 study by Jevon Dixon maps cross-national cybersecurity cooperation patterns using maritime risk and connectivity indicators. It applies principal component analysis and hierarchical clustering to group states based on governance, economic capacity, ICT development, and maritime activity. The analysis uses V-tests to show how structural country profiles are associated with varying levels of cybersecurity cooperation activity.
A briefing package prepared for the Minister of National Defence's appearance before the Standing Committee on National Defence on October 27, 2025. The materials were created by the Department of National Defence and published under the Open Government Licence - Canada 2.0. The package was last updated on February 24, 2026.
HPM-Net source code, hosted on Kaggle. The provided metadata is minimal and does not detail the dataset's content, size, or structure, so the actual content requires verification after download.
A list of land lease agreements renewed by the Standing Committee of the Kyiv City Council on Architecture, Urban Planning and Land Relations without a full council decision. The dataset is provided by the States site of Ukraine and was last updated on March 3, 2026. The specific number of agreements and data rows is not provided in the metadata.
ARC-AGI-2 is a dataset for artificial general intelligence (AGI) benchmarking, derived from a GitHub repository. The original source contains 1,000 training examples and 120 test examples, which were flattened to account for files containing multiple tests. The dataset was uploaded by 'sirorezka' and last updated on Hugging Face in March 2026.
845,373 network flow records, each with 40 engineered statistical flow-level features. The dataset is designed for detecting Distributed Denial-of-Service (DDoS) attacks, specifically LDAP-based DDoS traffic, and uses a dual-labelling scheme for binary and attack-specific classification.
Phishing URL Threat Intelligence (F3EAD) is a dataset hosted on Kaggle. The title suggests it contains information related to malicious URLs used in phishing campaigns. The dataset's specific contents, size, and origin are not detailed in the provided metadata.
50 NIfTI-format brain MRI test cases from the BraTS GLI challenge. The dataset likely contains multi-modal MRI scans for brain tumor segmentation tasks. It is published on Kaggle, but the original author, organization, and collection date are unknown.
This dataset tracks military alliance reliability over a 200-year period, providing estimates of treaty fulfillment and violation. Compiled by Daina Chiba for International Studies Quarterly, it covers both the alliance level and the individual ally level. The records highlight a shift in reliability patterns, particularly regarding multilateral alliances in the post-WWII era.
Kaggle hosts a corpus of 60,000 AI-generated emails. The description indicates the emails include phishing attempts, legitimate messages, and specific categories like Business Email Compromise (BEC) and cryptocurrency-related content. The author, organization, and collection date are unknown.
A dataset focused on network intrusion detection, published on Kaggle. The specific source, collection method, and temporal coverage are not provided in the available metadata. The dataset likely contains records of network traffic or security events for analysis.
OTX Threat Insights likely contains cybersecurity pulses detailing attack techniques and indicators. The dataset appears to be sourced from the AlienVault Open Threat Exchange (OTX) platform, a community-driven threat intelligence service. Its specific size, update frequency, and detailed structure are unknown.
86 reports from the UK's Peterhead and White Rose Carbon Capture and Storage Front End Engineering and Design projects. The reports, published by DECC in 2015 and 2016, share knowledge on commercial, technical, and regulatory aspects of large-scale CCS projects. They were produced under government contracts to disseminate learning from these industrial initiatives.
A 2022-2025 contract between the London Borough of Barnet and Civica UK Ltd for the Moderngov committee papers content management system. The dataset contains award details, contract dates, and governance tags.
London Borough of Barnet performance indicators reported quarterly to eight committees including Adults and Safeguarding, Environment, and Housing and Growth. The data tracks progress against the 2019/20 Annual Delivery Plan under the Barnet 2024 Corporate Plan. It covers themes such as Community Safety, Family Services, Finance, and Environment.
Over 1.3 million source code files from approximately 4,700 top-ranked GitHub developers, spanning the period from 2015 to 2025. The collection includes code in more than 80 programming languages, such as Python, JavaScript, and Rust.