Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,561 datasets
A dataset related to Bitcoin ransomware transactions. No information is available on its size, structure, or origin.
A 2026 replication package from the University of Oulu contains materials for an experiment on the effect of requirements framing on confirmation bias in software testing. The package includes actual experiment data, baseline test suites, and the requirements framing used in the study. It is intended to support replication and further research in software engineering psychology.
Canada's Office of the Privacy Commissioner prepared a briefing package for a November 20, 2025 parliamentary committee appearance concerning Bill C-12, the Act Strengthening Canada's Immigration System and Borders. The document is an HTML file published under an open government license and updated in March 2026.
Archived correspondence from Global Affairs Canada details amendments to Article 104.1 of the North American Free Trade Agreement (NAFTA). It consists of an exchange of letters between Canada, the United States, and Mexico clarifying trade rules and specific obligations. The information is out of date and intended for research or recordkeeping purposes only.
Introductory remarks and key issues presented to the OLLO committee regarding proposed Administrative Monetary Penalties regulations. The document outlines seven key issues, including maximum penalty, implementation process, and scope. It references statistics on admissible complaints from May 2016 to May 2024.
2,000 rows of data derived from the PhishingWebsites dataset on OpenML, subsampled using a random seed of 4. The subset, generated by Eddie Bergman, likely contains 100 features and up to 10 target classes. The original dataset is in the public domain (us-pd).
A 2,000-row, 100-feature subset of the PhishingWebsites dataset, generated via a controlled subsampling process with a random seed of 3. The data, originally from OpenML, was created by Eddie Bergman and is released under a US public domain license. It contains 10 target classes, sampled using a stratified method.
A subsampled version of the PhishingWebsites dataset from OpenML, created with a specific random seed. The subset contains up to 2000 rows, 100 columns, and 10 classes, generated via stratified sampling. The original dataset is in the public domain, authored by Eddie Bergman.
A 2000-row, 100-feature subset of the PhishingWebsites dataset, created via a reproducible subsampling script with a random seed of 1. The data was originally authored by Eddie Bergman and is shared under a US public domain license. The subsampling process selected up to 10 classes and used stratified sampling.
A 2000-row, 100-column subsample of the PhishingWebsites dataset, generated with a random seed of 0 and stratified sampling. The dataset likely contains features for classifying websites as legitimate or phishing. It was created by Eddie Bergman and is shared under a US public domain license on OpenML.
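The PhishingWebsites entries above differ only in their random seed. As a rough illustration of seeded, stratified subsampling, here is a minimal Python sketch. It is not the script used to build these subsets, which the entries do not show. The function name `stratified_subsample`, its parameters, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, max_rows=2000, seed=0):
    """Return sorted row indices of a class-proportional subsample.

    Hypothetical helper for illustration only; not taken from OpenML
    or from the script behind the catalog entries.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    if total <= max_rows:
        return list(range(total))  # nothing to cut down

    # Each class gets a share proportional to its frequency.
    shares = {c: max_rows * len(ix) / total for c, ix in by_class.items()}
    quotas = {c: int(s) for c, s in shares.items()}

    # Hand rows lost to rounding down to the classes with the
    # largest fractional remainders.
    leftover = max_rows - sum(quotas.values())
    by_remainder = sorted(shares, key=lambda c: shares[c] - quotas[c], reverse=True)
    for c in by_remainder[:leftover]:
        quotas[c] += 1

    # Draw without replacement inside each class.
    chosen = []
    for c in sorted(by_class, key=str):
        chosen.extend(rng.sample(by_class[c], quotas[c]))
    return sorted(chosen)


# Toy labels: 7,000 rows of one class and 3,000 of another.
toy_labels = ["legitimate"] * 7000 + ["phishing"] * 3000
subset_seed_0 = stratified_subsample(toy_labels, max_rows=2000, seed=0)
subset_seed_1 = stratified_subsample(toy_labels, max_rows=2000, seed=1)
```

With the same seed the function returns the same indices every time. Changing the seed gives a different draw with the same class proportions, which is presumably why the catalog lists one entry per seed.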
Briefing material for the Secretary of State (Labour)'s November 20, 2025 appearance before Canada's Standing Committee on Human Resources, Skills and Social Development and the Status of Persons with Disabilities (HUMA). The briefing covers the committee's study of its mandate and priorities. The document is an HTML record of the government proceeding.
SecureVibeBench is a dataset for evaluating secure coding and vulnerability detection, introduced in a paper accepted at the ACL 2026 Main Conference. It is hosted on Hugging Face by the author iCSawyer and was last updated on April 15, 2026. The dataset includes fields such as task IDs, repository URLs, and vulnerability-inducing commit hashes.
'BMR marine program', a report by a BMR committee on the forward marine program, is a legacy document published on data.gov.au. The dataset is managed by the Australian Ocean Data Network and was last updated in April 2026. No abstract is available; the content is likely a committee report on marine science planning.
10,000 webpages (5,000 phishing and 5,000 legitimate) were downloaded between January 2015 and June 2017. This dataset contains 48 features extracted from each webpage using an improved technique based on the Selenium WebDriver browser automation framework. The dataset was created by Shashwat Tiwari and is associated with research by Tan, Choon Lin from 2018.
10,000 webpages (5,000 phishing and 5,000 legitimate) downloaded between January-May 2015 and May-June 2017 are represented by 48 extracted features. The dataset was created by Shashwat Tiwari using an improved feature extraction technique leveraging the Selenium WebDriver browser automation framework. It originates from research by Tan, Choon Lin (2018) and is shared under a CC-BY-4.0 license.
Australia's authoritative gazetteer provides information on the location and spelling of approved place names for the mainland, external territories, and offshore areas to the 3-mile marine limit. The 2010 release consists of over 300,000 place names compiled by Geoscience Australia. Data is sourced from State and Territory jurisdictions and Australian Government agencies.
Geoscience Australia Data published a report titled 'Future of the Deep Sea Drilling Project after 1975' on data.gov.au. The document likely contains the proceedings and planning discussions from an open meeting of the JOIDES planning committee held in Zurich, September 26-28, 1973. The dataset is available in PDF and HTML formats.
A portfolio dataset uploaded by cont1n3nt for a GitHub project. It is intended for training and testing machine learning models, specifically in the context of AI phishing detection. The dataset was last updated on May 14, 2026.
A December 1, 2025 briefing package prepared for a Senate Standing Committee hearing. The document addresses the 2025 Fall Reports of the Auditor General of Canada on housing for the Canadian Armed Forces, military recruitment, and cyber security. It was authored by the Office of the Auditor General of Canada.
Offensive Cyber Task Horizons contains expert completion and estimation data from a human study described in a 2026 paper by Payne, Miller, and Peters. The dataset provides human-derived timing data that anchors the Item Response Theory (IRT) methodology for measuring AI offensive cybersecurity capability growth. For full reproducibility artifacts, including model logs and analysis pipelines, the dataset page references a GitHub repository.