Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,561 datasets
A dataset of 111,969 pull requests from AI coding agents and human contributors, covering June through August 2025 and released with a paper accepted at MSR 2026. The dataset includes activity metadata for analyzing agent contributions in software development. It was created by AISE-TUDelft and last updated on Hugging Face in April 2026.
Transportation Safety Board of Canada briefing package prepared for the Chair's appearance before the Standing Committee on Government Operations and Estimates in February 2026. The document is a PDF text file last updated in March 2026.
A monthly list of contracts awarded by the Joint Nature Conservation Committee with a value above £10,000. The data is published as part of the UK Government's commitment to transparency. The dataset was last updated on 2026-04-14.
Global Affairs Canada provides briefing materials prepared for the Minister of International Development's appearance before the Standing Committee on Foreign Affairs and International Development. The dataset contains text documents in both English and French, last updated in March 2026. Row and column counts are not specified.
Meeting minutes for the National Indigenous Advisory Committee, published by the Correctional Service of Canada. The dataset consists of PDF documents, with the most recent update recorded in March 2026.
MiniMax M2.5 Code Distilled 14K is a synthetic dataset for code generation created by Madras1. It contains Python coding problems, chain-of-thought reasoning, and execution-verified solutions that passed automated tests.
4,209 real AI-to-AI prompt injections collected by researcher David Keane for an MSc Cybersecurity project at the National College of Ireland in February 2026. The dataset documents the adversarial testing journey from a 'dentist chatbot' to a defensive system called 'CyberRanger V42 Gold' with a 100% block rate.
One bilateral trade agreement between Canada and the Commonwealth of the Bahamas focuses specifically on the rum sector. The archived document outlines trade arrangements and commitments for rum production and export. It was published by Global Affairs Canada and archived as of February 2026.
Spectral library of aquatic substrates from the Adelaide Coastal Waters, collected in 2003. It is part of the Australian National Spectral Database and was used in a remote sensing study of marine and coastal features.
The Australian Ocean Data Network hosts a legacy report titled 'Future of the Deep Sea Drilling Project after 1975 Report on an open meeting of the JOIDES planning committee held in Zurich, September 26-28, 1973'. The dataset consists of document files in PDF and HTML formats. The content likely details discussions and planning for the Deep Sea Drilling Project's future direction.
LangMap-TheStack-python-100M is a dataset of 100 million tokens of sanitized Python source code, intended for code finetuning. It was created by MultilingualUnigramLM and streamed from the bigcode/the-stack repository. The dataset was last updated on April 13, 2026.
LangMap-TheStack-csharp-100M is a dataset containing 100 million tokens of C# source code, intended for code finetuning. The data was streamed from the bigcode/the-stack repository and tokenized using the allenai/OLMo-3-1025-7B tokenizer. It was created by MultilingualUnigramLM and last updated on Hugging Face on April 13, 2026.
A dataset of 9,999 structured vulnerability analyses covering 20 security domains. Each row includes root cause, exploitation methodology, detection rules, CVSS v3.1 scoring, MITRE ATT&CK mapping, and remediation guidance. The dataset was created by user 'sh111111111111111' and last updated on March 18, 2026.
A collection of benign executable binaries from research published at AsiaCCS 2021, used to study the robustness of machine learning-based static malware analysis. The dataset supports work on adversarial attacks where executable bytes are modified to evade detection. It is associated with the paper 'Malware Makeover: Breaking ML-based Static Analysis by Modifying Executable Bytes.'
Official testimony from the Privacy Commissioner of Canada to the Standing Committee on Public Safety and National Security on October 30, 2025. The briefing package analyzes Bill C-8, a legislative proposal amending the Telecommunications Act to address cybersecurity. This government document was published by the Office of the Privacy Commissioner of Canada.
A final scientific report from the SCOPE IAI project 003, analyzing nitrogen transport and transformations on regional and global scales. The report synthesizes knowledge from five international workshops held since 1994. It was produced by the Scientific Committee on Problems of the Environment (SCOPE) and published by SpringerLink.
Real malware samples and associated execution artifacts collected as part of the ULE-CIBERLAB Project, funded by the European Union NextGeneration-EU. The dataset includes executable files (.exe) and related JSON/HTML reports, screenshots, and dropped files from CAPEv2 sandbox analysis. It was uploaded by unileon-robotics and last updated on March 26, 2026.
A trade protocol outlining cooperative arrangements and commitments between Canada and Member States of the Caribbean Common Market concerning the rum sector. The document is an archived publication from Global Affairs Canada, last updated on the platform on February 24, 2026. It is intended for research or recordkeeping purposes only and is not subject to current web standards.
A legacy report from the Bureau of Mineral Resources (BMR) committee outlining a forward marine program for Australia. The document, published by the Australian Ocean Data Network, is available in PDF and HTML formats. No abstract, sample data, or detailed metadata is available for this historical record.
A government report outlining a forward program for marine science by a BMR committee. The document is authored by the Australian Ocean Data Network and was last updated in April 2026. No information on the report's length or specific data volume is available.