Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,561 datasets
A collection of AI-generated text containing explicit, toxic, and dangerous content for research purposes. Compiled by BlackTechX011, it is intended for red-teaming, alignment research, and defensive cybersecurity evaluation. The dataset includes content detailing violence, psychological harm, cyber-attacks, and illegal activities.
Cybersecurity Threat Dataset for Malware Detection contains 5,000 labeled cyber threat intelligence records. The records include Indicators of Compromise (IOCs) intended for machine learning applications. The author, organization, and last update date are unknown.
Canadian Radio-television and Telecommunications Commission (CRTC) testimony before the Standing Committee on Canadian Heritage regarding Bill C-11. The HTML document presents official information submitted to Parliament for the study of amendments to the Broadcasting Act. The data was last updated on March 11, 2026.
Canadian Radio-television and Telecommunications Commission (CRTC) presented information to Parliament regarding the Study of Bill C-11, An Act to amend the Broadcasting Act and to make related and consequential amendments to other Acts. The testimony was delivered before the Senate Standing Committee on Transport and Communications. The data is an official record of the CRTC's position and analysis on proposed legislative changes.
Source code for the Multiscale Fractures Integrated Equivalent Porous Media (MFEPM) method, authored by Ma, Lei and hosted on Harvard Dataverse. The code is designed for simulating flow and solute transport in multiscale fractured media. The repository was last updated on 2026-05-06.
Canadian Radio-television and Telecommunications Commission (CRTC) testimony before the Standing Committee on Canadian Heritage regarding Bill C-18. The presentation covers the study of online communications platforms making news content available in Canada. The dataset is an HTML document last updated on March 11, 2026.
Arrest incidents in the City of Los Angeles from 2010 to 2019, transcribed from original paper reports by the Los Angeles Police Department (LAPD). The dataset includes 31 columns such as Arrest Date, Charge Description, Location, and demographic codes. It is published on the Los Angeles Open Data Portal and was last updated on 2026-01-02.
Canadian Radio-television and Telecommunications Commission (CRTC) testimony before the Senate Standing Committee on Transport and Communications regarding Bill C-11. The presentation covers proposed amendments to the Broadcasting Act and related consequential changes. The dataset consists of official HTML documentation submitted to Parliament.
Phishing Email Detection Dataset is a collection of emails intended for use in phishing detection, natural language processing, and classification tasks. The dataset was sourced from Kaggle, but its author, organization, and creation date are unknown. The specific number of records, file formats, and column-level details are not provided in the available metadata.
London's affordable housing statistics for programmes funded by the Greater London Authority (GLA). The data reflects the GLA's commitment to open and transparent reporting on housing delivery. Statistics for the rest of England are published separately by Homes England.
A dataset of 5,215 rows of LeetCode Python solutions, merged from two open-source sources. It is structured as single-turn chat conversations, split into 4,488 rows for training, 499 for validation, and 228 for testing. The dataset was created by justindal, and its preparation involved field normalization and solution validation.
Qualitative interview data from 35 Coast Guard officers across 18 fisheries harbours in Sri Lanka. The interviews explore factors shaping officers' self-legitimacy, work conditions, and enforcement challenges. The study was authored by Gule Godage, Lasuni and published in March 2026.
Replication data from a study on how favorable intergroup comparisons affect confrontation with a group's harmful history. The research focuses on the UK and examines psychological defenses like 'whataboutism' in response to historical identity threats. The dataset was authored by Joe Kendall and last updated on March 16, 2026.
Metrics from 86 open-source GitHub repositories for predicting defects in Ansible Infrastructure-as-Code (IaC) scripts. The dataset includes IaC-oriented, delta, and process metrics, curated from repositories meeting criteria such as recent activity and continuous integration. It aims to identify source code and development process properties that are good predictors of defects in IaC.
MPII-Human-Pose-Data is a collection of human pose annotations transformed into a CSV file. The data originates from the MPII Human Pose dataset, which is a benchmark for articulated human pose estimation. The copyright is held by the Max Planck Institute for Informatics, and the data is shared under a permissive software license.
A 9.5 KB Excel file provides an overview of intrusion detection techniques for Controller Area Networks (CAN). The dataset was authored by Hareem Kibriya and last updated on March 19, 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
Sophos AI provides a dataset of approximately 20 million Portable Executable files, split evenly between benign and malicious samples. It includes EMBER-v2 features, metadata from the pefile library, ReversingLabs detection telemetry, and behavioral tags for malware samples. The dataset is hosted on AWS Open Data and is intended to support machine learning research for malware detection.
The U.S. Historical Climatology Network Monthly Data, Version 2.5 consists of precipitation and temperature data corrected for changes in station location, instrumentation, and observing practices. The data includes sets of Maximum, Minimum, and Average Temperature and Precipitation that are raw, adjusted for time-of-observation bias, or processed through the Pairwise Homogenization Algorithm. It is archived with station information and source code for reading the data.
A briefing package prepared for the Chair of the Transportation Safety Board of Canada for an appearance before the Standing Committee on Fisheries and Oceans in April 2022. The document is available in DOCX and HTML formats and was last updated in March 2026.
Transportation Safety Board of Canada briefing package prepared for the Chair's appearance before the Standing Committee on Transport, Infrastructure and Communities in February 2022. The document collection includes materials in DOCX, HTML, PPT, and PDF formats.