Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,561 datasets
Net bilateral aid flows from Development Assistance Committee (DAC) donors to Hungary, measured in current U.S. dollars. The data covers disbursements of official development assistance (ODA) and official aid, defined as grants and concessional loans minus principal repayments. It is compiled by the World Bank's World Development Indicators from DAC member reports, with official aid data collection ending in 2004.
World Bank data on net bilateral aid flows from Development Assistance Committee (DAC) donors and European Union institutions. It records net disbursements of official development assistance and official aid, measured in current U.S. dollars. The data is compiled by the World Bank's World Development Indicators team.
Parallel implementations of Boolean and arithmetic functions in Lean source code. The dataset is hosted on Kaggle, but the author, organization, and last update date are unknown. The description suggests it contains formal verification code, but the exact number of files, rows, and specific file formats are not provided.
A database reporting discrete capital investments from New York City's Capital Commitment Plan, uniquely identified by Financial Management Service (FMS) ID. The data is hosted by data.cityofnewyork.us and was last updated on 2026-01-26. Each row contains information on the sponsoring and managing agency for a project.
A synthetic cybersecurity dataset designed to represent realistic attack scenarios. It was created by Incribo and is hosted on Kaggle under a Kaggle license, with author attribution to Uma venugopal. It is intended as a sample for analytical tasks such as studying attack signatures and heatmaps.
A collection of merged GitHub Pull Requests, code patches, and human review comments for benchmarking AI systems. It is derived from publicly available open-source repositories under permissive licenses. The specific row count, column count, and size are unknown.
Incribo's synthetic dataset models realistic cybersecurity attack patterns. It is designed for analytical tasks such as assessing heatmaps and attack signatures. The sample dataset is authored by Uma venugopal and hosted on Kaggle. Its full scope and update history are not specified.
GVU's WWW User Survey data from Georgia Tech provides a historical snapshot of early web user behavior. The dataset was collected through multiple surveys conducted between 1994 and 1998. It was made available for specialized analysis beyond the original project's scope.
xamxte's dataset maps Common Vulnerabilities and Exposures (CVE) descriptions to Common Weakness Enumeration (CWE) categories and MITRE ATT&CK techniques. It is built from the National Vulnerability Database (NVD) and uses AI-assisted label refinement. The dataset was last updated in March 2026.
Trevor R. Reese's historical analysis, 'Crises and Commitments', details Australia's political and diplomatic involvement in Southeast Asian conflicts from 1948 to 1965, from the Malayan Insurrection through its commitments in Vietnam. The entry is sourced from the paperswithcode platform. The dataset is a textual historical account with a closed license.
Ocean polygon data defines the saltwater bodies around Mount Desert Island (MDI) and the extent of municipal jurisdiction from its shoreline. The dataset serves as a digital base map layer and represents a widely accepted delineation of the ocean area around MDI. It was created by Gordon Longsworth of the College of the Atlantic, with metadata formatted for NASA by Cheryl Solomon.
A synthetic dataset for analyzing password complexity and security strength. The dataset is hosted on Kaggle, but specific details about its size, authorship, and creation date are not provided. Its primary purpose is to support analysis of password security features.
SWE-PRBench contains 350 pull requests annotated with human reviewer feedback to evaluate AI code review quality. Created by foundry-ai, this benchmark measures whether large language models can identify the same issues that human reviewers flagged in production code changes.
HPC Numerical C Cpp 500M JSONL is a collection of high-performance numerical source code in C and C++ intended for pretraining models. The dataset is hosted on Kaggle, but its author, organization, and collection details are not provided. Although it is described as pretraining data, its exact size, structure, and licensing terms are unknown.
Place name information for the Australian Antarctic Territory and the Territory of Heard Island and McDonald Islands is maintained by the Australian Antarctic Data Centre and the Secretary of the Australian Antarctic Division Place Names Committee. The gazetteer includes descriptive narratives, images, source information, and altitude data where available. Users can search by place name, region, feature type, latitude, or longitude.
ASOS Surface System Log (SYSLOG) is a digital data set of electronic system messages and error codes from the Automated Surface Observing System. The log is generated by continuous system self-tests and includes station identification, timestamps, message codes, and remarks. It is archived at the National Climatic Data Center under DSI-6402 and originates from NOAA NCEI.
LCM2007 provides a parcel-based thematic classification of satellite imagery covering the entire United Kingdom, updating the earlier 1990 and 2000 maps. This version is a 1 km raster dataset giving the percentage cover of aggregate land cover classes for Northern Ireland only. It uses the Joint Nature Conservation Committee Broad Habitats nomenclature, with further detail drawn from ancillary data sources.
Eighty-one intellectually gifted women, including homemakers and professionals, were surveyed in 1969 and 1970. The data were collected by Judith Birnbaum using a 41-page mailed questionnaire covering early experiences, activities, attitudes, and values. The Murray Research Archive holds the numeric data and the original paper records from this study.
Remarks presented by the Canadian Institutes of Health Research (CIHR) to the Standing Committee on Science and Research (SRSR). The document is a formal testimony, available in PDF format. It was last updated in February 2026.
December 2024 briefing package prepared for the Chair of the Transportation Safety Board of Canada. It contains prepared materials for an appearance before the Standing Committee on Transport, Infrastructure and Communities. The document was published by the Transportation Safety Board of Canada.