Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,561 datasets
A 2003 spectral library of aquatic substrates from Adelaide coastal waters, hosted in the National Spectral Database. The dataset was referenced in a technical report prepared for the Adelaide Coastal Waters Study Steering Committee in July 2007. Data access is managed through the Australian Ocean Data Network.
Seven police districts in Washington, DC, each divided into three sectors and between seven and nine Police Service Areas (PSAs). The dataset is provided by the District of Columbia's Metropolitan Police Department and was last updated on March 11, 2026. It describes the geographic organization of patrol services and the Special Liaison Branch's community policing model.
Washington, DC, has seven police districts, each divided into three sectors and further into Police Service Areas (PSAs). The dataset, provided by the District of Columbia, describes this geographic and administrative structure for the Metropolitan Police Department's patrol and liaison services. It was last updated on March 11, 2026.
A 2017 publication from Global Affairs Canada sets out the Vancouver Principles, a set of political commitments endorsed by UN Member States. The document provides practical guidance for planning, training, and conducting peacekeeping operations with a focus on preventing the recruitment and use of child soldiers. It is archived and should be referenced for research or recordkeeping purposes only.
A collection of cybersecurity data from 1999 to 2025, including approximately 300,000 CVE records. The dataset aggregates disclosed reports from HackerOne, exploits from ExploitDB, red team prompts, and structured threat intelligence from MITRE ATT&CK. It was created by Zain Ali and last updated on HuggingFace in May 2026.
A 2021 dataset curated for the TabArena Tabular ML IID Study, intended for evaluating classification models on independent and identically distributed data. It originates from academic research on Android malware detection using native and custom permission features. The TabArena team removed a high proportion of duplicate rows present in the original data.
Department for Infrastructure (DfI) payments exceeding £25,000 made during the 2024/25 financial year. The dataset is updated monthly as part of the Northern Ireland Civil Service (NICS) commitment to expenditure transparency. It likely contains details on suppliers, transaction amounts, and payment dates.
Ukraine's Antimonopoly Committee maintains a register of its open data sets available on the Unified State Open Data Web Portal. The dataset is provided in an Excel XLSX format and was last updated on May 4, 2026. It is published by the States site of Ukraine.
99,870 high-quality system, user, and assistant triples form a ready-to-train dataset for cybersecurity instruction tuning. Created by Alican Kiraz and last updated on April 22, 2026, it is licensed under Apache-2.0 for production use. The dataset's scope includes OWASP Top 10, MITRE ATT&CK, NIST CSF, CIS Controls, ASD Essential 8, modern authentication, SSL/TLS, cloud security, DevSecOps, cryptography, and AI security.
56 total instances of agent performance on a software engineering benchmark. The dataset, created by user zvzv1919, was last updated on 2026-04-17. It likely contains metrics from evaluating an agent's ability to locate and match code functions and files.
200 instances comprise this benchmark for evaluating software engineering agents, with 95% file match and 89% function match rates. The dataset was created by author zvzv1919 and last updated on 2026-04-17. It appears to be part of a collection for testing agents on tasks like code location and function matching.
Townsville City Council provides data on service requests submitted by the public through the Snap Send Solve mobile application. The dataset includes the number of requests, reported dates, and incident types for issues like graffiti, potholes, and damaged playground equipment. It is published by the Townsville City Council under a CC-BY-4.0 license and was last updated in March 2026.
The local government of Uman, Ukraine, publishes its plan for preparing regulatory acts on the States site of Ukraine. The dataset is available in Excel XLSX format and was last updated on 2026-05-04. It likely contains schedules and topics for future municipal legislation.
Named (roll-call) voting results of executive committee members at meetings of the Pryluky City Council in Ukraine. The dataset is hosted on the States site of Ukraine and was last updated on 2026-04-30. It is available for download in Excel XLS and XLSX formats.
9.5 KB Excel file listing recent state-of-the-art Intrusion Detection System approaches. The dataset was authored by Shailendra Mishra and last updated on April 20, 2026. Its small size suggests it is likely a curated list or summary rather than a large-scale experimental dataset.
Edmonton's data portal provides details for City Council and select Committee meetings during the 2025-2029 council term. The dataset includes meeting times, locations, types, and links to official documents like agendas and minutes. It was last updated by data.edmonton.ca in April 2026.
DataBoundary is a red/blue team benchmark dataset evaluating delimiter-based defenses against prompt injection attacks. It contains 5,578 test cases, testing 3 attack templates and 7 injection payloads across 13 LLMs, including cloud APIs and locally-hosted models. The dataset was created by Alan-StratCraftsAI and was last updated on 2026-05-05.
A briefing package prepared for the Minister of Natural Resources Canada for an appearance before the House of Commons Standing Committee on Natural Resources. The document was created for the committee's study of the 2024-25 Supplementary Estimates (B) and is dated November 27, 2024. It is published by Natural Resources Canada under the OGL-CA-2.0 license and was last updated on the platform on April 9, 2026.
The Aquatic Substrate Library contains spectral data for marine and coastal features from the Adelaide Coastal Waters Study. It is part of the Australian National Spectral Database, sourced from a 2007 technical report by Blackburn Environmental Pty Ltd and CSIRO Land and Water. This dataset supports remote sensing analysis of natural and anthropogenic changes in aquatic environments.
A curated subset of 611 agentic traces from the badlogicgames/pi-mono dataset, processed through the Talos trace curation pipeline. The dataset was created by DJLougen and last updated on April 23, 2026. Original data was in a file-per-session JSONL format with Anthropic API-style message blocks.