Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
CVEbinaryClassification is a dataset from Kaggle for binary classification tasks. The title suggests it contains data related to Common Vulnerabilities and Exposures (CVE) entries, likely with labels indicating a security status. The dataset's specific size, features, and origin are not detailed in the provided metadata.
A mapping dataset linking cybersecurity playbooks to MITRE D3FEND defensive techniques. The dataset's specific scope and volume are not detailed in the provided information. Author, organization, and last update details are unknown.
Security data for the VICIdial and Asterisk telephony systems, focusing on hardening measures. The dataset includes Common Vulnerabilities and Exposures (CVEs) and firewall configurations. Author, organization, size, and last update details are not provided.
A landmark report published by UNFPA and HelpAge International analyzes global population ageing trends. The report projects that the number of people over age 60 will surpass one billion within the next 10 years. It argues for a concerted global effort to align 21st-century society with these demographic realities.
DynamicVerse provides processed datasets for dynamic scene understanding and 4D reconstruction, integrating outputs from advanced visual models like Sa2VA, Qwen-VL, DAM, CameraBench, CoTracker, and UniDepth. The framework supports end-to-end processing from video input to 4D scene models, covering multiple mainstream dynamic scene categories.
Phishing Dataset for URL Classification is a Kaggle-hosted resource for machine learning in cybersecurity. The dataset's specific size, features, and collection methodology are not detailed in the provided metadata. Its primary purpose appears to be training models to distinguish malicious URLs from legitimate ones.
Aggregated firewall metrics used to train and test unsupervised machine learning-based intrusion detection systems (IDS) that detect malicious active scans on corporate networks. The dataset was created by Paulo Matana da Rocha and used in the associated research paper. Its last recorded update was in February 2026.
Terminal-Bench Pro is a benchmark dataset for evaluating AI agents on terminal-based tasks. It contains 400 tasks across eight domains, including data processing, games, debugging, and machine learning, derived from real-world scenarios and GitHub issues. The dataset was created by alibabagroup and last updated on January 5, 2026.
Crisis and Commitment: United States Policy toward Taiwan, 1950-1955 is a dataset hosted on paperswithcode. The title suggests it contains textual materials related to a specific historical period of diplomatic relations. The dataset's license is listed as closed, and its author and organization are unknown.
The 2010 Interagency Autism Coordinating Committee Strategic Plan for Autism Spectrum Disorder Research is a document outlining research priorities and objectives. It is listed on the paperswithcode platform. The dataset likely contains the text of the strategic plan, which may include goals, recommendations, and research areas.
International Paralympic Committee (IPC) data is listed on paperswithcode. The dataset likely contains information related to Paralympic sports and events. Its specific content, size, and structure are unknown from the provided metadata.
Malware Durjoy is a dataset published on Kaggle. Its specific content, size, and origin are not detailed in the provided metadata. The dataset likely contains information related to malware analysis or cybersecurity threats.
The Report of the Dietary Guidelines Advisory Committee on the Dietary Guidelines for Americans, 2000, is listed on paperswithcode. The document likely contains scientific reviews, nutritional recommendations, and policy advice. Its specific content, structure, and authorship require verification after download.
A 2007 status report reviewing developments since the 2003 publication of the Programme for Action. It assesses progress against a Public Service Agreement (PSA) target, national headline indicators, and government commitments. The report highlights how challenging the 2010 PSA target on health inequalities is.
The seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. The document lists the committee members, including Aram V. Chobanian, George L. Bakris, and others from the National High Blood Pressure Education Program Coordinating Committee. Its listing on the paperswithcode platform suggests it is a text document related to medical research.
Openclaw Opencode Dataset is a collection of open source code, likely focused on text processing and software development. The dataset was created by user awax1122 and was last updated on Hugging Face in February 2026. Its specific size and scope are not detailed in the available metadata.
Funsqlqueries is a dataset for training models on SQL queries, created by JuliaHealthOrg. The dataset is still under development, and only a small subset was available for training as of its last update on 2026-02-04. The maintainers are actively working to complete it and invite contributions from other developers.
cybersecurity_attacks is a dataset hosted on Kaggle. The dataset likely contains records of various cyber threats or network intrusion attempts. Its specific content, size, and origin require verification after download.
Malware dataset values is a dataset hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its actual structure and features require verification after download.
A dataset titled 'Kids Dataset', published on Kaggle. Its specific content, size, and origin are not detailed in the provided metadata. Its columns and sample data are unknown and require download for verification.