Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,591 datasets
Peddlers of Crisis: The Committee on the Present Danger and the Politics of Containment is a dataset from the paperswithcode platform. The title suggests it likely contains textual materials related to the Committee on the Present Danger, a U.S. foreign policy advocacy group. The dataset was authored by J. W. Sanders.
This cybersecurity dataset, published on Kaggle, likely contains information related to cryptographic hashes and network protocols. Its specific content, size, and authorship require verification. The dataset's focus on SHA-256, AES, and Tor suggests a collection for security analysis.
Malware-Utils-v7 is a dataset hosted on Kaggle. Its title suggests it contains utilities or tools related to malware analysis. The dataset's specific content, size, and structure are not described in the available metadata.
A test extract from a malware classification dataset, version 5, published on Kaggle. The dataset likely contains features for distinguishing between different types of malicious software. Metadata regarding size, columns, and origin is minimal.
Steven C. Hayes authored this dataset on Acceptance and Commitment Therapy. The dataset likely contains research papers or materials related to this psychological intervention. Its content and scope require verification after download.
Karen R. Broder authored a document titled 'Preventing Tetanus, Diphtheria, and Pertussis Among Adolescents: Use of Tetanus Toxoid, Reduced Diphtheria Toxoid and Acellular Pertussis Vaccines: Recommendations of the Advisory Committee on Immunization Practices (ACIP)'. The document is hosted on the paperswithcode platform. The content likely contains formal public health guidelines and supporting information.
Julia Shaw published research on constructing false memories of committing crime. The dataset likely contains experimental data from psychology and criminology studies. It is hosted on the paperswithcode platform.
Acceptance and commitment therapy dataset published on paperswithcode. The dataset is authored by Steven C. Hayes. The specific content, size, and temporal coverage are not detailed in the provided metadata.
Modernization theory, a key intellectual framework during the Cold War, is examined through its rise and fall in American academia. The description references specific institutions like the Harvard Department of Social Relations and the MIT Center for International Studies. The dataset's content likely consists of academic text discussing the theory's development, contestation, and eventual collapse.
Ransomware Dataset (Access Controlled) is a cybersecurity dataset hosted on Kaggle. The dataset likely contains information related to ransomware attacks or malware analysis. Its specific content, size, and provenance are unknown due to minimal metadata.
Kaggle hosts a dataset titled AutoGit_Commit_Dataset.csv. The dataset likely contains records of Git commit activity. Its author, organization, and specific content details are unknown.
Kaggle hosts a dataset titled 'Malware-Utils-v7'. The dataset likely contains information related to malware utilities. Its author, organization, and specific content details are unknown.
December 2025 collection of 22 complete Claude Code session recordings. The dataset contains 17,487 conversation turns, 50 hours of runtime, and over 13 million total tokens. It was created by the author 'novita' to validate the applicability of Suffix Decoding in agentic coding scenarios.
40 million GitHub repository records aggregated from GH Archive public event streams by ibragim-bad. The dataset provides per-repository statistics including stars, forks, and pull requests as of early 2026. It is formatted for large-scale analysis using tools like Polars and Dask.
A synthetic dataset of competitive coding-style Python problems and their corresponding unit test cases. The dataset was created by NVIDIA for its NeMo Gym reinforcement learning framework and was last updated in January 2026. It aggregates questions and tests from the CodeContests and Open-R1 collections.
The dataset likely contains mappings between Common Vulnerabilities and Exposures (CVEs) and corresponding MITRE D3FEND defensive techniques and tactics. It is sourced from the MITRE D3FEND knowledge base, a framework for cybersecurity countermeasures. The dataset's author, organization, and specific size are unknown.
Over 286,000 Common Vulnerabilities and Exposures (CVE) entries are linked to predicted MITRE ATT&CK Tactics, Techniques, and Procedures (TTPs) using the SMET model. This dataset appears to be a large-scale mapping of software vulnerabilities to adversary behavior patterns. The original author, platform, and specific creation date are not provided in the metadata.
50,000 clean, flattened vulnerability records from the National Vulnerability Database for the year 2025. The data originates from the NVD, a U.S. government repository of standards-based vulnerability management data. It provides a snapshot of disclosed software security flaws from that specific year.
Gitbugactions is a framework and collection of executable code benchmarks for program repair, maintained by the gitbugactions organization and updated as of March 2026. It facilitates the automated collection of datasets by leveraging GitHub Actions to identify and capture reproducible bug-fix instances in Python repositories.
NIST SRD 23 contains thermodynamic and transport property data for 105 pure fluids, including environmentally acceptable HFCs, traditional CFCs, and natural refrigerants like ammonia. The database allows for mixtures of up to 20 components and provides Fortran subroutines, data files, and a sample Excel spreadsheet for integration. Version 9.1 has replaced the older NIST 12 and 14 databases.