DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Software Engineering & Security Datasets | DataSalon

Software Engineering & Security

Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples

1,591 datasets

XGBoost Test Case Pruning Presubmit Data

Kaggle hosts this dataset related to XGBoost, a popular machine learning library. The data likely contains records used for test case pruning within a software development presubmit process. Its specific content, scale, and origin require verification after download.

Tags: Tabular, Machine Learning, Test Case, Xgboost, Pruning, Software Testing (+1 more)

Phishing Email Dataset

Phishing Email Dataset Simple is a dataset hosted on Kaggle. The title suggests it contains examples of phishing emails, likely for security analysis or machine learning tasks. No details on size, source, or creation date are provided.

Tags: Text, Cybersecurity, Text Classification, Phishing (+1 more)

HTML, URL, and Metadata for Phishing Sites

A Kaggle dataset likely containing HTML content, URLs, and metadata for phishing websites. The specific number of rows, columns, and data collection method are unknown. The dataset's author, organization, and last update date are also unspecified.

Tags: Tabular, Web Security, Cybersecurity, Phishing, Html Analysis (+1 more)

Commit-Bench: Software Development Benchmark Dataset

The Commit-Bench dataset, sourced from Kaggle, likely contains metrics or records related to software development commits. Its specific content, size, and authorship are not given in the available metadata.

Tags: Tabular, Software Engineering, Benchmark, Git (+1 more)

CVEfixes_HF: Common Vulnerabilities and Exposures Fixes

A dataset likely containing information related to fixes for Common Vulnerabilities and Exposures (CVE). It is published on Kaggle. The specific content, size, and origin require verification after download.

Tags: Tabular, Patch Analysis, Software Security, Cve, Vulnerability Fixes (+1 more)

CVEfixes: C/C++ Vulnerability and Fix Code Pairs

CVEfixes is a dataset of C and C++ code pairs, likely linking vulnerable code snippets to their corresponding fixes. The dataset is hosted on Kaggle, but details on its size, creation date, and author are not provided in the metadata. Columns and specific content require verification after download.

Tags: Tabular, Vulnerability Pairs, Software Security, C Cpp, Cve (+1 more)

VICIdial Security Hardening Data

VICIdial/Asterisk data includes information on security hardening practices, Common Vulnerabilities and Exposures (CVEs), firewall configurations, and access control measures. The dataset appears to compile security-related information specific to the VICIdial and Asterisk telephony platforms. The author, organization, and temporal coverage are unknown.

Tags: Tabular, Access Control, Voip Security, Firewall Rules, Vulnerability Database (+1 more)

Phishing Website Features for Security Classification

PhishingWebsites is a dataset from OpenML containing features extracted from URLs and web pages for identifying phishing sites. It is used for training and benchmarking machine learning models in cybersecurity. The dataset's creator and specific temporal coverage are not provided.

Tags: Tabular, Machine Learning, Web Security, Cybersecurity, Phishing Detection (+1 more)

Malware Datasets for Security Analysis

Malware Datasets is a collection hosted on Kaggle. The dataset's specific contents, scale, and features are not detailed in the available metadata. Its origin, creation date, and exact composition require verification after download.

Tags: Tabular, Machine Learning, Malware, Cybersecurity (+1 more)

Cybersecurity Crowdsourcing Data for Construction Robot Case Study

This dataset supports research on cybersecurity crowdsourcing, specifically the Hack My Robot case study in construction. The study was conducted by Muammer Semih Sonkor and Borja García de Soto.

Tags: Engineering (+1 more)

Federated Graph Neural-Based Zero-Day Malware Detection for Edge IoT

A dataset for research on zero-day malware detection in Edge IoT environments. The data likely contains graph-structured information for training federated graph neural network models. The dataset's author, organization, and temporal coverage are unknown.

Tags: Graph, Graph Neural Networks, Research, Malware Detection, Iot Security, Federated Learning (+1 more)

OpenML Machine Learning Benchmark Dataset

OpenML dataset 'fictif20bkdkmcven7nov2025' is a machine learning benchmark for tabular data. It is part of the OpenML platform's collection of datasets for algorithm testing and comparison. The dataset's specific origin, size, and creation date are not detailed in the available metadata.

Tags: Tabular, Machine Learning Benchmark, Tabular Data, Openml, Openml Dataset (+1 more)

DAG4RE

This dataset, titled DAG4RE, is categorized under Computer and Information Science. It was last updated on February 24, 2026, and its author and organization are listed as anonymous.

Tags: Computer and Information Science (+1 more)

Malware Data for Security Analysis

A dataset named 'data_malware' sourced from Kaggle. The title suggests it contains information related to malicious software, likely for use in security or machine learning applications. No further metadata on size, columns, or origin is available.

Tags: Tabular, Machine Learning, Cybersecurity, Malware Analysis (+1 more)

Phishing and Legitimate URL Features for Machine Learning

A large-scale dataset of phishing and legitimate URLs with engineered features for machine learning. The dataset is sourced from Kaggle, but the specific author, organization, and creation date are unknown. It is designed for binary classification tasks in cybersecurity.

Tags: Tabular, Url Features, Binary Classification, Cyber Security, Large Scale, Phishing Detection, Data Analytics, Data Cleaning (+1 more)

Securecode Dataset: Code Security and Vulnerability Data

Securecode Dataset is a software security dataset published on Hugging Face by rufimelo. It was last updated on 2026-02-16. Its specific content and scale are not detailed in the available metadata.

Tags: Text, Vulnerability Detection, Software Security, Secure Code (+1 more)

StreminiAI: 39,000 Labeled URLs for Phishing and Malware Detection

This dataset contains 39,000 labeled URLs with extracted features for cybersecurity classification. It is hosted on Kaggle and appears designed for machine learning tasks in threat detection. Its creation date, author, and specific feature details are not provided in the available metadata.

Tags: Tabular, Url Classification, Cybersecurity, Phishing Detection, Malware Detection (+1 more)

Hybrid Machine Learning Deep Learning IoT Network Traffic

Network traffic data for IoT intrusion and threat detection, designed for hybrid machine learning and deep learning approaches. The dataset supports classification tasks in the domain of mobile and wireless cyber security.

Tags: Mobile And Wireless, Exploratory Data Analysis, Classification, Cyber Security, Deep Learning (+1 more)

Final Phishing Dataset from Kaggle

Kaggle hosts a dataset titled 'final_phishing_dataset'. The dataset likely contains records related to phishing attempts, as suggested by its title and platform tags. The author, organization, and specific collection details are unknown.

Tags: Tabular, Cybersecurity, Malicious Urls, Phishing, Network security (+1 more)

Code Dataset from Kaggle

A code-related dataset sourced from Kaggle. Its specific content, size, creation details, author, license, and last update date are not provided in the available metadata.

Tags: Text, Software Engineering, Programming (+1 more)
