DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Software Engineering & Security Datasets | DataSalon

Software Engineering & Security

Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples

1,591 datasets

Phishing Dataset 2

Phishing Dataset 2 likely contains examples of phishing attempts for cybersecurity analysis. The dataset is published on Kaggle, but its specific contents, size, and authorship are unknown. Columns suggest it may include features related to email content, URLs, or network traffic.

Tabular, Cybersecurity, Phishing, Network security (+1)

Cybersecurity Dataset for Threat Detection and Analysis

A dataset related to cybersecurity, likely containing information pertinent to network security or threat detection. It is hosted on Kaggle, but details about its creator, size, and specific contents are not provided. The dataset's age and update frequency are unknown.

Tabular, Cybersecurity, Intrusion Detection, Network security (+1)

LUFlow: Network Intrusion Detection Data Set

LUFlow is a novel data set for analyzing and detecting emerging threats in network traffic. The dataset likely contains features related to network flows and intrusion patterns. Its author, organization, and specific size are unknown.

Tabular, Cybersecurity, Computer Science, Internet, Research, Outlier Analysis, Software, Intrusion Detection, Network security (+1)

TACO Verified: Programming Solutions Passing All Test Cases

A filtered subset of the TACO dataset, last updated in April 2025, containing only verified programming solutions that pass all test cases. Created by likaixin, it includes 12,898 problems and 1,043,251 solutions. The reported correct ratio is 71.03% after removing failing solutions and problems with no correct answer.

Text, Benchmark, Code Generation, Software Testing, Programming (+1)
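
The filtering described for this entry (keep only solutions that pass all test cases, then drop problems with no correct answer) can be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual pipeline: the `Solution` class, the `passed_all_tests` flag, and the `keep_verified` function are hypothetical names, and the real dataset schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    code: str
    passed_all_tests: bool  # hypothetical flag; the real schema may differ


def keep_verified(problems: dict[str, list[Solution]]) -> dict[str, list[Solution]]:
    """Keep only passing solutions, then drop problems left with none."""
    verified: dict[str, list[Solution]] = {}
    for problem_id, solutions in problems.items():
        passing = [s for s in solutions if s.passed_all_tests]
        if passing:  # problems with no correct answer are removed entirely
            verified[problem_id] = passing
    return verified


# Toy data: p1 has one passing solution, p2 has none.
toy = {
    "p1": [Solution("print(1)", True), Solution("print(2)", False)],
    "p2": [Solution("print(3)", False)],
}
result = keep_verified(toy)
print(sorted(result))  # ['p1']
```

In this toy run, the failing solution for p1 is dropped and p2 disappears because nothing for it passed.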

Phishing Websites Dataset For Security Classification

Phishing websites data is a collection of features used to distinguish legitimate and fraudulent web pages. The dataset is hosted on the UCI Machine Learning Repository, a known source for benchmark datasets. The original creator and specific collection date are not provided.

Tabular, Machine Learning, Web Security, Cybersecurity, Phishing Detection (+1)

Website Phishing Features for Detection Models

Website Phishing is a classic dataset from the UCI Machine Learning Repository for training classifiers to identify fraudulent websites. It contains labeled examples of phishing and legitimate sites, characterized by features like URL structure, page content, and security indicators. The dataset is a foundational resource for academic and industrial research in cybersecurity.

Tabular, Machine Learning, Cybersecurity, Phishing Detection, Website Security (+1)

Bitcoin Heist Ransomware Address Dataset

A dataset from the UCI Machine Learning Repository containing information on Bitcoin addresses associated with ransomware activity. It is used for network analysis and security research, focusing on illicit transactions within the cryptocurrency ecosystem.

Tabular, Cryptocurrency, Blockchain Security, Ransomware, Bitcoin, Network Analysis (+1)

Malware Feature Set From VxHeaven and VirusTotal

A collection of malware samples with extracted static and dynamic features, sourced from the VxHeaven repository and VirusTotal. The dataset is intended for cybersecurity research and machine learning model development. The original compilation and feature extraction were performed by contributors to the UCI Machine Learning Repository.

Tabular, Cybersecurity, Malware Analysis, Static Features, Vxheaven, Dynamic Features (+1)

Malware Type Detection Dataset

A dataset for malware classification tasks, sourced from the UCI Machine Learning Repository. It is intended for building and evaluating models that can identify different types of malicious software. The specific temporal coverage and collection method are not detailed.

Tabular, Machine Learning, Cybersecurity, Tabular Data, Uci Repository, Malware Classification (+1)

Intrusion Detection Data for Wireless Sensor Networks

LT-FS-ID is a dataset from the UCI Machine Learning Repository for intrusion detection in Wireless Sensor Networks (WSNs). It contains labeled network traffic data for training and evaluating machine learning models to identify security attacks. The dataset's creator and specific collection date are not provided.

Tabular, Machine Learning, Wireless Sensor Networks, Intrusion Detection, Network security (+1)

Tezpur University Android Malware Detection Dataset

TUANDROMD is a dataset for Android malware detection created by Tezpur University. It contains labeled samples for training and evaluating machine learning models in cybersecurity. The dataset's specific size and update date are not provided.

Tabular, Machine Learning, Cybersecurity, Android Malware, Tabular Data (+1)

Shell Commands from Cybersecurity Training Participants

The UCI Machine Learning Repository hosts this dataset of shell commands executed by participants during hands-on cybersecurity training exercises. The data is tabular and captures user behavior in command-line environments. The original creator and specific collection timeframe are not documented.

Tabular, Shell Commands, User Behavior, Command Line, Cybersecurity Training (+1)

Phishing URL Dataset for Web Security Classification

PhiUSIIL Phishing URL (Website) is a collection of URLs labeled for phishing detection. The dataset originates from the UCI Machine Learning Repository and is tagged for URL classification and web security. Specific details on its size, creation date, and author are not provided.

Tabular, Url Classification, Web Security, Cybersecurity, Phishing Detection (+1)

Source Code for Permeability Characterization in Enhanced Geothermal Reservoirs

Source code for assimilating microseismic and tracer data to characterize three-dimensional permeability in enhanced geothermal reservoirs. It was authored by Jiang, Zhenjiao and last updated in February 2026.

Earth and Environmental Sciences (+1)

Cyber Security Attacks

40,000 records documenting cyber security attacks across 25 distinct metrics. The data provides a structured overview of security incidents and their associated technical parameters.

Tabular, Computer Science, Standardized Testing, Generative Adversarial Network, Survey Analysis (+1)

Cybersecurity Conversations in Italian from ShareGPT

A text dataset of cybersecurity-related conversations in Italian, sourced from ShareGPT. The dataset was uploaded by author Mattimax to the Hugging Face platform and was last updated on 2026-02-09. The specific content, scale, and structure require verification after download.

Text, Italian, Cybersecurity, Conversational Data, Sharegpt (+1)

Construction Cybersecurity Crowdsourcing Platform Case Study Data

This dataset originates from a crowdsourcing platform case study conducted by Muammer Semih Sonkor and Borja García de Soto. It is hosted by Harvard Dataverse and was last updated in January 2026. The specific data volume, structure, and features are not documented.

Engineering (+1)

AI AppSec Index: AI Security Remediation Benchmarks and Compliance Matrices

AI AppSec Index provides data related to AI application security. The description mentions AI remediation benchmarks, ASPM matrices, CVE mappings, and CRA compliance. It is hosted on Kaggle, but the author, organization, and specific data volume are unknown.

Tabular, Compliance, Benchmark, Artificial Intelligence, Classification, Software, Cyber Security, Ai Security, Software Benchmarks, Vulnerability Mapping (+1)

Eseur: Empirical Software Engineering Data for Evidence-Based Analysis

This repository contains the empirical data and source code used to generate examples for the book "Evidence-based Software Engineering based on the publicly available data" by Derek Jones. Updated in February 2026, the collection aggregates diverse datasets spanning software evolution, developer psychology, and economic modeling. It serves as a foundational resource for reproducing statistical analyses in empirical software engineering.

Empirical Software Engineering, Source Code Analysis, Software Engineering, Ecosystem Modeling, Evidence Based, System Evolution, Cognitive Science, Career Paths, Data Analysis, Software Development, Cognitive Capitalism, Psychology Experiments, Economic Models (+1)

Source code vulnerability

A collection of labeled source code snippets across 3+ programming languages including C++, Java, and Python. The dataset categorizes code blocks as either vulnerable or secure to facilitate training for automated security auditing.

English, Text, Computer Science, Binary Classification, Beginner (+1)
