Merged Phishing Email Dataset from Enron, Nazario, and SpamAssassin
Description
A merged collection of emails from three established sources: the Enron corpus, the Nazario phishing corpus, and the SpamAssassin public corpus. The dataset is hosted on Kaggle, but its metadata does not state the row count, file formats, or license. It likely contains a mix of legitimate, spam, and phishing emails.
Use Cases
Train a binary classifier to detect phishing emails (inferred from the domain; verify after download)
Benchmark natural language processing models on spam vs. ham classification (inferred from the domain; verify after download)
Analyze linguistic features common to phishing attempts across the different source corpora (inferred from the domain; verify after download)
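As a rough illustration of the first use case, the sketch below trains a tiny multinomial Naive Bayes classifier using only the Python standard library. The example emails and the 0/1 label encoding are made up for illustration and are not taken from the dataset; the real column names and label scheme are unknown and must be checked after download. In practice a library such as scikit-learn would likely be a better fit.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(texts, labels):
    """Count words per class; labels are assumed to be 0 (legitimate) or 1 (phishing)."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the class with the higher log-posterior, using add-one smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label in (0, 1):
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(model["doc_counts"][label] / total_docs)
        for token in tokenize(text):
            # Tokens never seen during training are skipped.
            if token in model["vocab"]:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy examples, NOT taken from the dataset.
toy_texts = [
    "Please verify your account password at this link immediately",
    "Your account is suspended, click the link to confirm your password",
    "Meeting moved to Thursday, agenda attached",
    "Lunch on Friday? Let me know what time works",
]
toy_labels = [1, 1, 0, 0]
model = train_naive_bayes(toy_texts, toy_labels)
```

Calling `predict(model, "some email body")` returns 0 or 1 under this assumed encoding. Four toy emails say nothing about real performance; any benchmark needs a held-out split of the actual data.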
Strengths
Published on Kaggle, a major platform for data science.
Combines emails from three well-known, publicly referenced sources.
Limitations
Metadata is minimal; actual content requires verification after download.
Row count, column definitions, and license information are unknown.
Data may reflect temporal or source bias inherent to the original corpora.
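Because the row count and column definitions are unknown, a first step after download could be a quick structural check. The sketch below assumes the data arrives as a CSV file with a header row, which the metadata does not confirm; the function name `summarize_csv` and the `label_column` argument are hypothetical and should be adapted to whatever the file actually contains.

```python
import csv
from collections import Counter


def summarize_csv(path, label_column=None):
    """Report column names, row count, and (optionally) value counts for one column."""
    # Email bodies can exceed the csv module's default field size limit.
    csv.field_size_limit(10**9)
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        row_count = 0
        label_counts = Counter()
        for row in reader:
            row_count += 1
            # label_column is an assumption; it is only counted if present.
            if label_column is not None and label_column in row:
                label_counts[row[label_column]] += 1
    return {
        "columns": columns,
        "row_count": row_count,
        "label_counts": dict(label_counts),
    }
```

If the file records which corpus each email came from, running the same count on that column would give a first look at source imbalance.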
Provenance
Source
Kaggle
Collection Method
Merged from the Enron, Nazario, and SpamAssassin email collections.
License is unknown; users must verify terms before use.