A merged collection of emails drawn from three sources: Enron, Nazario, and SpamAssassin. The emails are likely labeled as phishing or spam and intended for security research. The dataset is hosted on Kaggle, but its size, structure, and creation date are unknown.
Use Cases
- Train a binary classifier to identify phishing emails (inferred from domain, verify after download)
- Benchmark spam detection algorithms across different email corpora (inferred from domain, verify after download)
- Analyze linguistic patterns in malicious versus legitimate emails (inferred from domain, verify after download)
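As a rough illustration of the first use case, the sketch below trains a tiny bag-of-words Naive Bayes classifier using only the Python standard library. It assumes each email can be reduced to a (text, label) pair with 1 meaning phishing/spam and 0 meaning legitimate; that encoding, the function names, and the toy rows are hypothetical and are not taken from the dataset. Verify the real columns after download before adapting it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(rows):
    """Count words per class. rows: iterable of (text, label), label in {0, 1} (assumed)."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in rows:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the higher log posterior under Laplace smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior; requires at least one training row per class.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy rows for illustration only; not taken from the dataset.
toy_rows = [
    ("Verify your account password now", 1),
    ("Urgent: confirm your bank login", 1),
    ("Meeting notes attached for Monday", 0),
    ("Lunch schedule for the team", 0),
]
model = train_naive_bayes(toy_rows)
```

Calling `predict(model, some_text)` returns 0 or 1. A real experiment would need a held-out test split and probably a stronger model; this only shows the shape of the task.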
Strengths
- Published on Kaggle.
- Combines data from three established sources: Enron, Nazario, and SpamAssassin.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Row count, column definitions, and file formats are unknown.
- License and authorship information are unknown.
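Because the row count, columns, and file format are undocumented, a first step after download could be a quick structural check. The sketch below assumes the data ships as a UTF-8 CSV file with a header row, which is not confirmed; the function name `summarize_csv` and the `label_column` parameter are hypothetical.

```python
import csv
from collections import Counter


def summarize_csv(path, label_column=None):
    """Return column names, row count, and optional label counts for a CSV file."""
    # Email bodies can be long, so raise the default per-field size limit.
    csv.field_size_limit(10**9)
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        row_count = 0
        label_counts = Counter()
        for row in reader:
            row_count += 1
            # Only count labels if the caller named a column that exists.
            if label_column is not None and label_column in row:
                label_counts[row[label_column]] += 1
    return {"columns": columns, "rows": row_count, "labels": dict(label_counts)}
```

For example, `summarize_csv("emails.csv", label_column="label")` would report the header, the number of rows, and the class balance, where both the filename and the column name are placeholders to replace with whatever the download actually contains.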
Provenance
- Collection Method: Merged from Enron, Nazario, and SpamAssassin email datasets.