A synthetic dataset of phishing and legitimate email examples, sourced from Kaggle. Its size, creation date, and author are unknown.
Use Cases
- Train binary classifiers to distinguish phishing from legitimate emails based on email content.
- Benchmark machine learning models for phishing detection on synthetic email examples.
- Analyze which email characteristics are common to phishing attempts.
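The first two use cases start from a binary text classifier and a way to score it. As a minimal baseline sketch, a multinomial Naive Bayes model with add-one smoothing, plus precision, recall, and F1 for the phishing class, can be written in pure Python. Naive Bayes is a swapped-in illustrative choice, not a method prescribed by the dataset. The label strings `"phishing"` and `"legitimate"`, the class and function names, and the four training texts are all hypothetical toy values, not rows from the dataset; the real text and label fields must be identified after download.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer for a baseline.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesEmailClassifier:
    """Hypothetical multinomial Naive Bayes baseline with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the summed log likelihood of each known token.
            score = math.log(doc_count / n_docs)
            denom = self.total_words[label] + vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(y_true, y_pred, positive="phishing"):
    # Metrics for the positive (phishing) class only.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy examples standing in for the real (undocumented) columns.
TRAIN_TEXTS = [
    "Urgent: verify your password at this link",
    "Your account is suspended, click the link to verify",
    "Meeting notes attached for tomorrow",
    "Can we move lunch to Friday?",
]
TRAIN_LABELS = ["phishing", "phishing", "legitimate", "legitimate"]

if __name__ == "__main__":
    clf = NaiveBayesEmailClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    test_texts = ["please verify your password via the link", "Lunch meeting on Friday"]
    test_labels = ["phishing", "legitimate"]
    preds = [clf.predict(t) for t in test_texts]
    print(preds, precision_recall_f1(test_labels, preds))
```

For a benchmark on the real data, the same metric function can compare any model's predictions on a held-out split; reporting phishing-class precision and recall separately matters because false negatives and false positives carry different costs in this task.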
Strengths
- Focuses on a specific and high-impact cybersecurity task: email phishing detection.
- Provides a labeled binary classification target (phishing vs. legitimate), according to the dataset description.
Limitations
- The row count is unknown, making it hard to judge whether the dataset is large enough for a given task.
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is sparse, so data quality must be verified by manual inspection after download.
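Because the row count, column meanings, and data quality all have to be established after download, a quick profiling pass is a practical first step. The sketch below uses only the Python standard library and assumes the download is a CSV file with a header row; that format, the toy column names `email_text` and `label`, and the `profile_csv` helper are all hypothetical, not taken from the dataset. It reports the number of rows, the column names, empty cells per column, and the value counts of a chosen label column.

```python
import csv
import io
from collections import Counter


def profile_csv(stream, label_column=None):
    """Hypothetical helper: summarize rows, columns, empty cells, and label balance.

    `label_column` is an assumed name; the real column holding the
    phishing/legitimate target must be identified after download.
    """
    reader = csv.DictReader(stream)
    columns = list(reader.fieldnames or [])
    empty_counts = Counter()
    label_counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for col in columns:
            value = row.get(col)
            # Count missing or whitespace-only cells as empty.
            if value is None or not value.strip():
                empty_counts[col] += 1
        if label_column is not None:
            label_counts[row.get(label_column)] += 1
    return {
        "rows": rows,
        "columns": columns,
        "empty_cells": {col: empty_counts[col] for col in columns},
        "label_counts": dict(label_counts),
    }


if __name__ == "__main__":
    # Toy stand-in for the downloaded file; real column names are unknown.
    sample = io.StringIO(
        "email_text,label\n"
        "\"Verify your account now\",phishing\n"
        "\"Lunch at noon?\",legitimate\n"
        ",legitimate\n"
    )
    print(profile_csv(sample, label_column="label"))
```

For the real file, pass an open handle such as `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV; the UTF-8 encoding is also an assumption. A heavily skewed `label_counts` result would suggest using precision and recall rather than accuracy when benchmarking.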
Provenance
- Source: Kaggle
- Collection method: Synthetically generated