Email Dataset for Phishing Detection and NLP Classification
Available on 1 platform
Description
The Phishing Email Detection Dataset is a collection of emails intended for phishing detection, natural language processing (NLP), and classification tasks. It was sourced from Kaggle, but its author, organization, and creation date are unknown. The available metadata does not state the number of records, the file formats, or any column-level details.
Use Cases
Train phishing detection classifiers based on email text content.
Benchmark NLP models for binary classification of malicious versus legitimate emails.
Analyze linguistic features common to phishing attempts.
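As an illustration of the first two use cases, the following is a minimal sketch of a text-only baseline: a multinomial Naive Bayes classifier written with the Python standard library. It is not a method associated with this dataset. The class name `NaiveBayes`, the label strings, and the four training sentences are all assumptions invented for the example, since the dataset's real columns and label encoding are undocumented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {}              # label -> Counter of token counts
        self.doc_counts = Counter(labels)  # label -> number of training emails
        self.vocab = set()
        for text, label in zip(texts, labels):
            counts = self.word_counts.setdefault(label, Counter())
            tokens = tokenize(text)
            counts.update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior score."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy examples invented for illustration; they are NOT rows from the dataset.
train_texts = [
    "Verify your account now or it will be suspended",
    "Urgent: confirm your password to avoid account closure",
    "Meeting moved to Thursday, agenda attached",
    "Lunch on Friday? Let me know what time works",
]
train_labels = ["phishing", "phishing", "legitimate", "legitimate"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Please verify your password"))
print(model.predict("Agenda for the Thursday meeting"))
```

A baseline like this is mainly useful as a reference point when benchmarking stronger NLP models. With real data, the toy lists would be replaced by the dataset's text and label fields once their names are known, and evaluation should use a held-out split.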
Strengths
The dataset is explicitly designed for the concrete task of phishing detection.
It is described as suitable for both general NLP work and classification tasks, although the metadata does not say how the data is structured to support either.
Limitations
The row count is unknown, so it is not possible to judge before download whether the dataset is large enough for a given task.
Column-level documentation is absent; field semantics must be inferred after download.
The description metadata is sparse; data quality can only be assessed by inspecting the files manually after download.
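Because the row count and columns are undocumented, a first inspection pass after download might look like the sketch below. It assumes the data arrives as a CSV file with a header row, which the metadata does not confirm. The function `summarize_csv`, the column names `text` and `label`, and the sample rows are hypothetical stand-ins, not details of the actual dataset.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, label_column=None):
    """Return column names, row count, and an optional label distribution."""
    reader = csv.DictReader(handle)
    row_count = 0
    label_counts = Counter()
    for row in reader:
        row_count += 1
        if label_column is not None:
            # row.get() yields None if the assumed column does not exist.
            label_counts[row.get(label_column)] += 1
    return {
        "columns": reader.fieldnames or [],
        "rows": row_count,
        "labels": dict(label_counts),
    }


# Hypothetical sample standing in for the downloaded file; NOT real dataset rows.
sample = io.StringIO(
    "text,label\n"
    "Verify your account now,1\n"
    "Agenda for Thursday attached,0\n"
    "Your invoice is overdue; click here,1\n"
)

summary = summarize_csv(sample, label_column="label")
print(summary)
```

For a real file, the `io.StringIO` object would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved. A heavily skewed label distribution or a `None` key in the label counts would be an early sign that the assumed column name or file layout is wrong.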
Provenance
Source
Kaggle
License is unknown; users must verify terms before use.