Zero-Day Phishing: Labeled Post-Rendered HTML Pages
Available on 1 platform
Description
Labeled HTML pages, categorized as benign or phishing, for training cybersecurity machine learning models. The dataset is hosted on Kaggle; its author, organization, and creation date are not specified. The total number of pages, the file formats, and the specific features are also unknown.
Use Cases
Train binary classifiers to distinguish phishing from benign websites based on HTML content.
Develop feature extraction pipelines for post-rendered web page analysis.
Benchmark model performance on a labeled corpus of HTML pages for cybersecurity applications.
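For the feature-extraction use case, a minimal sketch using only Python's standard-library `html.parser` is shown here. It counts a few structural signals often considered in phishing detection: forms, password inputs, form actions pointing to another host, iframes, scripts, and links. This feature set is an illustrative assumption, not something documented for this dataset. `PhishingFeatureExtractor`, `extract_features`, and the `page_host` parameter are hypothetical names, and the code assumes the page's own hostname is known separately. Because the dataset's file format is unknown, the function takes a raw HTML string and leaves loading to the reader.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PhishingFeatureExtractor(HTMLParser):
    """Counts a few structural HTML signals (illustrative feature set, not from the dataset)."""

    def __init__(self, page_host: str = "") -> None:
        super().__init__()
        # Hostname of the page being analyzed; hypothetical input, assumed known separately.
        self.page_host = page_host.lower()
        self.features = {
            "num_forms": 0,
            "num_password_inputs": 0,
            "num_external_form_actions": 0,
            "num_iframes": 0,
            "num_scripts": 0,
            "num_links": 0,
            "num_external_links": 0,
        }

    def _is_external(self, url: str) -> bool:
        # Relative URLs have no hostname and are treated as same-site.
        try:
            host = urlparse(url).hostname or ""
        except ValueError:  # malformed URL, e.g. an unclosed IPv6 bracket
            return False
        return bool(host) and host != self.page_host

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attr = {name: value or "" for name, value in attrs}
        if tag == "form":
            self.features["num_forms"] += 1
            if self._is_external(attr.get("action", "")):
                self.features["num_external_form_actions"] += 1
        elif tag == "input" and attr.get("type", "").lower() == "password":
            self.features["num_password_inputs"] += 1
        elif tag == "iframe":
            self.features["num_iframes"] += 1
        elif tag == "script":
            self.features["num_scripts"] += 1
        elif tag == "a":
            self.features["num_links"] += 1
            if self._is_external(attr.get("href", "")):
                self.features["num_external_links"] += 1


def extract_features(html: str, page_host: str = "") -> dict:
    """Parse one HTML document and return its feature counts."""
    parser = PhishingFeatureExtractor(page_host)
    parser.feed(html)
    parser.close()
    return dict(parser.features)


if __name__ == "__main__":
    sample = (
        '<form action="https://evil.example/login">'
        '<input type="password" name="pw"></form>'
        '<a href="/help">Help</a><a href="https://bank.example/">Bank</a>'
    )
    print(extract_features(sample, page_host="bank.example"))
```

Since the pages are post-rendered, counts like these would reflect the DOM after client-side scripts ran, rather than only the HTML the server originally sent. The resulting dictionaries could be turned into feature vectors for any binary classifier.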
Strengths
Data is explicitly labeled for a binary classification task (benign vs. phishing).
Focuses on post-rendered HTML, which likely captures the final content a user sees.
Limitations
Row count is unknown, which makes it harder to judge whether the dataset is large enough for a given task.
Column-level documentation is absent; field semantics must be inferred after download.
Last update date is unknown, so the freshness of the phishing samples cannot be verified.
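Because the row count, file formats, and field semantics are undocumented, a practical first step after download is to inventory what actually arrived. The following standard-library Python sketch walks a local folder, tallies file extensions, and, if any CSV files are present, records each one's first row (assumed to be a header) and its record count. `summarize_download` is a hypothetical name, and the code assumes nothing about the dataset's real layout beyond it being unpacked into a local directory.

```python
import csv
import sys
from collections import Counter
from pathlib import Path

# HTML stored inside CSV cells can exceed the csv module's default field limit
# (131072 characters); 2**31 - 1 fits in a C long on every platform.
csv.field_size_limit(2**31 - 1)


def summarize_download(root: str) -> dict:
    """Inventory a downloaded dataset folder: extension counts plus CSV headers and record counts."""
    root_path = Path(root)
    extensions = Counter()
    csv_headers = {}
    csv_record_counts = {}
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "<no extension>"
        extensions[ext] += 1
        if ext == ".csv":
            rel = str(path.relative_to(root_path))
            with path.open(newline="", encoding="utf-8", errors="replace") as fh:
                reader = csv.reader(fh)
                # First row is assumed to be a header; confirm by inspection.
                csv_headers[rel] = next(reader, [])
                # Counts parsed records, not physical lines, so multi-line cells count once.
                csv_record_counts[rel] = sum(1 for _ in reader)
    return {
        "extensions": dict(extensions),
        "csv_headers": csv_headers,
        "csv_record_counts": csv_record_counts,
    }


if __name__ == "__main__":
    # Usage: python summarize.py path/to/unpacked/dataset
    if len(sys.argv) > 1:
        print(summarize_download(sys.argv[1]))
```

A summary like this gives a first estimate of the page count and reveals which files carry the benign/phishing labels, which the catalog entry itself does not document.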
Provenance
Source
Kaggle
License is unknown; users must verify permissions before use.