A Kaggle dataset that likely contains HTML content, URLs, and metadata for phishing websites. The number of rows and columns, the data collection method, the author or organization, and the last update date are all unspecified.
Use Cases
- Train a classifier to distinguish phishing sites from legitimate ones (inferred from domain, verify after download)
- Analyze HTML structure patterns common in phishing attacks (inferred from domain, verify after download)
- Build a feature extraction pipeline for URL and metadata-based security tools (inferred from domain, verify after download)
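As a rough illustration of the URL-based feature extraction use case, the sketch below computes a few lexical features from a URL string using only the Python standard library. It assumes the dataset exposes a column of raw URL strings, which is unverified. The function name `url_features` and the particular features chosen are hypothetical and are not taken from the dataset.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string.

    Hypothetical helper: the feature set is illustrative, not derived
    from the dataset's actual columns.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A host that parses as an IP address (rather than a domain name)
    # is a commonly used phishing heuristic.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "uses_https": parts.scheme == "https",
        "host_is_ip": host_is_ip,
        # Number of dot-separated labels in the host, e.g. 3 for a.b.com
        "num_host_labels": len(host.split(".")) if host else 0,
        "has_at_symbol": "@" in url,
        "num_hyphens_in_host": host.count("-"),
        # Number of non-empty path segments
        "path_depth": len([p for p in parts.path.split("/") if p]),
    }
```

Calling `url_features` on each URL yields one dictionary per row, which could then be collected into a feature table for a classifier once the actual column names are confirmed.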
Strengths
- Published on Kaggle, a platform for sharing datasets.
- Focuses on phishing sites, a relevant topic for cybersecurity.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Row count, column definitions, and file formats are unknown.
- Data may reflect temporal or source bias, which cannot be assessed while the collection method and dates are unknown.
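Because the row count, column definitions, and file format need checking after download, a first inspection pass might look like the sketch below. It assumes the download contains a UTF-8 CSV file with a header row, which is unverified. The helper names `summarize_rows` and `summarize_csv_file` are hypothetical.

```python
import csv
from typing import Iterable


def summarize_rows(lines: Iterable[str]) -> dict:
    """Return the header and the number of data rows for CSV text.

    `lines` can be an open text file or any iterable of strings.
    Assumes the first row is a header (an unverified assumption).
    """
    reader = csv.reader(lines)
    header = next(reader, [])  # empty list if the input has no rows
    row_count = sum(1 for _ in reader)
    return {"columns": header, "row_count": row_count}


def summarize_csv_file(path: str) -> dict:
    """Open a CSV file (assumed UTF-8) and summarize it."""
    # newline="" is the mode the csv module recommends for file objects.
    with open(path, newline="", encoding="utf-8") as f:
        return summarize_rows(f)
```

If a column turns out to hold full HTML pages, individual fields may exceed the csv module's default field size limit, in which case `csv.field_size_limit` can be used to raise it before reading.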