A dataset of 111,000 URLs labeled for phishing detection, each described by 22 features. It was published on Kaggle in September 2025 and is described as real-world data.
Use Cases
- Train a binary classifier to predict phishing status using the 22 provided URL features.
- Analyze feature importance among the 22 characteristics to identify key indicators of malicious URLs.
- Benchmark machine learning models such as Random Forest or Gradient Boosting on this 111,000-row tabular dataset.
- Develop feature engineering pipelines for cybersecurity applications based on the 22 URL attributes.
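The classification, benchmarking, and feature-importance use cases above could be prototyped roughly as follows. This is a minimal sketch using scikit-learn, not a reference pipeline for this dataset: because the file name and column names are not given here, `make_classification` generates a synthetic 22-feature stand-in, and the sample size, split ratio, and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data (2,000 rows x 22 features).
# Replace X and y with the real feature matrix and phishing labels.
X, y = make_classification(
    n_samples=2000, n_features=22, n_informative=8, random_state=0
)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its F1 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Impurity-based importances from the fitted forest, highest first,
# as (feature_index, importance) pairs.
importances = models["random_forest"].feature_importances_
ranking = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)

print(scores)
print(ranking[:5])
```

F1 is used here rather than accuracy because the class balance of the real data is not documented; with heavily skewed classes, accuracy alone can look good for a model that rarely predicts the minority class.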
Strengths
- 111,000 URL instances provide a substantial sample for model training.
- 22 features per URL offer multiple dimensions for analysis and prediction.
Limitations
- The class balance is not documented, so a significant imbalance between phishing and benign URLs cannot be ruled out.
- The origin and collection method of the 'real-world' URLs are unspecified, limiting reproducibility.
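The class-balance limitation above can be checked directly once the labels are loaded. The sketch below uses only the standard library; `class_balance` is a hypothetical helper, and the toy labels and the encoding (1 = phishing, 0 = benign) are assumptions for illustration, not values taken from the dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return a {label: fraction} mapping for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels supplied")
    return {label: n / total for label, n in counts.items()}


# ASSUMPTION: toy labels, 1 = phishing and 0 = benign.
# Replace with the label column of the real dataset.
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
balance = class_balance(toy_labels)
print(balance)
```

If one class turns out to dominate, stratified splitting and metrics such as precision, recall, or F1 are usually more informative than raw accuracy.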
Provenance
- Source: Kaggle
- Collection Method: Not specified
- Time Range: Not specified
- Freshness: Published September 2025
- Geography: Not specified