This dataset contains URL strings with classification labels that have been corrected against a verified reference dataset. By resolving label noise, it provides a high-fidelity ground truth for distinguishing phishing from legitimate web addresses.
Use Cases
- Train a binary classifier using the url and label columns.
- Validate the performance of URL-based security filters against verified ground-truth labels.
- Quantify the effect of label noise in cybersecurity datasets by comparing these corrected entries to original sources.
- Perform feature engineering on URL strings to identify patterns associated with phishing.
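
As an illustration of the feature-engineering use case, the following is a minimal sketch that turns one URL string into a few lexical features using only the Python standard library. The function name `url_features` and the particular features chosen are assumptions made for this example; they are not part of the dataset, and they are not claimed to be the best predictors of phishing.

```python
import re
from urllib.parse import urlsplit

# Matches a dotted-quad host such as 192.168.0.1 (no range check on octets).
_IPV4_HOST = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def url_features(url: str) -> dict:
    """Extract simple lexical features from a URL string (hypothetical helper)."""
    # urlsplit only fills in the host when a scheme is present,
    # so prepend one for bare entries like "example.com/login".
    candidate = url if "://" in url else "http://" + url
    try:
        parts = urlsplit(candidate)
        host = parts.hostname or ""
        path = parts.path
        scheme = parts.scheme
    except ValueError:
        # Malformed input (e.g. a broken IPv6 literal): fall back to empty parts.
        host, path, scheme = "", "", ""

    return {
        "length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at": "@" in url,
        "num_dots_host": host.count("."),
        "num_hyphens_host": host.count("-"),
        "has_ip_host": bool(_IPV4_HOST.fullmatch(host)),
        "path_depth": len([seg for seg in path.split("/") if seg]),
        "uses_https": scheme == "https",
    }


if __name__ == "__main__":
    # Made-up URLs for demonstration only; they are not rows from the dataset.
    for example in ("https://example.com", "http://192.168.0.1/login/verify"):
        print(example, url_features(example))
```

Applying the function to every value in the url column yields a numeric table that can be paired with the label column to train a classifier.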
Strengths
- Contains URL strings and classification labels.
- Labels are corrected using a verified reference dataset.
- Provides a ground-truth source for phishing vs. legitimate URL classification.
- Focuses on resolving label noise in cybersecurity datasets.
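
To make the label-noise comparison listed under Use Cases concrete, here is a small sketch that measures how often an original source disagrees with the corrected labels. It assumes both sets of labels are available as mappings from URL to label and that the two sources use the same label encoding; the function name `label_noise_rate` and the toy values are hypothetical and do not come from the dataset.

```python
from collections import Counter


def label_noise_rate(original: dict, corrected: dict):
    """Return (noise_rate, flips) over the URLs present in both mappings.

    noise_rate is the fraction of shared URLs whose label changed.
    flips counts each (original_label, corrected_label) change.
    """
    shared = original.keys() & corrected.keys()
    if not shared:
        raise ValueError("the two sources share no URLs")

    flips = Counter(
        (original[url], corrected[url])
        for url in shared
        if original[url] != corrected[url]
    )
    return sum(flips.values()) / len(shared), flips


if __name__ == "__main__":
    # Toy data for illustration only; the 0/1 encoding is an assumption.
    original = {"a.example": 1, "b.example": 0, "c.example": 1, "d.example": 0}
    corrected = {"a.example": 1, "b.example": 1, "c.example": 1, "d.example": 0}
    rate, flips = label_noise_rate(original, corrected)
    print(f"noise rate: {rate:.2f}")
    print(dict(flips))
```

Counting flips per direction, rather than reporting only the overall rate, shows whether the original source more often mislabeled phishing URLs as legitimate or the reverse.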