This dataset contains URL strings in two categories, malicious and benign, for use in detecting harmful online content. Each web address is labeled, so the data supports binary classification tasks in web security applications.
Use Cases
- Train a binary classifier to identify harmful links using the URL text and associated labels
- Perform feature engineering on URL strings to extract domain-level characteristics for threat detection
- Benchmark URL filtering algorithms against a labeled set of malicious and benign web addresses
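As a rough illustration of the feature-engineering and benchmarking use cases, the sketch below derives a few lexical and domain-level features from a URL string and scores a filter's predictions. It is a minimal example under stated assumptions, not a prescribed pipeline: the function names `extract_url_features` and `precision_recall`, the chosen features, and the label encoding (1 = malicious, 0 = benign) are all hypothetical, since the dataset description does not specify column names or how labels are encoded.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def extract_url_features(url: str) -> dict:
    """Derive simple lexical and domain-level features from one URL string."""
    # urlsplit only recognizes the host when a scheme or a leading "//" is
    # present, so prepend "//" to bare strings such as "example.com/login".
    candidate = url if _SCHEME_RE.match(url) else "//" + url
    try:
        parts = urlsplit(candidate)
        host = parts.hostname or ""
        scheme, path = parts.scheme, parts.path
    except ValueError:
        # Malformed input (for example a broken IPv6 literal): fall back to empties.
        host, scheme, path = "", "", ""

    # Detect hosts that are raw IP addresses rather than domain names.
    try:
        ipaddress.ip_address(host)
        has_ip_host = True
    except ValueError:
        has_ip_host = False

    labels = host.split(".") if host and not has_ip_host else []
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Naive count: treats the last two labels as the registered domain,
        # so multi-part suffixes such as "co.uk" are over-counted.
        "num_subdomains": max(len(labels) - 2, 0),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_ip_host": has_ip_host,
        "uses_https": scheme == "https",
        "path_depth": len([seg for seg in path.split("/") if seg]),
    }


def precision_recall(y_true, y_pred):
    """Precision and recall for the malicious class (assumed to be label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Calling `extract_url_features` on each URL yields a dictionary that can be fed to any tabular classifier, and `precision_recall` can compare a filter's predictions against the dataset labels once they are mapped to 0 and 1.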
Strengths
- Includes URL strings representing both malicious and benign web addresses
- Features binary labels for classification tasks
- Provides each URL as a raw text string