This dataset provides Bangla-language phishing data for machine learning and cybersecurity research. It contains simulated phishing content in SMS, email, and URL formats. It was sourced from Kaggle; the author, organization, and last-update date are not available.
Use Cases
- Train text classification models to detect phishing based on simulated SMS content.
- Develop natural language processing tools for Bengali cybersecurity based on simulated email text.
- Build URL phishing classifiers from the simulated Bangla URL data that the dataset description mentions.
- Benchmark multilingual phishing detection algorithms using a Bengali-language corpus.
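As a starting point for the text classification use case, the following is a minimal baseline sketch, not a recommended production model. It implements multinomial Naive Bayes over character trigrams, which avoids needing a Bangla tokenizer. The class name `NaiveBayes`, the label strings, and the four training messages are all hypothetical and invented for illustration; none of them come from the dataset, whose columns and labels are undocumented.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents per class
        self.gram_counts = defaultdict(Counter)    # n-gram counts per class
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy messages (assumption): replace with the real text and
# label columns once the downloaded files have been inspected.
train_texts = [
    "আপনার অ্যাকাউন্ট বন্ধ হবে, এখনই লিংকে ক্লিক করুন",
    "পুরস্কার জিতেছেন! পিন নম্বর পাঠান",
    "আগামীকাল সকালে মিটিং আছে",
    "বাজার থেকে দুধ নিয়ে এসো",
]
train_labels = ["phishing", "phishing", "legitimate", "legitimate"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("আপনার পিন নম্বর পাঠান")
print(prediction)
```

Because the data is simulated, scores from a baseline like this indicate how separable the simulated classes are, not how the model would perform on real phishing traffic.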
Strengths
- Data is specifically designed for the Bengali language, a less common focus in phishing research.
- Covers multiple phishing vectors (SMS, email, URL) as described.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Row count is unknown, which makes it hard to judge suitability before download.
- Column-level documentation is absent; field semantics must be inferred after download.
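The post-download inspection these limitations call for can be sketched as follows, assuming the data ships as CSV (the card does not confirm the file format). The helper `summarize_rows` is hypothetical, and the in-memory sample with `text` and `label` columns is a stand-in, not the dataset's actual schema.

```python
import csv
import io
from collections import Counter


def summarize_rows(handle):
    """Return row count, column names, and non-empty counts per column."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    filled = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Missing trailing fields come back as None, so guard with "or".
            if (row.get(name) or "").strip():
                filled[name] += 1
    return {
        "rows": rows,
        "columns": columns,
        "non_empty": {name: filled[name] for name in columns},
    }


# Stand-in data (assumption): the real column names are undocumented.
sample = io.StringIO("text,label\nhello,0\n,1\n")
summary = summarize_rows(sample)
print(summary)
```

For a downloaded file, pass an open handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved. UTF-8 is an assumption here; Bangla text in another encoding would need a different `encoding` value.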
Provenance
- Source: Kaggle
- Collection Method: Simulated data, as stated in the description.
- Time Range: Not specified
- Freshness: Last update date is unknown; freshness unverified.
- Geography: Not specified