A labeled dataset for detecting fraudulent internship opportunities using machine learning. The dataset is hosted on Kaggle. The author, organization, and specific data characteristics are unknown.
Use Cases
- Train binary classification models on internship listings labeled as fraudulent or non-fraudulent.
- Develop feature engineering pipelines from whatever textual and metadata attributes the listings provide.
- Benchmark fraud detection algorithms against the labeled ground truth.
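As a rough illustration of the benchmarking use case, the sketch below scores a trivial keyword rule against labeled rows using precision, recall, and F1. Everything specific in it is an assumption: the `Listing` fields (`text`, `is_fraud`), the `RED_FLAGS` phrases, and the toy rows are invented for the example, since the dataset's real columns are not documented. Any trained classifier could replace `keyword_baseline` once the actual schema is known.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    text: str       # hypothetical free-text field; the real column name is unknown
    is_fraud: bool  # hypothetical label field; the real column name is unknown


# Hypothetical red-flag phrases, chosen only for illustration.
RED_FLAGS = ("registration fee", "pay to apply", "wire transfer")


def keyword_baseline(text: str) -> bool:
    """Flag a listing as fraudulent if it contains any red-flag phrase."""
    lowered = text.lower()
    return any(flag in lowered for flag in RED_FLAGS)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy rows invented for this sketch; they are NOT taken from the dataset.
toy_rows = [
    Listing("Paid summer internship, apply via the company careers page", False),
    Listing("Internship offer! Pay a registration fee to confirm your seat", True),
    Listing("Remote data internship, send a wire transfer for the training kit", True),
    Listing("Marketing intern wanted, stipend provided", False),
    Listing("Guaranteed internship certificate, limited slots, deposit required", True),
]

y_true = [row.is_fraud for row in toy_rows]
y_pred = [keyword_baseline(row.text) for row in toy_rows]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall separately, rather than accuracy alone, is a common choice for fraud tasks because the positive class is often rare; whether that holds for this dataset cannot be confirmed without the label distribution.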
Strengths
- Dataset is explicitly labeled for fraud detection, providing a supervised learning target.
- According to its description, the dataset targets a specific application domain (internship fraud).
Limitations
- Row count, column names, and sample data are unavailable, limiting suitability assessment.
- Column-level documentation is absent; field semantics must be inferred after download.
- Data may carry temporal or source bias, since the collection period and the origin of the listings are not stated.
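Because the schema is undocumented, a first step after download is usually a quick structural summary. The sketch below is one minimal way to do that with the Python standard library: it reports column names, row count, empty cells per column, and the distribution of a label column. The column names (`title`, `description`, `is_fraud`) and the in-memory `SAMPLE` are hypothetical stand-ins, not the dataset's actual contents.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, label_column=None):
    """Summarize a CSV: columns, row count, empty cells, and label counts."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    row_count = 0
    missing = Counter()  # empty or whitespace-only cells per column
    labels = Counter()   # value counts for the label column, if one is given
    for row in reader:
        row_count += 1
        for name in columns:
            if (row.get(name) or "").strip() == "":
                missing[name] += 1
        if label_column is not None:
            labels[row.get(label_column)] += 1
    return {
        "columns": columns,
        "rows": row_count,
        "missing": dict(missing),
        "labels": dict(labels),
    }


# Hypothetical sample standing in for the downloaded file.
SAMPLE = (
    "title,description,is_fraud\n"
    "Data Intern,Analyze sales data,0\n"
    "HR Intern,,1\n"
    "Design Intern,Create mockups,0\n"
)

summary = summarize_csv(io.StringIO(SAMPLE), label_column="is_fraud")
print(summary)
```

For the real file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved and `label_column` is set once the actual label field has been identified.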