A dataset for developing fraud detection models in health insurance. The data is synthetic, likely generated to simulate patterns of fraudulent and legitimate claims. It is hosted on Kaggle, but the specific author, size, and creation date are unknown.
Use Cases
- Training a binary classifier to flag suspicious insurance claims (inferred from domain, verify after download)
- Benchmarking unsupervised anomaly detection algorithms on transactional data (inferred from domain, verify after download)
- Feature engineering for model interpretability in fraud analytics (inferred from domain, verify after download)
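Whether the classification and anomaly detection use cases are feasible depends on whether the file carries a label column and how imbalanced it is. The following is a minimal sketch of that check, assuming the rows have already been loaded as dictionaries. The column name `is_fraud` is hypothetical and must be replaced with whatever the downloaded file actually uses.

```python
from collections import Counter


def label_balance(rows, label_column):
    """Return each label value's share of the rows that carry a label.

    Rows where the label is missing or an empty string are ignored.
    """
    counts = Counter(
        row[label_column]
        for row in rows
        if row.get(label_column) not in (None, "")
    )
    total = sum(counts.values())
    if total == 0:
        # No usable labels found under this column name.
        return {}
    return {label: count / total for label, count in counts.items()}


# Usage with made-up rows; "is_fraud" is a hypothetical column name.
example_rows = [{"is_fraud": "0"}, {"is_fraud": "0"}, {"is_fraud": "1"}]
shares = label_balance(example_rows, "is_fraud")
```

An empty result means no usable labels were found under that name, which points toward the unsupervised use case. A heavily skewed result suggests evaluating a classifier with precision and recall rather than accuracy.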
Strengths
- Published on Kaggle, a platform with established data sharing and community features.
- Focuses on a high-impact application area in healthcare finance.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, so suitability for a given task cannot be assessed before download.
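The gaps listed above can be closed with a quick profile once the file is on disk. The sketch below uses only Python's standard library. It assumes the download is a single comma-separated file with a header row, which is unverified for this dataset, and `claims.csv` is a hypothetical file name.

```python
import csv
from collections import Counter


def profile_csv(path, encoding="utf-8"):
    """Return the row count, column names, and per-column empty-cell counts.

    Assumes a comma-separated file with a header row (unverified here).
    """
    empty_counts = Counter()
    row_count = 0
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        for row in reader:
            row_count += 1
            for name in columns:
                value = row.get(name)
                # Short rows yield None; blank cells yield empty strings.
                if value is None or value.strip() == "":
                    empty_counts[name] += 1
    return {
        "row_count": row_count,
        "columns": columns,
        "empty_cells": {name: empty_counts[name] for name in columns},
    }


# Usage ("claims.csv" is a hypothetical file name):
# summary = profile_csv("claims.csv")
# print(summary["row_count"], summary["columns"])
```

The column names returned here are the starting point for inferring field semantics, since no column-level documentation is available.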
Provenance
- Source: Kaggle
- Collection Method: Synthetically generated; the specific generation method is unknown.
- Time Range: not specified
- Freshness: Last update date is unknown; freshness unverified.
- Geography: not specified