TabArena curated this dataset for evaluating predictive models on independent and identically distributed tabular data. The task is binary classification; judging by the source title, 'Is This a Good Customer?', the target most likely indicates customer quality. The original data comes from a 2020 Kaggle dataset by user Podsyp.
Use Cases
- Benchmark binary classification models on independent and identically distributed tabular data.
- Evaluate algorithms for predicting customer quality.
- Compare feature engineering and model selection techniques for tabular classification tasks.
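Because the rows are described as independent and identically distributed, a shuffled, stratified k-fold split is a reasonable way to set up such a benchmark. The following is a minimal sketch using only the Python standard library, not TabArena's own evaluation protocol. The helper names, the label values "good" and "bad", and the 80/20 class balance are all illustrative assumptions; the real target values and row count are not documented here.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs that roughly preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        # Shuffling is only appropriate because the rows are assumed to be IID.
        rng.shuffle(idxs)
        # Deal each class's rows round-robin across the folds.
        for pos, i in enumerate(idxs):
            folds[pos % n_splits].append(i)

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


def majority_baseline_accuracy(labels, n_splits=5, seed=0):
    """Mean accuracy of always predicting the training fold's most common class."""
    scores = []
    for train_idx, test_idx in stratified_kfold_indices(labels, n_splits, seed):
        counts = defaultdict(int)
        for i in train_idx:
            counts[labels[i]] += 1
        majority = max(counts, key=counts.get)
        correct = sum(1 for i in test_idx if labels[i] == majority)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical labels standing in for the real target column (values are assumptions).
toy_labels = ["good"] * 80 + ["bad"] * 20
baseline = majority_baseline_accuracy(toy_labels, n_splits=5, seed=0)
print(f"majority-class baseline accuracy: {baseline:.3f}")
```

Any real model compared on this dataset should at least beat this majority-class baseline on the same folds; reusing the same `seed` keeps the folds identical across models.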
Strengths
- Dataset is explicitly curated for evaluating ML models on independent and identically distributed (IID) data.
- License is CC0: Public Domain, allowing unrestricted use.
- Target variable values were renamed by curators to be more descriptive.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count, file formats, and sample data are unknown, which makes it hard to assess suitability before download.
- The last update date is unknown, so freshness cannot be verified.
Provenance
- Source: Kaggle user Podsyp. 'Is This a Good Customer?' Kaggle, 2020.
- Collection Method: Curated by the TabArena team from the original Kaggle source.
- Time Range: Dataset year 2020.
- Freshness: Dataset year is 2020; the last update date is unknown.
- Geography: Not specified.