CatBoost Model 2E: 1-Year Mortality Predictions for Ontario Adults
by Steven Habbous · Updated 2mo ago
1.1 MB · 1 file
Description
A CatBoost model predicting 1-year mortality, trained on 12,080,801 adult residents of Ontario, Canada. The model, created by Steven Habbous and released in April 2026, achieved an AUROC of 0.933. It was trained on administrative health data, including age, sex, comorbidities, and healthcare utilization measures.
Use Cases
Benchmarking machine learning models for mortality prediction against the logistic regression, random forest, and boosting methods the study compares.
Improving confounder control in observational health studies by using the model for risk adjustment, an application the study mentions.
Analyzing feature importance for health outcomes, drawing on the reported permutation importance and marginal effects.
Calibrating predictive models for high-risk populations, informed by the sensitivity analysis results for individuals in the 70–100% risk bracket.
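The calibration use case can be illustrated with a minimal sketch that compares mean predicted risk with the observed event rate inside one risk bracket. The function name `bracket_calibration`, the inclusive bracket bounds, and the toy labels and probabilities are all assumptions made for illustration; they are not taken from the study.

```python
def bracket_calibration(y_true, y_prob, low=0.70, high=1.00):
    """Compare mean predicted risk with the observed event rate
    for individuals whose predicted risk falls in [low, high]."""
    selected = [(y, p) for y, p in zip(y_true, y_prob) if low <= p <= high]
    if not selected:
        return None  # nobody falls in the bracket
    n = len(selected)
    observed = sum(y for y, _ in selected) / n
    predicted = sum(p for _, p in selected) / n
    return {
        "n": n,
        "observed": observed,
        "predicted": predicted,
        "gap": observed - predicted,  # negative means risk is over-predicted
    }


# Toy data (invented): 1 = died within one year, 0 = survived.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_prob = [0.95, 0.80, 0.75, 0.72, 0.40, 0.10, 0.65, 0.05]

result = bracket_calibration(y_true, y_prob)
print(result)
```

A gap near zero in the bracket suggests good calibration there; with only a handful of individuals, as in this toy data, the estimate is too noisy to interpret.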
Strengths
Model performance is reported quantitatively with multiple metrics, including an AUROC of 0.933 and an integrated calibration index (ICI) of 0.0003.
Training data covers a large population of over 12 million individuals from a defined geographic region (Ontario, Canada).
The study compares eight distinct modeling approaches, providing a basis for methodological comparison.
Feature importance is assessed using multiple techniques, including CatBoost's internal structure and permutation methods.
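As a rough illustration of two ideas named above, the sketch below computes AUROC by pairwise comparison and then uses the drop in AUROC after shuffling a column as a permutation importance score. It is a generic, pure-Python sketch, not the study's procedure and not CatBoost's built-in importance. The helper names, the toy `predict` function, and the toy data (an age-like column and a binary column) are assumptions.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as half. O(n_pos * n_neg), fine for small data."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUROC when one column is shuffled, for each column."""
    rng = random.Random(seed)
    baseline = auroc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = auroc(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Toy data (invented): column 0 is age-like, column 1 is a binary flag.
X = [[82, 1], [77, 0], [69, 1], [90, 0], [34, 1], [45, 0], [51, 1], [28, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def predict(row):
    # Hypothetical stand-in model: risk rises with column 0 only.
    return row[0] / 100


baseline, importances = permutation_importance(predict, X, y)
print(baseline, importances)
```

Because the stand-in model ignores column 1, shuffling it leaves every prediction unchanged and its importance is exactly zero, while shuffling column 0 lowers the AUROC.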
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
No row count is reported for the file itself, which may limit suitability assessment; the 12,080,801 figure describes the training cohort.
At 1.1 MB, the file likely contains model parameters or a subset of data rather than the full training data.
Provenance
Source
Administrative healthcare databases for Ontario, Canada.
Collection Method
Observational study using a 3-year window of health and healthcare utilization data for adults alive as of January 1, 2022.
Time Range
Data covers up to 3 years prior to the index date of January 1, 2022, with a 1-year mortality outcome window.
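The windows described here can be written out as date arithmetic. This is a sketch under stated assumptions: the half-open interval conventions (lookback ends the day before the index date, outcome window excludes its end date) and the helper names are hypothetical, since the source does not state the exact boundary rules.

```python
from datetime import date

INDEX_DATE = date(2022, 1, 1)  # index date given in the description


def shift_years(d, years):
    """Shift a date by whole years (safe here because the index date is Jan 1)."""
    return d.replace(year=d.year + years)


lookback_start = shift_years(INDEX_DATE, -3)  # 2019-01-01
outcome_end = shift_years(INDEX_DATE, 1)      # 2023-01-01, treated as exclusive


def in_lookback(event_date):
    """True if a health record falls in the assumed 3-year lookback window."""
    return lookback_start <= event_date < INDEX_DATE


def died_within_one_year(death_date):
    """True if death occurred in the assumed 1-year outcome window."""
    return death_date is not None and INDEX_DATE <= death_date < outcome_end


print(lookback_start, outcome_end)
```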
Freshness
Last updated 2026-04-23 17:25:39; freshness should be verified.
Geography
General adult population of Ontario, Canada.
File Format
CBM (CatBoost Model); loading and using the file requires compatible software (e.g., the CatBoost library).
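A minimal sketch of loading a CBM file with the CatBoost Python package follows. It assumes the model was saved as a classifier (the source does not say which model class was used), and the file name in the usage comment is hypothetical. The `catboost` import is deferred so that a missing file is reported before the library is needed.

```python
from pathlib import Path


def load_cbm(path):
    """Load a CatBoost model from a .cbm file.
    Assumes a classifier; swap in another CatBoost class if that is wrong."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"No model file at {model_path}")
    from catboost import CatBoostClassifier  # third-party: pip install catboost

    model = CatBoostClassifier()
    model.load_model(str(model_path), format="cbm")
    return model


def summarize(model):
    """Collect basic facts about a loaded model to help infer field semantics."""
    return {
        "tree_count": model.tree_count_,
        "feature_names": list(model.feature_names_),
    }


# Usage (hypothetical file name; not executed here):
#   model = load_cbm("model_2e.cbm")
#   print(summarize(model))
#   risk = model.predict_proba(X)[:, 1]  # X must match the training features
```

Inspecting the feature names is a reasonable first step given that column-level documentation is absent, but the names alone may not reveal units or coding, so predictions should be treated with caution until the inputs are confirmed.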