Table 1_Identification of BMI-related high-risk feature combinations for diabetes among yo
by Zhen Xu·Updated 22d ago
27.0 MB · 1 file
Available on 1 platform
Description
A cohort of 103,693 young adults with normal baseline fasting plasma glucose was followed for incident diabetes over a median of 2.99 years. The data come from the Rich Healthcare Group health check-up database in China and were analyzed with Cox regression and machine learning models, including logistic regression and XGBoost. SHAP analysis identified BMI as the most influential predictor, with risk rising nonlinearly above approximately 28 kg/m².
Use Cases
Training diabetes risk prediction models based on BMI and metabolic features.
Analyzing nonlinear risk patterns for BMI using restricted cubic spline analysis.
Using SHAP to investigate joint contribution patterns of risk factors such as triglycerides and systolic blood pressure.
Comparing performance of logistic regression and XGBoost models on low-event-rate clinical data.
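As an illustration of the restricted cubic spline use case above, the following is a minimal sketch of a spline basis for a single variable such as BMI. It uses the common truncated-power construction, which forces the curve to be linear beyond the last knot. The function name `rcs_basis` and the knot values are hypothetical and chosen only for demonstration; they are not taken from the study or from the dataset's documentation.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    Truncated-power construction: each nonlinear term is built so that
    its cubic and quadratic parts cancel beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # Cubed positive part: (u)_+^3
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]
    terms = [float(x)]  # the linear term
    for j in range(k - 2):
        term = (
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / span
            + pos3(x - t[-1]) * (t[-2] - t[j]) / span
        )
        terms.append(term)
    return terms


# Hypothetical BMI knots (kg/m^2), for illustration only.
example_knots = [18.5, 22.0, 25.0, 28.0]
for bmi in (20.0, 24.0, 28.0, 32.0):
    print(bmi, rcs_basis(bmi, example_knots))
```

The resulting columns could be passed to a Cox or logistic model in place of raw BMI. In practice, knot positions are often set at quantiles of the observed distribution rather than fixed by hand.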
Strengths
Large cohort of 103,693 participants with complete BMI data.
Longitudinal design with a median follow-up of 2.99 years and 266 incident diabetes events.
Analysis includes both conventional Cox regression and multiple machine learning models (logistic regression, XGBoost).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The file's row count is not reported, so it is unclear whether it contains all 103,693 participant records; this may limit suitability assessment.
Reported PR-AUC values are low, indicating limited ability to identify true positive cases at the very low event rate (266 events among 103,693 participants, about 0.26%).
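To put the PR-AUC limitation in context, the sketch below computes the event rate from the counts quoted in this listing and estimates the average precision of a non-informative (random) ranker at that rate. The labels are synthetic, built only from the two counts, and no real records are used. Average precision is used here as one common summary of the precision-recall curve; whether it matches the estimator behind the reported PR-AUC values is an assumption. The helper name `average_precision` is hypothetical.

```python
import random


def average_precision(labels, scores):
    """Mean of the precision values at each positive, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Counts quoted in the listing.
EVENTS = 266
PARTICIPANTS = 103_693
prevalence = EVENTS / PARTICIPANTS

# Synthetic labels with the same event rate, scored at random (seeded for repeatability).
rng = random.Random(0)
labels = [1] * EVENTS + [0] * (PARTICIPANTS - EVENTS)
random_scores = [rng.random() for _ in labels]
baseline_ap = average_precision(labels, random_scores)

print(f"event rate: {prevalence:.4%}")
print(f"average precision of random scores: {baseline_ap:.4f}")
```

Because a random ranker scores close to the event rate, a model's PR-AUC on this cohort is more informative when read as a multiple of roughly 0.0026 than as an absolute number.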
Provenance
Source
Rich Healthcare Group health check-up database in China.
Collection Method
Cohort study from health check-up data.
Freshness
Last updated 2026-05-15 04:21:40; freshness should be verified.
Geography
China.
License is CC-BY-4.0. Data is in XLSX format (27.0 MB).