This medical dataset is intended for evaluating machine learning models that predict the occurrence of diabetes. It records eight clinical measurements per patient (pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age) together with a binary outcome label. The dataset is licensed under CC0-1.0.
Use Cases
- Train binary classification models to predict diabetes outcome based on diagnostic measurements.
- Analyze correlations between clinical variables like glucose levels, BMI, and age.
- Validate healthcare analytics models for diabetes risk assessment using patient data.
- Support educational projects in medical data analysis and machine learning applications.
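As an illustration of the correlation use case, here is a minimal sketch in plain Python. The column names `Glucose`, `BMI`, and `Age` are assumptions, since the card does not document the actual headers, and the rows are placeholders, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(rows, columns):
    """Pairwise correlations for the named columns over a list of dict rows."""
    table = {}
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            xs = [float(row[first]) for row in rows]
            ys = [float(row[second]) for row in rows]
            table[(first, second)] = pearson(xs, ys)
    return table


# Placeholder rows for illustration only; NOT values from the dataset.
# Column names are assumed and may differ in the downloaded file.
rows = [
    {"Glucose": 90, "BMI": 22.0, "Age": 25},
    {"Glucose": 120, "BMI": 27.5, "Age": 40},
    {"Glucose": 150, "BMI": 31.0, "Age": 52},
    {"Glucose": 110, "BMI": 24.0, "Age": 33},
]

for pair, value in correlation_table(rows, ["Glucose", "BMI", "Age"]).items():
    print(pair, round(value, 3))
```

With the real data, `rows` would come from the downloaded file (for example via `csv.DictReader`), and rows with missing measurements would need to be handled before the `float` conversion.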
Strengths
- Includes a specific outcome variable (0 or 1) for supervised learning tasks.
- Provides sample values for all 9 columns (the eight clinical attributes and the outcome), clarifying data format and range.
- Released under a permissive CC0-1.0 license for broad reuse.
Limitations
- Row count is not documented, so suitability for large-scale modeling cannot be assessed before download.
- Last update date is unknown, so freshness cannot be verified.
- Column-level documentation is absent beyond sample values; field semantics must be inferred after download.
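Because row count and field semantics are not documented, a quick inspection after download can fill some of these gaps. The sketch below assumes the data is available as a CSV with a header row and that the label column is named `Outcome`. Both are assumptions, and the in-memory sample is a placeholder, not real data.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, target="Outcome"):
    """Count rows, empty cells per column, and distinct target values."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    row_count = 0
    empty_cells = Counter()
    target_values = Counter()
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None; blank cells yield an empty string.
            if value is None or value.strip() == "":
                empty_cells[name] += 1
        label = row.get(target)
        if label is not None and label.strip() != "":
            target_values[label.strip()] += 1
    return {
        "columns": columns,
        "row_count": row_count,
        "empty_cells": dict(empty_cells),
        "target_values": dict(target_values),
    }


# Placeholder CSV for illustration only; NOT values from the dataset.
sample = io.StringIO("Glucose,BMI,Outcome\n100,25.0,0\n150,,1\n120,30.5,0\n")
summary = summarize_csv(sample)
print(summary)
```

For a downloaded file, pass an open file object instead, for example `open(path, newline="")` where `path` is wherever the file was saved. If `target_values` contains anything other than `0` and `1`, the assumed label column or encoding is probably wrong.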
Provenance
- Source: OpenML
- Collection Method: not specified
- Time Range: not specified
- Freshness: not specified
- Geography: not specified