Synthetic data derived from electronic health records (EHRs) in the Spanish public health system. The dataset is published on Kaggle, but its size, specific source, and creation date are unknown. The columns likely contain patient-level health information, but this must be verified after download.
Use Cases
- Training patient outcome prediction models on non-identifiable data (inferred from domain, verify after download)
- Benchmarking synthetic data generation techniques for EHRs (inferred from domain, verify after download)
- Developing privacy-preserving clinical decision support systems (inferred from domain, verify after download)
Strengths
- Published on Kaggle, a widely used public platform for sharing datasets.
- Focuses on synthetic data, which can mitigate privacy concerns for model development.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, so suitability for a given task cannot be assessed until the data is downloaded.
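The verification steps implied by these limitations can be sketched as a small post-download check. This is a minimal sketch that assumes the dataset ships as a UTF-8 CSV file with a header row, which the source does not confirm. The file name `synthetic_ehr.csv` and the helper `summarize_csv` are hypothetical and introduced only for illustration.

```python
import csv
from pathlib import Path


def summarize_csv(path, sample_rows=5):
    """Return the row count, column names, and a few sample rows of a CSV file.

    Assumes a UTF-8 CSV with a header row (not confirmed for this dataset).
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        sample = []
        row_count = 0
        # Stream the file so large downloads do not need to fit in memory.
        for row in reader:
            row_count += 1
            if len(sample) < sample_rows:
                sample.append(row)
    return {"row_count": row_count, "columns": columns, "sample": sample}


if __name__ == "__main__":
    # Hypothetical file name; replace it with the actual file from the download.
    csv_path = Path("synthetic_ehr.csv")
    if csv_path.exists():
        summary = summarize_csv(csv_path)
        print(f"rows: {summary['row_count']}")
        print(f"columns: {summary['columns']}")
        for row in summary["sample"]:
            print(row)
    else:
        print(f"{csv_path} not found; download the dataset first.")
```

Reading a handful of sample rows alongside the column names is usually enough to start inferring field semantics, which can then be recorded to fill the documentation gap noted above.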
Provenance
- Collection Method: Synthetic generation from original EHRs.
- Geography: Spain