Synthetic patient records for 575,415 individuals with complete medical histories, generated with the Synthea framework. The dataset was created by 'richardyoung' and last updated on March 23, 2026. Because it contains no real patient data, it is HIPAA-safe for machine learning research and education.
Use Cases
- Train patient outcome prediction models on synthetic medical histories.
- Benchmark clinical concept extraction algorithms on synthetic conditions and medications.
- Simulate healthcare system interactions for policy research using synthetic patient encounters.
- Develop and test fairness audits for healthcare AI using realistic synthetic demographics.
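For the fairness-audit use case, a minimal sketch of one common check, the demographic parity gap (the largest difference in positive-outcome rate between demographic groups), might look like the following. It assumes the records have already been loaded as a list of dictionaries. The `GENDER` field (Synthea's patient export has such a column, but this dataset's schema is undocumented), the `predicted_high_risk` field, and the function names are all hypothetical.

```python
from collections import defaultdict


def outcome_rate_by_group(records, group_key, outcome_key):
    """Return {group: fraction of records whose outcome field is truthy}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        # Records with a missing or empty group value are pooled as "UNKNOWN".
        group = rec.get(group_key) or "UNKNOWN"
        totals[group] += 1
        if rec.get(outcome_key):
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(rates):
    """Largest difference in positive-outcome rate between any two groups."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0


# Hypothetical records: field names are assumptions, not the dataset's schema.
records = [
    {"GENDER": "F", "predicted_high_risk": True},
    {"GENDER": "F", "predicted_high_risk": False},
    {"GENDER": "M", "predicted_high_risk": True},
    {"GENDER": "M", "predicted_high_risk": True},
]
rates = outcome_rate_by_group(records, "GENDER", "predicted_high_risk")
gap = demographic_parity_gap(rates)  # 1.0 - 0.5 = 0.5
```

A gap near zero suggests similar rates across groups; demographic parity is only one of several fairness criteria, and which one applies depends on the audit's goals.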
Strengths
- Contains 575,415 synthetic patient records, providing substantial scale.
- Uses Synthea, which the dataset description calls the gold standard for generating synthetic EHR data.
- Explicitly contains no real patient data, so it carries no patient privacy risk and is not subject to HIPAA restrictions on protected health information.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
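Because field semantics must be inferred, a quick per-column profile (missing values, distinct values, and whether every value parses as a number) is a reasonable first inspection step. This sketch assumes the files are CSV with a header row, as Synthea's CSV exporter produces; the actual file format of this dataset is not documented. The sample column names are hypothetical, loosely modeled on a Synthea patient table.

```python
import csv
import io


def _is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_csv(text):
    """Summarize each column of a CSV string that has a header row.

    Returns {column: {"missing": int, "distinct": int, "numeric": bool}}.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    values = {col: [] for col in columns}
    for row in reader:
        for col in columns:
            # Short rows yield None for absent fields; treat them as empty.
            values[col].append(row.get(col) or "")
    profile = {}
    for col, vals in values.items():
        present = [v for v in vals if v.strip()]
        profile[col] = {
            "missing": len(vals) - len(present),
            "distinct": len(set(present)),
            # A column counts as numeric only if it has values and all parse.
            "numeric": bool(present) and all(_is_number(v) for v in present),
        }
    return profile


# Hypothetical sample; real column names must be confirmed after download.
sample = (
    "Id,BIRTHDATE,DEATHDATE,HEALTHCARE_EXPENSES\n"
    "p1,1980-01-01,,1234.5\n"
    "p2,1975-06-30,2020-02-02,98.0\n"
)
report = profile_csv(sample)
```

For files too large to hold in memory, the same logic could be applied to a sampled subset of rows instead.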
Provenance
- Source: Hugging Face
- Collection Method: Generated synthetically using the Synthea framework.
- Time Range: Not specified
- Freshness: Last updated 2026-03-23 02:07:03 (timezone not stated); freshness should be verified.
- Geography: Not specified
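One way to act on the freshness note is to compare the last-update timestamp against a chosen maximum age. This sketch assumes the timestamp is in UTC (the source does not state a timezone) and uses a hypothetical 180-day threshold; the threshold and function names are illustrative only.

```python
from datetime import datetime, timezone

# Taken from the Freshness entry; UTC is an assumption.
LAST_UPDATED = "2026-03-23 02:07:03"


def days_since_update(last_updated, now):
    """Whole days elapsed between the update timestamp and `now` (aware datetime)."""
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return (now - updated).days


def is_stale(last_updated, now, max_age_days=180):
    """True if the data is older than the (hypothetical) maximum age."""
    return days_since_update(last_updated, now) > max_age_days


now = datetime.now(timezone.utc)
print(days_since_update(LAST_UPDATED, now), is_stale(LAST_UPDATED, now))
```

Passing `now` explicitly, rather than reading the clock inside the functions, keeps the check reproducible and easy to test.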