A dataset derived from paper-based internal transfer forms and neonatal admission records used in Kenyan public hospitals that provide newborn inpatient care. It contains images of clinical documents paired with JSON files holding gold-standard extracted data, and is intended for benchmarking AI and large language model (LLM) performance in clinical data extraction. The dataset was produced by a hybrid paper-digital pipeline designed for rapid deployment in resource-limited clinical settings.
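The card says that images are paired with JSON files but not how the pairing is encoded. The sketch below shows one way to build image/label pairs. It assumes, hypothetically, that each image and its JSON file share a filename stem (e.g. `form_001.jpg` and `form_001.json`) and that images use common extensions. Neither convention is confirmed by the source.

```python
from pathlib import Path

# Assumed image extensions; the card does not state the actual file formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".pdf"}


def pair_images_with_json(root):
    """Pair each document image with the JSON file that shares its filename stem.

    Hypothetical layout: form_001.jpg <-> form_001.json. If the same stem
    appears in several subdirectories, the last file found wins.
    """
    root = Path(root)
    images = {p.stem: p for p in root.rglob("*")
              if p.is_file() and p.suffix.lower() in IMAGE_EXTS}
    labels = {p.stem: p for p in root.rglob("*.json") if p.is_file()}
    # Stems present on both sides form the image/label pairs.
    paired = [(images[s], labels[s]) for s in sorted(images.keys() & labels.keys())]
    # Stems present on only one side indicate a missing image or missing label.
    unmatched = sorted(images.keys() ^ labels.keys())
    return paired, unmatched
```

Calling `pair_images_with_json("path/to/dataset")` (a placeholder path) returns the matched pairs together with any stems that lack a partner, so gaps in the download become visible before benchmarking begins.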
Use Cases
- Benchmarking LLM performance for extracting structured clinical data from document images.
- Training AI models to automate data entry from paper-based neonatal admission records.
- Evaluating how effective hybrid paper-digital pipelines are for building complex clinical registries.
- Researching data extraction methods applicable in resource-limited hospital settings.
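For the benchmarking use case, one simple metric compares each model-extracted record with its gold-standard JSON field by field. The following is a minimal sketch of that comparison and does not describe the dataset authors' evaluation method. It assumes both records are JSON objects, flattens nesting into dotted paths, and treats values as equal when they match as case- and whitespace-insensitive strings. The field names in the usage block are hypothetical.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            out.update(flatten(value, f"{prefix}[{i}]"))
    else:
        out[prefix] = obj
    return out


def normalise(value: Any) -> str:
    # Simplifying assumption: compare values as case- and whitespace-insensitive strings.
    return " ".join(str(value).split()).lower()


def field_accuracy(gold: dict, predicted: dict) -> Dict[str, Any]:
    """Score an extracted record against its gold-standard JSON, field by field."""
    g, p = flatten(gold), flatten(predicted)
    correct = sum(1 for k, v in g.items() if k in p and normalise(p[k]) == normalise(v))
    return {
        "fields": len(g),
        "correct": correct,
        "accuracy": correct / len(g) if g else 0.0,
        # Fields the model produced that do not exist in the gold record.
        "spurious": len(p.keys() - g.keys()),
    }


if __name__ == "__main__":
    # Hypothetical field names; the dataset's real JSON schema is not documented.
    gold = {"patient": {"sex": "Female", "birth_weight_g": 2500}, "diagnoses": ["sepsis"]}
    pred = {"patient": {"sex": "female ", "birth_weight_g": 2400},
            "diagnoses": ["sepsis"], "extra": 1}
    print(field_accuracy(gold, pred))
    # {'fields': 3, 'correct': 2, 'accuracy': 0.6666666666666666, 'spurious': 1}
```

Exact string matching is deliberately strict. Clinical fields such as dates or weights may need type-aware comparison once the real schema is known.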
Strengths
- Includes a gold-standard JSON dataset for benchmarking, enabling direct performance evaluation.
- Designed for practical deployment with minimal training, suggesting real-world applicability.
- Published on Dataverse (CIN Group collection), which makes the dataset discoverable and accessible through a research-data repository.
Limitations
- Critical metadata is missing: no column names, row counts, file sizes, or license information are provided.
- The dataset's temporal coverage and specific hospital sources within Kenya are not stated.
- Data types must be inferred, and the exact structure of the JSON and image files is unspecified.
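Because the JSON structure is undocumented, a quick first step is to survey which keys appear and what value types they hold across the gold-standard files. This minimal sketch assumes that each file contains a single JSON object at the top level and inspects only top-level keys. Nested structure would need a recursive walk.

```python
import json
from collections import Counter, defaultdict


def infer_field_types(json_paths):
    """Count the JSON value types seen at each top-level key across files.

    Assumes every file holds one JSON object; keys absent from some files
    are reported under the pseudo-type "missing".
    """
    types = defaultdict(Counter)
    n_files = 0
    for path in json_paths:
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        if not isinstance(record, dict):
            raise ValueError(f"{path}: expected a JSON object at the top level")
        n_files += 1
        for key, value in record.items():
            # Python type names: str, int, float, bool, NoneType, list, dict.
            types[key][type(value).__name__] += 1
    summary = {}
    for key, counts in types.items():
        missing = n_files - sum(counts.values())
        if missing:
            counts["missing"] = missing
        summary[key] = dict(counts)
    return summary
```

For example, `infer_field_types(Path("path/to/dataset").rglob("*.json"))` (placeholder path, with `pathlib.Path` imported) would show, for each key, whether it is consistently a string or number or whether it is often null or absent.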
Provenance
- Source
  - Clinical Information Network (CIN) Group Dataverse, authored by Timothy Tuti.
- Collection Method
  - Created with a novel hybrid paper-digital clinical data pipeline that uses AI to extract data from paper records.
- Time Range
  - Not stated.
- Freshness
  - Last updated 2026-03-24 13:01:32.
- Geography
  - Kenyan public hospitals providing newborn inpatient care.