Description

A dataset derived from paper-based internal transfer forms and neonatal admission records used in Kenyan public hospitals that provide newborn inpatient care. It contains images of clinical documents paired with JSON files of gold-standard extracted data, intended for benchmarking the performance of AI models, including large language models (LLMs), in clinical data extraction. The dataset was produced by a hybrid paper-digital pipeline designed for rapid deployment in resource-limited clinical settings.

Use Cases

Benchmarking LLM performance for extracting structured clinical data from document images.
Training AI models to automate data entry from paper-based neonatal admission records.
Evaluating the efficacy of hybrid paper-digital pipelines for building complex clinical registries.
Researching data extraction methods applicable in resource-limited hospital settings.
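For the benchmarking use case, one common approach is to score a model's extracted JSON against the gold-standard JSON field by field. The dataset's JSON schema is not documented, so the following is only a minimal sketch under stated assumptions: each record is assumed to be a (possibly nested) JSON object, and the field names in the example (`baby`, `sex`, `birth_weight_g`, `apgar_5min`) are hypothetical, not taken from the dataset. The helper functions `flatten`, `normalise`, and `field_accuracy` are likewise illustrative names, not part of any published tooling.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted paths, e.g. {"a.b.0": value}."""
    items: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            path = f"{prefix}.{index}" if prefix else str(index)
            items.update(flatten(value, path))
    else:
        items[prefix] = obj
    return items


def normalise(value: Any) -> str:
    """Strip whitespace and lowercase so trivial formatting differences are not scored as errors."""
    return str(value).strip().lower()


def field_accuracy(gold: dict, predicted: dict) -> float:
    """Fraction of gold fields whose predicted value matches after normalisation."""
    gold_fields = flatten(gold)
    pred_fields = flatten(predicted)
    if not gold_fields:
        return 0.0
    hits = sum(
        1
        for path, value in gold_fields.items()
        if path in pred_fields and normalise(pred_fields[path]) == normalise(value)
    )
    return hits / len(gold_fields)


if __name__ == "__main__":
    # Hypothetical records; the real field names in the dataset are not documented.
    gold = json.loads('{"baby": {"sex": "Female", "birth_weight_g": 2450, "apgar_5min": 8}}')
    pred = json.loads('{"baby": {"sex": "female", "birth_weight_g": 2450, "apgar_5min": 7}}')
    print(f"field accuracy: {field_accuracy(gold, pred):.2f}")
```

Exact matching after normalisation is a deliberately strict choice that suits coded fields such as sex or weight; free-text clinical fields may warrant a more lenient comparison.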

Strengths

Includes a gold-standard JSON dataset for benchmarking, enabling direct performance evaluation.
Designed for practical deployment with minimal training, suggesting real-world applicability.
Publication through the CIN Group Dataverse suggests the dataset is curated and makes it discoverable.

Limitations

Critical metadata is missing: no column names, row counts, file sizes, or license information are provided.
The dataset's temporal coverage and specific hospital sources within Kenya are not stated.
Data types must be inferred; the exact structure of the JSON and image files is unspecified.

Provenance

Source: Clinical Information Network (CIN) Group Dataverse, authored by Timothy Tuti.
Collection Method: Created via a novel hybrid paper-digital clinical data pipeline using AI to extract data from paper records.
Time Range: Not stated
Freshness: Last updated 2026-03-24 13:01:32
Geography: Kenyan public hospitals providing newborn inpatient care.


Image, Text, Kenya, Benchmark, Healthcare, Computer Vision, Artificial Intelligence, Neonatal Health, Paper To Digital, Medicine, Health And Life Sciences, Clinical Data Pipelines

Data BRIDGE: Neonatal Inpatient Paper Records and AI Extractions