Name: LLM Performance Data for Systematic Review Extraction
Creator: Takehiko Oami
Published: 2026-04-29T05:52:37
License: CC-BY-4.0
Keywords: Systematic Review, Data Extraction, Llm Evaluation, Healthcare, Text, Medical Research, Clinical Trials, Synthetic

Description

This 1.1 MB document, authored by Takehiko Oami and uploaded to figshare on April 29, 2026, compares ChatGPT-4o, Claude 3 Sonnet, and Gemini 1.5 Pro for extracting data from PDFs of sepsis trials. Mean no-error proportions ranged from 81.6% to 92.4% for background data extraction but were lower for outcome extraction, ranging from 27.8% to 80.7%.

Use Cases

Benchmarking LLM accuracy for clinical data extraction based on reported no-error proportions
Evaluating prompt engineering strategies such as chain-of-thought and self-reflection
Analyzing inter-session consistency of LLM outputs across three sessions
Comparing processing times between standard and self-reflection prompts
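As a minimal sketch of the inter-session consistency use case, the Python function below scores how often repeated extraction sessions agree on each field. The dataset does not document how consistency was measured, so the metric here (a field counts as consistent only when every session returns the same normalized value), the function name inter_session_agreement, and the example field names and values are hypothetical.

```python
from typing import Dict, List


def inter_session_agreement(sessions: List[Dict[str, str]]) -> float:
    """Share of fields on which every session returned the same value.

    Hypothetical metric: a field is consistent only if it appears in all
    sessions with identical whitespace- and case-normalized values.
    """
    if len(sessions) < 2:
        raise ValueError("need at least two sessions to compare")
    # Union of field names across all sessions (iterating a dict yields keys).
    fields = set().union(*sessions)
    if not fields:
        raise ValueError("sessions contain no fields")

    def norm(value: str) -> str:
        # Collapse internal whitespace and ignore case before comparing.
        return " ".join(value.split()).lower()

    consistent = 0
    for field in fields:
        values = [s.get(field) for s in sessions]
        # A field missing from any session counts as inconsistent.
        if None not in values and len({norm(v) for v in values}) == 1:
            consistent += 1
    return consistent / len(fields)


# Hypothetical outputs from three sessions on one article (not from the dataset).
session_1 = {"sample_size": "120", "mortality_28d": "25/60"}
session_2 = {"sample_size": "120", "mortality_28d": "25/60"}
session_3 = {"sample_size": "120", "mortality_28d": "24/60"}
print(inter_session_agreement([session_1, session_2, session_3]))  # 0.5
```

Requiring all sessions to agree is a deliberately strict choice; a pairwise agreement rate or a chance-corrected statistic would be reasonable alternatives.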

Strengths

Performance metrics are provided for three specific LLMs (ChatGPT-4o, Claude 3 Sonnet, Gemini 1.5 Pro)
Results include processing times per article, ranging from 19.3 to 107.1 seconds
The study evaluates five specific clinical questions from the J-SSCG 2024 guidelines

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which makes it harder to judge suitability before download
The dataset is a 1.1 MB DOCX file; the underlying data format and structure are not described

Provenance

Source: figshare
Collection Method: LLMs extracted predefined characteristics and outcomes from PDFs of eligible studies, with outputs assessed against a manual extraction reference standard.
Freshness: Last updated 2026-04-29 05:52:37
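A reference-standard comparison like the collection method described above can be sketched in Python as follows, computing a no-error proportion for one model's extraction of one article. The per-field matching rule (normalized exact match, with missing fields counted as errors), the function name no_error_proportion, and the example values are assumptions for illustration; the study's actual error adjudication is not described in this listing and may have relied on human judgment.

```python
from typing import Dict


def no_error_proportion(extracted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Share of reference fields the LLM reproduced without error.

    Hypothetical rule: a field is error-free when its whitespace- and
    case-normalized value equals the manual reference value; fields the
    LLM did not return count as errors.
    """
    if not reference:
        raise ValueError("reference must contain at least one field")

    def norm(value: str) -> str:
        # Collapse internal whitespace and ignore case before comparing.
        return " ".join(value.split()).lower()

    correct = sum(
        1
        for field, ref_value in reference.items()
        if field in extracted and norm(extracted[field]) == norm(ref_value)
    )
    return correct / len(reference)


# Hypothetical manual reference and LLM output (not from the dataset).
reference = {"sample_size": "120", "intervention": "Hydrocortisone", "mortality_28d": "25/60"}
llm_output = {"sample_size": "120", "intervention": "hydrocortisone ", "mortality_28d": "24/60"}
print(round(no_error_proportion(llm_output, reference), 3))  # 0.667
```

Averaging this value over articles and sessions would give a mean no-error proportion of the kind reported in the description, though the study's exact aggregation is not stated here.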

The license is CC-BY-4.0, and the primary data is contained in a single DOCX document.
