Name: Evaluation of a RAG-Based LLM for Prehospital Care Recommendations
Creator: Colin G Wang
Published: 2026-05-07T19:00:03
License: CC-BY-4.0
Keywords: Retrieval Augmented Generation, Benchmark, Healthcare, Text, Emergency Medical Services, Clinical Scenarios, Large Language Models, Protocol Evaluation

Description

A retrieval-augmented generation (RAG) large language model (LLM) grounded in a single EMS agency's protocols correctly recommended 75% of 169 expected patient care actions. The exploratory evaluation, authored by Colin G Wang and uploaded on 2026-05-07, tested the model's accuracy across six adult and pediatric prehospital scenarios. The study identified 42 missed actions, including 9 categorized as 'major misses'.
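The reported figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes every expected action was scored as either correct or missed (the description does not state this explicitly); the variable names are illustrative only.

```python
# Figures taken from the description above.
expected_actions = 169
missed_actions = 42
major_misses = 9

# Assumption: an expected action is either correctly recommended or missed.
correct_actions = expected_actions - missed_actions
accuracy = correct_actions / expected_actions
major_miss_share = major_misses / missed_actions

print(f"correct actions: {correct_actions}")
print(f"accuracy: {accuracy:.1%}")
print(f"major misses as a share of all misses: {major_miss_share:.1%}")
```

Under that assumption, the missed-action count and the 75% figure are consistent with each other.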

Use Cases

Benchmarking LLM accuracy for clinical recommendations based on structured text protocols.
Studying model 'hallucinations' in a high-stakes medical context as described in the evaluation.
Analyzing failure modes in LLM-driven care guidance, such as missed secondary cause evaluations in pediatric cases.
Developing simulation frameworks for testing AI-assisted decision-making in prehospital emergency scenarios.
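For the benchmarking use case, one possible way to score a scenario is to compare the model's recommended actions against the protocol's expected actions. The sketch below is an assumption, not the study's method: the `score_scenario` helper, the `ScenarioScore` type, exact-match comparison of normalized labels, and the example action labels are all hypothetical, and the published files are PDF and DOCX documents rather than structured action lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioScore:
    """Outcome of comparing recommended actions with expected actions."""
    correct: frozenset
    missed: frozenset
    extra: frozenset  # recommended but not expected; not the same as a reviewed hallucination

    @property
    def accuracy(self) -> float:
        expected_total = len(self.correct) + len(self.missed)
        return len(self.correct) / expected_total if expected_total else 0.0


def score_scenario(expected, recommended) -> ScenarioScore:
    """Hypothetical scorer: exact match on lowercased, whitespace-trimmed labels."""
    expected_set = frozenset(a.strip().lower() for a in expected)
    recommended_set = frozenset(a.strip().lower() for a in recommended)
    return ScenarioScore(
        correct=expected_set & recommended_set,
        missed=expected_set - recommended_set,
        extra=recommended_set - expected_set,
    )


# Hypothetical labels for illustration; not taken from the study.
expected = ["Check blood glucose", "Administer oxygen",
            "Obtain 12-lead ECG", "Establish IV access"]
recommended = ["administer oxygen", "obtain 12-lead ECG",
               "establish IV access", "apply cervical collar"]

score = score_scenario(expected, recommended)
print(sorted(score.missed), sorted(score.extra), score.accuracy)
```

Exact label matching is the simplest choice and would likely undercount correct actions phrased differently; a real evaluation would probably need clinician review to classify misses and hallucinations by safety risk.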

Strengths

The evaluation is grounded in real EMS policies and treatment protocols from a single large agency.
Provides specific performance metrics: 75% of 169 expected actions correctly recommended, and 42 missed actions categorized by safety risk (9 of them major misses).
Includes analysis of 12 hallucinations and their assessed impact on patient safety.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Row count is unknown, which may limit suitability assessment.
The dataset is very small (266.1 KB), indicating a limited scope focused on the evaluation results.

Provenance

Source: figshare
Collection Method: Non-human, simulation-based experimental study using a RAG-based LLM (Gemini 2.5 Flash) on uploaded EMS protocols.
Time Range: The study was uploaded in 2026; the temporal coverage of the underlying protocols is not specified.
Freshness: Last updated 2026-05-07 19:00:03; freshness should be verified.
Geography: Based on protocols from a single large EMS system; geographic coverage is not specified.

Files are in PDF and DOCX formats, not a structured data file. The dataset contains the study's documentation and results, not the raw clinical data or model outputs.
