LLM Responses on Biliary Tract Cancer Guidelines with Prompting Strategies
by figshare admin karger · Updated 1mo ago
82.9 KB · 1 file
Available on 1 platform
Description
Supplementary material for "Ask the Right Questions: Prompting Strategies Shape LLM Performance on Biliary Tract Cancer Guideline Queries". It contains the results of a cross-sectional analysis evaluating three advanced LLMs (GPT-4o, Claude 3.5 Sonnet, Llama 3 70b) on 40 clinical questions derived from European Society for Medical Oncology (ESMO) guidelines. The dataset likely contains model responses evaluated for accuracy, conciseness, evidence quality, and hallucination rates under three prompting strategies (no prompt, short prompt, long prompt). It was uploaded by figshare admin karger on 2026-04-29.
Use Cases
Benchmark LLM performance in medical contexts using the accuracy, conciseness, and evidence-quality evaluations.
Analyze the effect of prompting strategies on LLM outputs by comparing the no-prompt, short-prompt, and long-prompt conditions.
Study hallucination rates in LLM-generated medical references using the reported fabrication and misattribution metrics.
Compare different LLMs (GPT-4o, Claude 3.5 Sonnet, Llama 3 70b) within a specific clinical domain.
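The comparisons above amount to grouping scores by model and prompting strategy. The following is a minimal sketch of that grouping, assuming the ratings have already been extracted from the PDF into (model, strategy, score) tuples. The record layout, the model names, and the score values are hypothetical placeholders, not taken from the dataset, whose columns are undocumented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder records: (model, prompting strategy, accuracy score).
# These values are illustrative only and are NOT taken from the dataset.
records = [
    ("model_a", "no prompt", 3),
    ("model_a", "no prompt", 5),
    ("model_a", "short prompt", 4),
    ("model_a", "long prompt", 5),
    ("model_b", "no prompt", 2),
    ("model_b", "short prompt", 4),
    ("model_b", "long prompt", 4),
]


def mean_score_by_group(rows):
    """Return {(model, strategy): mean score} for the given rows."""
    grouped = defaultdict(list)
    for model, strategy, score in rows:
        grouped[(model, strategy)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = mean_score_by_group(records)
for (model, strategy), avg in sorted(summary.items()):
    print(f"{model:8s} {strategy:13s} {avg:.2f}")
```

The same grouping would apply to conciseness, evidence quality, or hallucination counts once the actual field names are known.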
Strengths
Evaluation based on a reference standard (European Society for Medical Oncology guidelines).
Analysis includes three distinct prompting strategies and three advanced LLMs.
Responses were evaluated by two independent senior physicians against multiple criteria.
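With two independent raters, a common follow-up check is how often they agree beyond chance. The source does not say which agreement statistic, if any, was used. The sketch below computes Cohen's kappa as one standard option, using hypothetical labels that are not drawn from the dataset.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for four responses; illustrative only.
physician_1 = ["correct", "correct", "incorrect", "correct"]
physician_2 = ["correct", "incorrect", "incorrect", "correct"]
kappa = cohens_kappa(physician_1, physician_2)
print(f"kappa = {kappa:.2f}")
```

Kappa treats labels as unordered categories; if the physicians scored on an ordinal scale, a weighted variant may be more appropriate.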
Limitations
The dataset is small (82.9 KB in a single file), which suggests a limited scope.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it harder to assess suitability before download.
Provenance
Source
figshare
Collection Method
Cross-sectional analysis of LLM responses to 40 clinical questions.
Freshness
Last updated 2026-04-29 05:55:19; freshness should be verified.
License is CC-BY-4.0. Data is provided in PDF format.