Name: LLM Performance and Human-AI Collaboration in 15 Cardiac Surgery Scenarios
Creator: Marc Leon
Published: 2026-05-29T06:11:59
License: CC-BY-4.0
Keywords: Human-AI Collaboration, Benchmark, LLM Evaluation, Healthcare, Tabular, Audio, Medical AI, Cardiac Surgery, Excel, Synthetic

Description

Marc Leon's dataset contains results from a blinded two-phase evaluation of five large language models on 15 high-fidelity cardiac surgery scenarios. The data includes normalized performance scores across 10 weighted evaluation dimensions and records of rating revisions by senior surgeons. The dataset was last updated on 2026-05-29 and is licensed under CC-BY-4.0.

Use Cases

Benchmarking LLM performance on complex clinical reasoning tasks based on the 15 cardiac surgery scenarios.
Analyzing human-AI collaboration patterns based on the two-phase evaluation and rating revision data.
Comparing model strengths and weaknesses across evaluation dimensions like patient safety and hallucination avoidance.
Studying 'overacceptance', in which clinicians incorrectly accept flawed AI reasoning.
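
As a starting point for the rating-revision analysis mentioned above, the following is a minimal sketch of how shifts between the two evaluation phases might be summarized. It assumes each evaluator gave a numeric rating in both phases; the function name, the field names in the result, and the sample values are hypothetical and are not taken from the dataset.

```python
def summarize_revisions(phase1, phase2):
    """Summarize how ratings changed between two review phases.

    phase1 and phase2 are equal-length lists of numeric ratings,
    paired by position (same evaluator, same model, same scenario).
    """
    if len(phase1) != len(phase2):
        raise ValueError("phase1 and phase2 must have the same length")
    if not phase1:
        raise ValueError("at least one rating pair is required")

    # Positive shift: rating went up in phase 2; negative: it went down.
    shifts = [after - before for before, after in zip(phase1, phase2)]
    n = len(shifts)
    return {
        "revised_fraction": sum(1 for s in shifts if s != 0) / n,
        "raised": sum(1 for s in shifts if s > 0),
        "lowered": sum(1 for s in shifts if s < 0),
        "mean_shift": sum(shifts) / n,
    }


# Hypothetical ratings, for illustration only (not from the dataset).
phase1_ratings = [4, 3, 5, 2]
phase2_ratings = [4, 4, 3, 2]
summary = summarize_revisions(phase1_ratings, phase2_ratings)
```

Whether the actual file stores both phases as separate columns, or only the revised values, has to be checked after download.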

Strengths

Includes performance scores for five named LLMs (o1, o3-mini-high, DeepSeek-R1, GPT-4, Llama3-OpenBioLLM-70B).
Evaluation is based on 15 expert-developed cardiac surgery scenarios and a 10-dimensional weighted framework.
Captures human evaluator judgment shifts through a two-phase blinded review process.
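
The weighted framework described above can be illustrated with a short sketch of one common way to combine per-dimension ratings into a single normalized score. The dataset does not document its weights, rating scale, or aggregation formula, so everything here is an assumption: the 5-point scale, the weight values, and the "clarity" dimension are hypothetical, and only three of the ten dimensions are shown.

```python
def weighted_score(ratings, weights, scale_max=5.0):
    """Return a 0-1 score: the weight-averaged rating divided by scale_max.

    ratings and weights are dicts keyed by dimension name.
    scale_max is the assumed top of the rating scale.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Normalize each rating to 0-1, then take the weighted average.
    weighted_sum = sum(weights[d] * ratings[d] / scale_max for d in weights)
    return weighted_sum / total_weight


# Hypothetical weights and ratings, for illustration only.
example_weights = {
    "patient_safety": 3.0,
    "hallucination_avoidance": 2.0,
    "clarity": 1.0,
}
example_ratings = {
    "patient_safety": 4.0,
    "hallucination_avoidance": 5.0,
    "clarity": 3.0,
}
score = weighted_score(example_ratings, example_weights)
```

Dividing by the total weight keeps the result in the 0-1 range regardless of how the weights are scaled, which makes scores comparable across models.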

Limitations

Dataset is very small at 11.4 KB, indicating limited scope.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it hard to assess suitability before download.

Provenance

Source: figshare, author Marc Leon.
Collection Method: A panel of senior cardiac surgeons developed scenarios; a separate group conducted blinded evaluations.
Freshness: Last updated 2026-05-29 06:11:59; check the figshare record for newer versions before use.

License: CC-BY-4.0 (attribution required).
