Available on 1 platform
Marc Leon's dataset contains results from a blinded, two-phase evaluation of five large language models on 15 high-fidelity cardiac surgery scenarios. The data includes normalized performance scores across 10 weighted evaluation dimensions and records of rating revisions by senior surgeons. The dataset was last updated on 2026-05-29 and is licensed under CC-BY-4.0, which requires attribution.