MedMisBench is a benchmark for evaluating large language models on medical question-answering tasks when misleading context is introduced. It is built from five medical QA sources covering standard reasoning, expert reasoning, patient-journey scenarios, and agentic biomedical capability. The dataset was created by AI4HealthResearch and was last updated on Hugging Face in May 2026.
The full description is hosted externally on the Hugging Face dataset page.