MedMisBench: Medical QA Benchmark with Misleading Context

Name: MedMisBench: Medical QA Benchmark with Misleading Context
Creator: AI4HealthResearch
Published: 2026-04-21T01:14:20
Keywords: LLM Benchmark, Medical QA, Medical Reasoning, Benchmark, Healthcare, Text, Misleading Context

Available on 1 platform

Description

MedMisBench is a benchmark for evaluating large language models on medical question-answering tasks when misleading context is introduced. It is built from five medical QA sources covering standard reasoning, expert reasoning, patient-journey scenarios, and agentic biomedical capability. The dataset was created by AI4HealthResearch and was last updated on Hugging Face in May 2026.

Use Cases

Benchmarking LLM robustness to misleading medical information.
Evaluating whether AI agents preserve sound medical judgment, using the patient-journey and agentic-capability sources.
Studying failure modes that misleading context induces in medical QA systems.

Strengths

Built from five distinct medical question-answering sources, which suggests breadth across reasoning types.
Each benchmark item pairs a source question and its correct answer with misleading context, giving evaluation a consistent structure.
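
The item structure described above suggests a simple robustness measurement: ask each question with and without the misleading context and compare accuracy. The following is a minimal sketch only. The field names (`question`, `answer`, `misleading_context`), the `BenchItem` type, the prompt layout, and the `model` callable are all hypothetical; the real column names are undocumented and must be checked after download.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BenchItem:
    # Hypothetical field names; the dataset's real columns are not documented.
    question: str
    answer: str
    misleading_context: str


def _normalize(text: str) -> str:
    """Lowercase and trim so trivial formatting differences do not count as errors."""
    return text.strip().lower()


def robustness_report(items: Iterable[BenchItem], model: Callable[[str], str]) -> dict:
    """Compare exact-match accuracy on clean prompts vs. prompts with misleading context."""
    items = list(items)
    if not items:
        raise ValueError("no items to evaluate")

    clean_hits = 0
    misled_hits = 0
    flipped = 0  # correct on the clean prompt, wrong once misleading context is added

    for item in items:
        gold = _normalize(item.answer)
        clean_ok = _normalize(model(item.question)) == gold
        # Assumed prompt layout: misleading context placed before the question.
        misled_prompt = f"{item.misleading_context}\n\n{item.question}"
        misled_ok = _normalize(model(misled_prompt)) == gold

        clean_hits += clean_ok
        misled_hits += misled_ok
        flipped += clean_ok and not misled_ok

    n = len(items)
    return {
        "n": n,
        "clean_accuracy": clean_hits / n,
        "misled_accuracy": misled_hits / n,
        "flip_rate": flipped / n,
    }
```

Exact-match scoring is used here only to keep the sketch short; free-text medical answers would likely need a more forgiving comparison.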

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to assess suitability before download.
Description metadata is limited; data quality can only be judged by manual inspection after download.
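
Because field semantics and row count have to be established after download, a first inspection pass might look like the sketch below. It assumes the records have already been loaded as dictionaries (for example, one per line of a JSON Lines file, which is itself an assumption about the file format); `summarize_fields` is a hypothetical helper, not part of any dataset tooling.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> dict:
    """Return the row count and, per field, the observed value types and missing count."""
    type_counts = defaultdict(Counter)
    filled = Counter()
    rows = 0

    for record in records:
        rows += 1
        for name, value in record.items():
            type_counts[name]  # register the field even when its value is empty
            if value is None or value == "":
                continue
            filled[name] += 1
            type_counts[name][type(value).__name__] += 1

    fields = {
        name: {"types": dict(counts), "missing": rows - filled[name]}
        for name, counts in type_counts.items()
    }
    return {"rows": rows, "fields": fields}
```

A field counts as missing when it is absent from a record, `None`, or an empty string, so sparsely populated columns stand out before any evaluation is run.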

Provenance

Source: AI4HealthResearch on Hugging Face.
Collection Method: Built from five medical question-answering sources.
Freshness: Last updated 2026-05-06 22:45:26; confirm the current state on the Hugging Face dataset page.

The full description is hosted externally on the Hugging Face dataset page.

Quality Score

Overall: D37
Description: 39
Source: 36
Reputation: 41
Access: 26

Community

55 downloads
1 like
0 views

Dataset Info

Author: AI4HealthResearch
Created: Apr 21, 2026
Updated: May 6, 2026
Last synced: Jun 22, 2026
