DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Medical & Clinical

MedMisBench: A Medical QA Benchmark for LLMs with Misleading Context

Name: MedMisBench: A Medical QA Benchmark for LLMs with Misleading Context
Creator: HongjianZhou
Published: 2026-06-14T23:52:35
Keywords: LLM Benchmark, Medical QA, Medical Reasoning, Benchmark, Healthcare, Text, Misleading Context

by HongjianZhou · Updated 9d ago

Available on 1 platform

Description

MedMisBench evaluates whether large language models preserve correct medical judgment when misleading context is introduced. The benchmark is built from five medical question-answering sources spanning standard medical reasoning, expert reasoning, patient-journey scenarios, and agentic biomedical capability. It was created by HongjianZhou and last updated on June 15, 2026.

Use Cases

Benchmarking LLM robustness on multiple-choice medical questions that include misleading context.
Studying failure modes in medical reasoning across the five source types listed in the description.
Developing mitigation strategies for healthcare AI using adversarial medical QA scenarios.
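
For the robustness use case, one way to measure whether a model keeps its judgment is to score each question twice, once as posed and once with the misleading context added, and compare the two runs. The sketch below assumes that paired layout; the `PairedItem` record, its field names, and the flip-rate metric are illustrative assumptions, not MedMisBench's documented schema or official scoring.

```python
from dataclasses import dataclass


@dataclass
class PairedItem:
    # Hypothetical record: one question answered with and without the
    # injected misleading context. Field names are illustrative only.
    gold: str         # correct option label, e.g. "B"
    clean_pred: str   # model answer without misleading context
    misled_pred: str  # model answer with misleading context added


def robustness_report(items: list[PairedItem]) -> dict[str, float]:
    """Compare accuracy with and without misleading context.

    flip_rate is the share of items answered correctly in the clean
    setting that become incorrect once misleading context is added.
    """
    if not items:
        raise ValueError("need at least one item")
    n = len(items)
    clean_ok = [it.clean_pred == it.gold for it in items]
    misled_ok = [it.misled_pred == it.gold for it in items]
    clean_acc = sum(clean_ok) / n
    misled_acc = sum(misled_ok) / n
    n_clean_ok = sum(clean_ok)
    # Count only answers that were right before and wrong after.
    flips = sum(c and not m for c, m in zip(clean_ok, misled_ok))
    flip_rate = flips / n_clean_ok if n_clean_ok else 0.0
    return {
        "clean_accuracy": clean_acc,
        "misled_accuracy": misled_acc,
        "accuracy_drop": clean_acc - misled_acc,
        "flip_rate": flip_rate,
    }
```

With four items where the model gets three right when no misleading context is present and the misleading context overturns one of those, the report gives a clean accuracy of 0.75, a misled accuracy of 0.5, and a flip rate of 1/3. Conditioning the flip rate on clean-correct items isolates answers the context actually overturned, rather than mixing in questions the model missed anyway.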

Strengths

Benchmark is built from five distinct medical question-answering sources.
Focuses on a specific evaluation task: preserving judgment against misleading medical context.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Last updated 2026-06-15 04:40:37; freshness should be verified.
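
Because column-level documentation and the row count are missing, both have to be recovered from the files after download. A minimal stdlib sketch, assuming the data is exported as JSON Lines (one object per line); `load_jsonl`, `summarize_records`, and any file path passed in are hypothetical, since the dataset's actual file layout is not documented here.

```python
import json
from collections import Counter, defaultdict
from typing import Any, Iterable


def load_jsonl(path: str) -> list[dict[str, Any]]:
    # Assumes one JSON object per line; the real MedMisBench file format
    # and file names are not documented on this page.
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def summarize_records(records: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Count rows and report each field's presence and observed value types."""
    rows = 0
    present: Counter = Counter()
    types: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        rows += 1
        for key, value in rec.items():
            present[key] += 1
            types[key].add(type(value).__name__)
    return {
        "rows": rows,
        "fields": {
            key: {"present_in": present[key], "types": sorted(types[key])}
            for key in sorted(present)
        },
    }
```

A field whose `present_in` count is below `rows`, or that shows more than one type, is a candidate for closer inspection before use. If the data is loaded through the Hugging Face `datasets` library instead, `Dataset.features` and `Dataset.num_rows` report the declared schema and row count directly.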

Provenance

Source: HongjianZhou on Hugging Face.
Collection Method: Built from five medical question-answering sources.
Freshness: Last updated 2026-06-15 04:40:37.

License is unknown; terms of use must be verified before use.

Quality Score

D36

Description

Source

Reputation

Community

1 like

0 views

Dataset Info

Author: HongjianZhou
Created: Jun 14, 2026
Updated: Jun 15, 2026
Last synced: Jun 24, 2026
