Description

MedDialBench is a controlled factorial benchmark for evaluating large language model diagnostic robustness under parametric adversarial patient behaviors. It is an anonymous submission to the NeurIPS 2026 Datasets and Benchmarks Track, with a companion paper under double-blind review. The dataset was last updated on May 6, 2026.

Use Cases

Benchmarking LLM diagnostic accuracy on adversarial patient conversation scenarios.
Evaluating model robustness to parametric variations in patient behavior.
Training and testing clinical dialogue agents on controlled, factorial test conditions.
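
For the benchmarking and robustness use cases listed here, one plausible way to score a model is to compute accuracy per patient-behavior condition and compare each condition against a baseline. The sketch below is illustrative only: the record keys (`condition`, `predicted`, `gold`), the baseline label `cooperative`, and exact-match scoring are assumptions, not part of the dataset's documented schema or the authors' evaluation protocol.

```python
from collections import defaultdict


def accuracy_by_condition(results):
    """Return {condition: accuracy} from an iterable of result dicts.

    Assumption: each dict has hypothetical 'condition', 'predicted',
    and 'gold' keys; the real dataset fields may differ.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["condition"]] += 1
        # Assumption: case-insensitive exact match counts as a correct diagnosis.
        if r["predicted"].strip().lower() == r["gold"].strip().lower():
            correct[r["condition"]] += 1
    return {c: correct[c] / totals[c] for c in totals}


def robustness_drop(acc, baseline="cooperative"):
    """Accuracy lost relative to the (hypothetical) baseline condition."""
    base = acc[baseline]
    return {c: base - a for c, a in acc.items() if c != baseline}
```

Exact-match scoring is the simplest choice; free-text diagnoses would likely need normalization or a judge model before this comparison is meaningful.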

Strengths

Designed as a controlled factorial benchmark for systematic evaluation.
Focuses on a specific, high-stakes application area: diagnostic robustness in medical dialogue.
Associated with a companion paper under double-blind review at a major conference (NeurIPS 2026 Datasets and Benchmarks Track).
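
To make the term "factorial" concrete: in a full factorial design, every level of every factor is crossed with every level of the others, so each effect can be isolated. The sketch below enumerates such a grid. The factor names and levels are hypothetical placeholders chosen for illustration; the benchmark's actual factors are not documented here.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
FACTORS = {
    "behavior": ["none", "withholding", "misleading"],
    "severity": ["low", "high"],
}


def factorial_conditions(factors):
    """Return one dict per cell of the full factorial grid."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    # product() yields every combination of one level per factor.
    return [dict(zip(names, combo)) for combo in product(*level_lists)]
```

With the placeholder factors above, the grid has 3 x 2 = 6 cells.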

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset size are not reported, which makes it hard to assess suitability before download.
The dataset is currently anonymized, and full provenance details are withheld pending publication.
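
Because field semantics and row counts have to be inferred after download, a quick schema summary is a reasonable first step. The sketch below assumes the records have already been loaded into a list of dictionaries by whatever means the repository supports; the helper name `summarize_fields` is hypothetical and nothing here reflects the dataset's actual columns.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Return (row_count, {field: sorted type names}) for dict records.

    Assumption: `rows` is an iterable of dicts already loaded from the
    downloaded files; the loading step is not shown.
    """
    types = defaultdict(set)
    count = 0
    for row in rows:
        count += 1
        for key, value in row.items():
            # Record every Python type seen for this field, including NoneType.
            types[key].add(type(value).__name__)
    return count, {key: sorted(names) for key, names in types.items()}
```

Fields that report more than one type (for example `NoneType` alongside `str`) are good candidates for a closer manual look.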

Provenance

Source: Anonymous submission by anon-meddial-2026.
Collection Method: Not documented; likely synthetically generated or curated medical dialogue scenarios designed for benchmark evaluation.
Freshness: Last updated 2026-05-06 09:59:44 (time zone not stated); verify against the repository before use.

The dataset is currently hosted in anonymized form on Hugging Face; a permanent, de-anonymized repository is planned after publication.

Tags: Text, Medical Dialogue, Benchmark, LLM Evaluation, Healthcare, Clinical Conversation, Diagnostic Robustness

MedDialBench: Evaluating LLM Diagnostic Robustness Against Adversarial Patient Behaviors
