Name: MedHall-Bench: A Field-Grounded Hallucination Detection Benchmark for Medical AI
Creator: healthmemoryarena
Published: 2026-04-20T10:09:49
Keywords: Hallucination Detection, Benchmark, Healthcare, Clinical Evaluation, Text, Medical AI

Description

MedHall-Bench is a field-grounded hallucination detection benchmark for medical AI assistants. It decomposes clinical responses into verifiable structured fields and evaluates AI outputs via per-field programmatic matching and sentence-level LLM-as-Judge. The dataset is designed for use with the HolyEval framework and was created by healthmemoryarena, with a last recorded update in April 2026.
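To make the per-field programmatic matching idea concrete, here is a minimal sketch in Python. It is not the HolyEval API or the benchmark's actual scoring code. The field names (`drug`, `dose`, `unit`), the `FieldResult` type, and the functions `normalize`, `match_fields`, and `mismatch_rate` are all hypothetical, chosen only for illustration. The sketch assumes each response has already been decomposed into a flat mapping of field name to string value, and it does not cover the sentence-level LLM-as-Judge step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldResult:
    """Outcome of comparing one structured field (hypothetical type)."""
    name: str
    expected: str
    predicted: Optional[str]
    status: str  # "match", "mismatch", or "missing"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(value.lower().split())


def match_fields(reference: dict, predicted: dict) -> list:
    """Compare every reference field against the model's extracted fields."""
    results = []
    for name, expected in reference.items():
        value = predicted.get(name)
        if value is None:
            status = "missing"      # the model did not state this field
        elif normalize(value) == normalize(expected):
            status = "match"
        else:
            status = "mismatch"     # stated value contradicts the reference
        results.append(FieldResult(name, expected, value, status))
    return results


def mismatch_rate(results: list) -> float:
    """Fraction of reference fields whose stated value contradicts the reference."""
    if not results:
        return 0.0
    return sum(r.status == "mismatch" for r in results) / len(results)


if __name__ == "__main__":
    # Illustrative values only; not taken from the dataset.
    reference = {"drug": "Metformin", "dose": "500", "unit": "mg"}
    predicted = {"drug": "metformin", "dose": "850", "unit": "mg"}
    for result in match_fields(reference, predicted):
        print(result.name, result.status)
```

Treating a missing field separately from a mismatched one is a design choice in this sketch: an omitted dose is an incomplete answer, whereas a wrong dose is closer to what a hallucination benchmark is meant to catch.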

Use Cases

Benchmarking medical AI assistant accuracy based on verifiable structured fields like dose and unit.
Evaluating hallucination detection methods using programmatic field matching described in the benchmark.
Training or fine-tuning models for clinical information extraction based on the structured field decomposition approach.

Strengths

Designed for programmatic evaluation via per-field matching, which suggests a structured and repeatable assessment method.
Benchmark is field-grounded: per the description, judgments are tied to verifiable structured fields in the clinical response rather than to free-text impressions.
Integrates both programmatic matching and LLM-as-Judge evaluation, offering a multi-faceted assessment approach.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.

Provenance

Source: huggingface
Collection Method: Likely constructed for benchmarking purposes, as described.
Freshness: Last updated 2026-04-21 02:45:43; freshness should be verified.

License is unknown; intended for research use only and not for clinical application.
