Description
MAX-EVAL-11 is a large-scale benchmark for evaluating large language models on full-spectrum ICD-11 medical coding. It comprises 10,000 MIMIC-III discharge summaries with expert-validated ICD-11 annotations that cover 99.87% of the ICD-11 diagnostic codes. The dataset was created by mas-namtla and was last updated on Hugging Face in May 2026.
Use Cases
Benchmarking LLM performance on ICD-11 coding tasks against the expert-validated annotations
Training models for automated medical code assignment from clinical discharge summaries
Analyzing the coverage and distribution of ICD-11 diagnostic codes within a large clinical corpus
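As an illustration of the benchmarking and coverage use cases above, the following is a minimal sketch, not the dataset's official evaluation protocol. It assumes each summary's annotations and each model prediction can be represented as a set of code strings; the function names `micro_prf` and `code_coverage`, the placeholder codes, and the toy data are all hypothetical and do not reflect the dataset's actual schema.

```python
from typing import Iterable, Set, Tuple


def micro_prf(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-document code sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes predicted and present in the gold set
        fp += len(p - g)  # codes predicted but absent from the gold set
        fn += len(g - p)  # gold codes the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def code_coverage(annotations: Iterable[Set[str]], code_universe: Set[str]) -> float:
    """Fraction of the code universe that appears in at least one annotation set."""
    seen = set().union(*annotations) & code_universe
    return len(seen) / len(code_universe) if code_universe else 0.0


# Toy data with placeholder labels (assumption: these are NOT real ICD-11 codes).
gold = [{"CODE_A", "CODE_B"}, {"CODE_C"}]
pred = [{"CODE_A"}, {"CODE_C", "CODE_D"}]
universe = {"CODE_A", "CODE_B", "CODE_C", "CODE_D", "CODE_E"}

precision, recall, f1 = micro_prf(gold, pred)
coverage = code_coverage(gold, universe)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} coverage={coverage:.2f}")
```

Micro-averaging pools true and false positives across all summaries, so frequent codes dominate the score. With a label space as large as ICD-11, a macro-averaged or per-chapter breakdown may be a useful complement.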
Strengths
10,000 discharge summaries provide a substantial corpus for model evaluation