Name: TCMI Foundational Competency Benchmark for Large Language Models
Creator: Zhaohang Teng
Published: 2026-04-07T05:30:33
License: CC-BY-4.0
Keywords: Traditional Chinese Medicine, Benchmark, Education, Tabular, Informatics, Large Scale, Large Language Models

Description

TCMI-F-6D is a 2026 benchmark that evaluates foundational interdisciplinary competencies in Traditional Chinese Medicine Informatics (TCMI). Zhaohang Teng constructed it from six core disciplines in the MMLU dataset and used it to assess 20 large language models. The study provides a quantifiable framework for evaluating models in TCMI-related scenarios.

Use Cases

Benchmarking model performance on TCMI foundational knowledge using the TCMI-F-6D composite metric system.
Analyzing interdisciplinary knowledge integration scores, such as the 43.97% achieved by ChatGLM3-6B.
Evaluating learning gains and performance stability of chat models, like Qwen-14B-Chat's average gain of 5.60%.
Comparing overall application performance across models, exemplified by DeepSeek-V3.1's 80.87% score.
Identifying models with weaker interdisciplinary competency for focused characteristic analysis across 8 categories.

Strengths

Evaluates 20 distinct large language models.
Provides a composite metric system built from six core disciplines.
Reports specific performance metrics, such as a 95% confidence interval of [5.50%, 5.70%] for learning gains.
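
The card reports an average learning gain of 5.60% with a 95% confidence interval of [5.50%, 5.70%], but it does not say how that interval was computed. As a rough sketch only, the snippet below shows one common way to obtain a symmetric interval of this kind: a normal-approximation interval around the sample mean. The function name `mean_confidence_interval` and the choice of method are assumptions, not the author's documented procedure, and no benchmark data is included here.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval.

    Assumption: z = 1.96 corresponds to a two-sided 95% interval.
    The benchmark's actual method is not stated in the dataset card.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.fmean(values)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem
```

To apply it, pass the per-run or per-question gains extracted from the document as a list of floats. For small samples, a t-distribution critical value would be more appropriate than the fixed `z`.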

Limitations

The dataset is a single 679.1 KB document, which suggests a limited amount of raw data.
The row count is unknown, and no sample data or column definitions are accessible.
Focus is restricted to a niche interdisciplinary field, limiting generalizability.

Provenance

Source: figshare, authored by Zhaohang Teng.
Collection Method: Constructed from selected disciplines in the Massive Multitask Language Understanding (MMLU) dataset, grounded in Cognitive Hierarchy and Disciplinary Knowledge Structure theories.
Time Range: Benchmark construction and evaluation completed circa 2026.
Freshness: Last updated April 7, 2026.

Primary data is embedded in a DOCX file, so users must extract tables or metrics from the document themselves. The license is CC-BY-4.0.
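
Because the data sits inside a DOCX file, one possible way to pull the tables out is sketched below. It relies only on the Python standard library and on the fact that a DOCX file is a ZIP archive whose main body is stored in `word/document.xml`. The function name `extract_tables` is hypothetical, and the layout of the tables in this particular document is unknown, so the output will likely need manual cleanup.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by DOCX body markup.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def extract_tables(docx_source):
    """Return all tables as lists of rows, each row a list of cell strings.

    docx_source may be a file path or a binary file-like object.
    Nested tables are returned as separate tables, and their text also
    appears inside the enclosing cell of the outer table.
    """
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(f"{{{W_NS}}}tbl"):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            cells = []
            for tc in tr.findall("w:tc", NS):
                # Join the text runs of each paragraph, then the paragraphs.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
                    for p in tc.iter(f"{{{W_NS}}}p")
                ]
                cells.append(" ".join(paragraphs).strip())
            rows.append(cells)
        tables.append(rows)
    return tables


# Example call (the file name is a placeholder, not the dataset's real name):
# tables = extract_tables("benchmark.docx")
```

Each returned table can then be written to CSV or loaded into a data frame for further analysis.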
