Name: Comparative Performance of Four LLMs in Generating Exercise Prescriptions
Creator: Huan Feng
Published: 2026-05-25T04:34:41
License: CC-BY-4.0
Keywords: Exercise Prescription, Benchmark, Healthcare, Clinical Evaluation, Text, Large Language Models, Healthcare AI, Synthetic

Description

Four large language models (GPT-4o, Claude 3.7, DeepSeek R1, and Grok-3) were evaluated on their ability to generate personalized exercise prescriptions using the FITT-VP framework. Claude 3.7 achieved the highest total score, 50.23 out of 60, and DeepSeek R1 the lowest, 40.30. The dataset, authored by Huan Feng and last updated in May 2026, consists of a single 374.6 KB document containing the study results and analysis.

Use Cases

Benchmarking LLM performance in clinical text generation based on the FITT-VP framework
Analyzing model strengths and weaknesses in handling specific prescription dimensions like Progression and Intensity
Supporting research into human-AI collaborative frameworks for personalized exercise medicine

Strengths

Scores for four named LLMs (GPT-4o, Claude 3.7, DeepSeek R1, Grok-3) are reported together with significance statistics (p < 0.001, η² = 0.896)
Prescriptions were evaluated by three certified exercise specialists using a validated 0–10 scale across six dimensions
The study is based on 30 synthetic patient profiles designed from epidemiological data and clinical guidelines
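
The scoring scheme described above (six dimensions, each rated 0–10, for a maximum of 60) can be sketched as follows. This is a minimal illustration, not the study's own procedure: the dimension names are the standard FITT-VP components, the ratings are hypothetical, and averaging the three raters per dimension before summing is an assumption, since the listing does not say how rater scores were combined.

```python
from statistics import mean

# Standard FITT-VP components; the listing itself only names Intensity and Progression.
DIMENSIONS = ("Frequency", "Intensity", "Time", "Type", "Volume", "Progression")


def total_score(ratings):
    """Sum the per-dimension mean rating into a total out of 60.

    ratings maps each dimension name to a list of rater scores on a 0-10 scale.
    Averaging across raters is an assumption made for this sketch.
    """
    total = 0.0
    for dimension in DIMENSIONS:
        scores = ratings[dimension]
        if not scores:
            raise ValueError(f"no ratings for {dimension}")
        if any(not 0 <= s <= 10 for s in scores):
            raise ValueError(f"{dimension} ratings must lie in 0-10")
        total += mean(scores)
    return total


# Hypothetical ratings from three raters for one prescription (not study data).
example = {dimension: [8, 9, 7] for dimension in DIMENSIONS}
print(total_score(example))
```

With three raters per dimension, a total such as 50.23 would correspond to an average of roughly 8.4 per dimension under this aggregation.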

Limitations

The data is contained in a single 374.6 KB DOCX file, so the dataset is small and limited in scope
Column-level documentation is absent; field semantics must be inferred after download
Results are based on a single-run evaluation of static synthetic profiles rather than real-world clinical outcomes

Provenance

Source: figshare
Collection Method: Generated by evaluating four LLMs on 30 synthetic patient profiles using the FITT-VP framework.
Freshness: Last updated 2026-05-25 04:34:41

License is CC-BY-4.0. Data is presented as a study document (DOCX), not a structured data table.
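
Because the results ship as a DOCX document rather than a table, a first step is usually to pull out the text for inspection. The sketch below uses only the Python standard library and relies on the fact that a DOCX file is a ZIP archive whose main body lives in `word/document.xml`. The function name `docx_paragraphs` and the file name in the usage comment are hypothetical; the actual file name is not given in this listing.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the non-empty paragraph texts of a DOCX file.

    source may be a file path or a binary file-like object.
    Paragraphs inside table cells are included, in document order.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the pieces.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


# Usage (hypothetical file name):
# for line in docx_paragraphs("exercise_prescriptions.docx"):
#     print(line)
```

This flattens table cells into plain lines, so any score tables in the document would still need to be regrouped by hand or with a dedicated DOCX library.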
