A 2023-2024 study by Sen Yang Xiao compares how well ChatGPT (GPT-4 API), DeepSeek (V3 API), and Google Search answer cartilage repair questions. The dataset contains the results of a matched three-way comparison across two domains, cartilage tissue engineering and cartilage repair surgery, including quality scoring and readability analysis. It was last updated on May 29, 2026.
Use Cases
- Benchmarking AI model accuracy and safety in medical domains using the Accuracy-Safety-Hallucination (ASH) framework scores
- Analyzing the readability of medical information from different sources using Flesch-Kincaid Grade Level scores
- Matching AI tools to stakeholder needs using the study's policy-oriented versus technical-depth framing
- Studying question classification patterns in medical FAQs using the modified Rothwell taxonomy
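The Flesch-Kincaid Grade Level mentioned above follows a published formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is a minimal illustration of that formula, not the study's scoring pipeline. The function names are hypothetical, and the syllable counter is a rough vowel-group heuristic (an assumption), so results may differ from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Apply the standard Flesch-Kincaid Grade Level formula to plain text."""
    # Naive segmentation: sentences end at ., !, or ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    print(round(flesch_kincaid_grade("The cat sat. The dog ran."), 2))
```

Very short, simple text can yield a negative grade level, which is expected behavior for this formula.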
Strengths
- Includes a three-way comparison of major platforms (ChatGPT, DeepSeek, Google) on matched questions.
- Employs a dual-axis framework integrating classification, blinded quality scoring by three raters, and readability analysis.
- Reports specific median accuracy scores (e.g., DeepSeek median 5.00 in cartilage tissue engineering, CTE) and statistically significant differences (e.g., p = 0.036).
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which makes suitability harder to assess before download.
- The dataset is very small at 12.5 KB, indicating limited scope.
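Because column documentation and row count are both missing, a quick inspection after download can fill the gap. The sketch below assumes the file is a CSV with a header row, which the listing does not confirm. The `summarize_table` helper and the placeholder column names are hypothetical and do not reflect the dataset's actual fields.

```python
import csv
import io
from typing import TextIO


def summarize_table(handle: TextIO) -> dict:
    """Return the header row and the number of non-empty data rows."""
    reader = csv.reader(handle)
    header = next(reader, [])
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return {"columns": header, "row_count": len(rows)}


# Placeholder content only; the real column names and values are unknown.
sample = io.StringIO("col_a,col_b\n1,2\n3,4\n")
summary = summarize_table(sample)
print(summary)
```

For the downloaded file, the same helper could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved.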
Provenance
- Source: figshare
- Collection Method: The top 10 Google-derived FAQs per domain were submitted to Google, ChatGPT (GPT-4 API), and DeepSeek (V3 API) for matched comparison.
- Time Range: Cartilage tissue engineering questions from 2023; cartilage repair surgery questions from 2024.
- Freshness: Last updated 2026-05-29 05:53:31; freshness should be verified.