This dataset is a matched three-way comparison of ChatGPT, DeepSeek, and Google Search answers to cartilage repair questions. It includes blinded quality scores and readability analysis for 20 questions spanning two domains: cartilage tissue engineering and cartilage repair surgery. Sen Yang Xiao published the results on figshare in 2026.
Use Cases
- Benchmarking AI answer accuracy in medical domains based on the Accuracy-Safety-Hallucination (ASH) framework scores.
- Comparing the readability of AI-generated versus search engine answers based on Flesch-Kincaid Grade Level scores.
- Matching AI tools to specific stakeholders by analyzing each tool's functional role, based on differences in performance across domains.
- Studying question classification patterns in medical FAQs based on the modified Rothwell taxonomy.
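For the readability use case, the Flesch-Kincaid Grade Level can be recomputed from raw answer text. The sketch below uses the standard formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a rough vowel-group heuristic and the function names are illustrative assumptions; neither is taken from the dataset, whose own scoring tool is not documented, so values may differ slightly from the published scores.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in words ending in 'le'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid Grade Level of `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    # Illustrative sentence, not a record from the dataset.
    sample = "Cartilage repair surgery restores damaged joint surfaces."
    print(round(flesch_kincaid_grade(sample), 2))
```

Higher values indicate text that requires more years of schooling to read, so the score can be compared directly across answers from the three sources.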
Strengths
- Includes direct performance comparisons with statistical significance testing (e.g., Bonferroni-corrected p-values).
- Employs a multi-rater blinded evaluation framework with three independent raters.
- Covers two distinct medical domains: cartilage tissue engineering (2023) and cartilage repair surgery (2024).
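As background for the Bonferroni-corrected p-values mentioned above: the correction multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below assumes three pairwise comparisons among the three sources. The comparison labels and p-values are made-up placeholders, not values from the dataset, and the dataset does not document which tests were actually run.

```python
from typing import Dict


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical raw p-values for the three pairwise comparisons.
    raw = {
        "ChatGPT vs DeepSeek": 0.40,
        "ChatGPT vs Google Search": 0.004,
        "DeepSeek vs Google Search": 0.010,
    }
    for name, p_adj in bonferroni_adjust(raw).items():
        verdict = "significant" if p_adj < 0.05 else "not significant"
        print(f"{name}: adjusted p = {p_adj:.3f} ({verdict})")
```

Comparing each adjusted p-value against 0.05 is equivalent to comparing each raw p-value against 0.05 divided by the number of comparisons.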
Limitations
- The row count is not reported, which makes it harder to assess suitability before download.
- Column-level documentation is absent; field semantics must be inferred after download.
- The dataset is very small at 10.3 KB, indicating limited scope.
Provenance
- Source: figshare
- Collection Method: Questions were sourced from Google Search FAQs, with answers generated by ChatGPT (GPT-4 API), DeepSeek (V3 API), and Google Search.
- Time Range: Questions sourced from 2023 (cartilage tissue engineering) and 2024 (cartilage repair surgery).
- Freshness: Last updated 2026-05-29 05:53:31; freshness should be verified.