This dataset provides a matched three-way comparison of ChatGPT (GPT-4 API), DeepSeek (V3 API), and Google Search on 20 frequently asked questions in cartilage tissue engineering (2023) and cartilage repair surgery (2024). It contains blinded quality scores assigned under the Accuracy-Safety-Hallucination framework and readability scores computed with the Flesch-Kincaid formula. It was authored by Sen Yang Xiao and uploaded to figshare in May 2026.
Use Cases
- Benchmarking AI chatbot accuracy and safety in medical contexts based on the Accuracy-Safety-Hallucination scoring framework.
- Comparing the readability of AI-generated answers versus search engine snippets based on Flesch-Kincaid grade level scores.
- Matching AI tools to stakeholder needs by analyzing each tool's functional role, using the domain-specific performance results.
- Studying sourcing patterns and answer classification in medical FAQs based on the modified Rothwell taxonomy.
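For readers who want to reproduce or sanity-check the readability use case, the following is a minimal sketch of the standard Flesch-Kincaid grade level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names and the vowel-group syllable heuristic are assumptions made for illustration; the dataset does not state which tool or syllable counter produced its scores, so results may differ slightly from the published values.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumed heuristic, not the dataset's method).

    Counts groups of consecutive vowels and drops a silent trailing 'e'.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a plain-text answer."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    # Hypothetical answer text, not taken from the dataset.
    sample = "Cartilage has a limited capacity to heal. Surgery may help."
    print(f"Grade level: {flesch_kincaid_grade(sample):.2f}")
```

A lower grade level indicates text that is easier to read; comparing the scores of chatbot answers and search snippets for the same question keeps the comparison matched.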
Strengths
- Provides a dual-axis framework integrating classification, blinded quality scoring, and readability analysis.
- Includes specific performance metrics, such as median accuracy scores (e.g., DeepSeek median 5.00 in CTE) and statistical significance results (e.g., Bonferroni-corrected p-values).
- Compares three distinct platforms (Google Search, ChatGPT, DeepSeek) on matched questions across two medical domains.
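As a reminder of what a Bonferroni-corrected p-value means, here is a minimal sketch: each raw p-value is multiplied by the number of comparisons and capped at 1. The three pairwise platform comparisons and the raw p-values below are hypothetical placeholders; the dataset description does not specify which tests were corrected or how many comparisons were made.

```python
from typing import Dict


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Return Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical raw p-values for three pairwise comparisons (illustration only).
    raw = {
        "Google vs ChatGPT": 0.004,
        "Google vs DeepSeek": 0.010,
        "ChatGPT vs DeepSeek": 0.400,
    }
    for name, p_adj in bonferroni_adjust(raw).items():
        verdict = "significant" if p_adj < 0.05 else "not significant"
        print(f"{name}: adjusted p = {p_adj:.3f} ({verdict})")
```

With only a small number of questions per domain, the correction is conservative, so adjusted p-values in the dataset are best read alongside the reported medians.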
Limitations
- Row count is unknown, which may limit suitability assessment.
- Column-level documentation is absent; field semantics must be inferred after download.
- The dataset is only 11.0 KB, consistent with the small question set (20 FAQs) described above, so its scope is very limited.
Provenance
- Source: figshare
- Collection Method: Questions were derived from Google Search's top FAQs, then submitted to all three platforms for comparison. Answers were classified and scored by three blinded raters.
- Time Range: Questions sourced from cartilage tissue engineering FAQs in 2023 and cartilage repair surgery FAQs in 2024.
- Freshness: Last updated 2026-05-29 05:53:30; freshness should be verified.