Evaluating the Reliability and Readability of AI Chatbot Responses for Microtia Patient Education
Journal article (peer reviewed)

Supriya Dadi, Taylor Kring, Kyle Latz, David Cohen and Seth Thaller
The Journal of Craniofacial Surgery, Vol. 37(3/4)
2025-10-02
PMID: 41037793

Abstract

Keywords: microtia, information quality, readability, health communication, patient education, chatbots, artificial intelligence

Ear microtia is a congenital deformity that can range from mild underdevelopment to complete absence of the external ear. Because it is often unilateral, it causes visible facial asymmetry that can lead to psychosocial distress for patients and families. Caregivers report feelings of guilt and anxiety, while patients experience increased rates of depression and social difficulties. During this difficult period, patients and their families often turn to AI chatbots for guidance before and after receiving definitive surgical care. This study evaluates the quality and readability of responses from leading AI chatbots to patient-centered questions about the condition. Four popular AI chatbots (ChatGPT 4o, Google Gemini, DeepSeek, and OpenEvidence) were each asked 25 questions about microtia, developed from the FAQ sections of hospital websites. Responses were evaluated using modified DISCERN criteria for quality and SMOG scoring for readability. ANOVA and post hoc analyses were performed to identify significant differences among platforms. Google Gemini achieved the highest DISCERN score (M=37.16, SD=2.58), followed by OpenEvidence (M=32.19, SD=3.54). DeepSeek (M=30.76, SD=4.29) and ChatGPT (M=30.32, SD=2.97) had the lowest DISCERN scores. OpenEvidence had the worst readability (M=18.06, SD=1.12), followed by ChatGPT (M=16.32, SD=1.41). DeepSeek was the most readable (M=14.63, SD=1.60), closely followed by Google Gemini (M=14.73, SD=1.27). Across all platforms, the average DISCERN and SMOG scores were 32.19 (SD=4.43) and 15.93 (SD=1.94), respectively, indicating good quality and an undergraduate reading level. None of the platforms consistently met both quality and readability standards, although Google Gemini performed relatively well. As reliance on AI for early health information grows, ensuring that chatbot responses are accessible will be crucial for supporting informed decision-making and improving the patient experience.
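For readers unfamiliar with the readability metric, the following is a minimal Python sketch of the standard SMOG formula, grade = 1.0430 × √(polysyllables × 30 / sentences) + 3.1291, where "polysyllables" are words of three or more syllables. SMOG was designed for 30-sentence samples; the 30/sentences term rescales shorter texts. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for illustration. This is not the scoring tool used in the study, which the abstract does not name.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count runs of vowels, including 'y'.
    Dedicated readability tools use dictionaries or richer rules."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent (e.g. "make"), keeping at least one syllable.
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def smog_from_counts(polysyllables: int, sentences: int) -> float:
    """Standard SMOG grade from raw counts."""
    if sentences <= 0:
        raise ValueError("need at least one sentence")
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def smog_grade(text: str) -> float:
    """Hypothetical helper: naive sentence/word splitting, then SMOG."""
    # Split on runs of terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return smog_from_counts(polysyllables, len(sentences))


if __name__ == "__main__":
    sample = "Microtia is a congenital deformity. Surgery can help."
    print(round(smog_grade(sample), 2))  # prints 11.21
```

In the sample, "Microtia", "congenital", "deformity", and "Surgery" count as polysyllables across two sentences, giving 1.0430 × √60 + 3.1291 ≈ 11.21. A higher SMOG grade means harder text, which is why OpenEvidence's mean of 18.06 is reported as the worst readability.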
