In today's dental clinical practice, although the dissemination and development of EBD continue, successful implementation is hindered by rapid scientific and technological developments, outdated guidelines, and gaps in evidence and implementation 14. AI chatbots, which can theoretically generate immediate evidence-based answers to scientific questions and thus act as the dentist's personal scientific advisor at the chairside, seem to have the potential to be an ideal tool for the successful implementation and development of EBD 15. At the same time, it is an important part of health services that patients can always access their health information accurately and easily 16. With technological advances, patients can use artificial intelligence systems to access many resources for information about healthcare services (such as diagnosis, health promotion, and consultancy services). Additionally, this easy access to reliable and accurate health information can help patients manage their health more effectively 17.
Although LLMs can identify patterns and organize data, they are known to be limited in their ability to fully grasp the underlying meaning and context of information 18. Therefore, in our study, multiple-choice questions were used instead of open-ended questions so that the type of answer requested was clearly indicated to the LLMs, thereby preventing the generation of additional or fictitious information.
When the findings of our study were evaluated, there was no statistically significant difference in scores among the 3 LLMs, although ChatGPT and Gemini had higher accuracy values than Bing in all branches of dentistry. Taşkın et al. 19 evaluated the success of LLMs in answering common orthodontic questions and reported that ChatGPT gave the most desirable answers. Giannakopoulos et al. 15 evaluated the performance of LLMs (ChatGPT, Gemini and Bing) in supporting evidence-based dentistry and found that, although there was no statistically significant difference between the 3 LLMs, Bing had the lowest score. Suarez et al. 20 evaluated the consistency and accuracy of ChatGPT's answers to endodontic questions and reported that the overall consistency of the answers was high, but that ChatGPT underperformed in correctly answering questions of lower difficulty. In the study by Acar 21, which evaluated how LLMs answered questions about oral surgery complications, ChatGPT achieved a higher accuracy score than Bing and Gemini. Differences in the data used to train AI models may explain why chatbots vary in their human-like responses 22,23. In the study by Balel 24, ChatGPT was asked questions frequently asked by patients about oral and maxillofacial surgery procedures; the author reported that ChatGPT has significant potential as a patient information tool but may not be completely safe for the time being. Because ChatGPT is designed to produce human-like text by predicting the likelihood of a word based on the preceding words in a sentence, it may have responded more accurately and quickly based on context and patterns learned from extensive training datasets 25. Howard et al. 26 asked ChatGPT about the statements in the 2020 clinical consensus statement on ankyloglossia and reported that, although ChatGPT reflects medical perspectives on ankyloglossia, caution should be exercised regarding its alignment with non-consensus statements and when relying on it for medical advice.
Although AI chatbots have access to a broad range of information by design, they can generate several possible responses to the same question and, given the randomness built into response generation, may return a less probable response when the same input is repeated. This can be especially dangerous in the medical setting, and patients and families should be warned about this potential limitation of AI chatbots as a resource.
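The effect of this randomness can be illustrated with a minimal Python sketch, which is not part of the study protocol. It assumes a hypothetical model that assigns a score (logit) to each of four answer options and then samples one option using a temperature parameter. The option labels, scores, and temperature values are invented for illustration and do not represent the internals of ChatGPT, Gemini, or Bing.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answers(options, logits, temperature, n_repeats, seed):
    """Ask the same 'question' n_repeats times and record each sampled answer."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(options, weights=probs, k=n_repeats)


# Hypothetical scores for a four-option multiple-choice question (assumption).
OPTIONS = ["A", "B", "C", "D"]
LOGITS = [2.0, 1.0, 0.5, 0.1]

if __name__ == "__main__":
    for temperature in (0.1, 1.5):
        probs = softmax_with_temperature(LOGITS, temperature)
        answers = sample_answers(OPTIONS, LOGITS, temperature, n_repeats=20, seed=0)
        print(f"temperature={temperature}")
        print("  probabilities:", [round(p, 3) for p in probs])
        print("  repeated answers:", "".join(answers))
```

Under these assumptions, a low temperature concentrates almost all probability on the highest-scoring option, so repeated questions return nearly identical answers, whereas a higher temperature spreads probability across the options and lets less likely answers appear on repetition.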
When the findings of our study were evaluated in terms of response time (in seconds), statistically significant differences were found among the 3 LLMs in all branches of dentistry. In all branches, ChatGPT gave the fastest answers, followed by Gemini and then Bing. ChatGPT's speed and greater accuracy can be attributed to its large training dataset, more reliable availability, and extensive training 27. Gemini 28 and Bing 29 are also built on large language model technologies similar to those underlying ChatGPT, but since the exact architectures and technical details of the models are not known, they may perform differently from each other. These differences can also be attributed to the varying design philosophies of AI companies, the different algorithms used, the datasets used for training, and the goals that each AI system is designed to achieve 22,30.
Despite the generally high validity scores of LLMs, these chatbots are known to make critical errors in some answers. In particular, the length, complexity and difficulty of the questions and the multifaceted nature of dental knowledge may be the cause of these critical errors. The answers of LLMs were often superficial and lacked depth of reasoning, which revealed their limitations in handling complex queries 31.
Additionally, as with any new technology, there are limitations that must be addressed to ensure that the benefits of using this innovation outweigh its risks. These limitations include the potential to produce biased, outdated or inaccurate content 32. AI-based training tools can compromise the ability of healthcare professionals to develop the critical skills necessary for human interaction and communication, and can also mislead patients with misinformation 33.
In conclusion, LLMs are poised to make a substantial impact on numerous facets of dentistry in the foreseeable future. Nevertheless, they currently lack the capability to replace dentists in clinical decision-making. Given that AI technologies are still evolving, additional research and development are necessary to unlock their full potential benefits for dental healthcare. With advancements in deep learning, the performance of LLMs is expected to improve, making them increasingly valuable and efficient in dentistry. Ongoing studies over time are essential to evaluate the learning curve and the capacity of AI to evolve and improve.