Article Open access Peer-reviewed

How ChatGPT performs in Oral Medicine: The case of oral potentially malignant disorders

2023; Wiley; Volume: 30; Issue: 4 Language: English

10.1111/odi.14750

ISSN

1601-0825

Authors

Márcio Diniz Freitas, Berta Rivas Mundiña, José Ramón García‐Iglesias, E. Vázquez García, Pedro Diz Dios

Topic(s)

Radiomics and Machine Learning in Medical Imaging

Abstract

The potential of artificial intelligence (AI) in Dentistry and, in particular, Oral Medicine is gaining importance (Patil et al., 2022). The recently launched ChatGPT, a free-access AI tool developed by OpenAI, consists of a model trained on large quantities of data that is capable of understanding and generating human language with high precision and consistency (OpenAI, https://openai.com/gpt/). Compared with browsing the web, the advantage of ChatGPT in the medical setting is that it quickly provides appropriate responses to the proposed questions and facilitates decision making based on recent research and guidelines, summarizing the available information and obviating the need for the interviewer to review the original sources (Ahn, 2023). Since its release, ChatGPT has generated considerable expectations regarding its potential utility in the health sciences, particularly in terms of education, research and practice for the various medical disciplines (Sallam, 2023). To our knowledge, however, no studies have been published to date on the use of ChatGPT in Oral Medicine. A particularly concerning topic is the identification of oral potentially malignant disorders (OPMDs), and the importance of developing AI tools that facilitate the clinical detection of OPMDs has therefore been highlighted (de Souza et al., 2023). The aim of this study is to assess the performance of ChatGPT in responding to questions on the diagnosis and management of OPMDs.
We started by selecting guidelines and consensus documents disseminated by reputable scientific societies on the definition, classification, diagnosis, evaluation, malignant transformation and management of OPMDs (Birur et al., 2022; Chen et al., 2021; Lingen et al., 2017; Stojanov & Woo, 2018; Warnakulasuriya et al., 2021).
From each guideline, we chose those statements that took an unequivocal position (the panel does/does not recommend). Additionally, the GRADE grid (Jaeschke et al., 2008) was specified when this was available. Based on these statements, we formulated a series of "primary questions" that were entered into ChatGPT (March 23, 2023 version) using the function "New chat". The review and qualification of the responses were performed independently by two reviewers, and the discrepancies were resolved by a third reviewer. The accuracy of each response was rated on the following scale: 1, complete; 2, correct but insufficient; 3, includes correct and incorrect/outdated data; and 4, completely incorrect (Yeo et al., 2023). When the responses to the primary questions were rated as correct but insufficient, "secondary questions" were created to determine whether ChatGPT could supply the missing information. The secondary questions were created within the same conversation thread as the corresponding primary question. Lastly, we analyzed the authenticity of the references included in the responses. The formulated questions and the responses obtained using ChatGPT, as well as their verification with the guidelines and selected consensus documents, are detailed in Tables 1–3.

Tables 1–3 (ChatGPT responses, with scores, and the corresponding guideline statements):

ChatGPT response (Score 2): Leukoplakia; Erythroplakia; Oral submucous fibrosis; Lichen planus; Actinic cheilitis.

Guideline statements: Oral leukoplakia; Oral erythroplakia; Proliferative verrucous leukoplakia (PVL); Oral lichen planus; Oral submucous fibrosis; Actinic keratosis (Actinic cheilitis); Oral lichenoid lesion (OLL); Oral graft-versus-host disease (OGVHD).

Guideline statements:
Clinicians should follow up periodically with the patient. If the lesion has not resolved and the clinical diagnosis of an OPMD cannot be ruled out, then clinicians should perform a biopsy of the lesion or refer the patient to a specialist. Evidence level: low. Strength of recommendation: conditional.
The panel suggests that for adult patients with a clinically evident oral mucosal lesion considered to be suspicious of an OPMD or malignant disorder, or other symptoms, clinicians should perform a biopsy of the lesion or provide immediate referral to a specialist. Evidence level: low. Strength of recommendation: conditional.
The panel does not recommend cytologic adjuncts for the evaluation of OPMDs among adult patients with clinically evident, seemingly innocuous, or suspicious lesions. Evidence level: low. Strength of recommendation: conditional.
The panel does not recommend autofluorescence, tissue reflectance, or vital staining adjuncts for the evaluation of OPMDs among adult patients with clinically evident, seemingly innocuous, or suspicious lesions. Evidence level: low to very low. Strength of recommendation: conditional.

Evidence level: extremely low. Strength of recommendation: strongly recommended.

ChatGPT response (Score 3): Larger size; OL located on the floor of the mouth, tongue, and soft palate; the longer the duration of OL, the higher the risk; dysplasia and carcinoma in situ; tobacco smoking; heavy alcohol consumption, particularly when combined with smoking; certain types of human papillomavirus (HPV); patients with immunodeficiency, such as those with HIV/AIDS or who have undergone an organ transplant.

Guideline statements: Lesion size >200 mm²; non-homogeneous appearance; lesions on the tongue and the floor of the mouth (commissure and gingiva–buccal region in India); age >40 years; female gender; family history of cancer; presence of invasive Candida albicans; long duration of OL; OL in nonsmokers; increased keratosis; verrucous, nodular, or erythematous components; induration or ulceration. Evidence level: low. Strength of recommendation: strongly recommended.

ChatGPT response (Score 1): For small lesions with no dysplasia or cancer, a conservative approach such as close monitoring and behavior modification may be recommended. In some cases, topical therapies such as retinoids, bleomycin, or photodynamic therapy may also be considered. For larger or more severe lesions, surgical therapy may be recommended, such as excision, laser surgery, or cryotherapy. In cases where dysplasia or cancer is present, surgery is generally recommended.

Guideline statements:
High-risk OL: surgery is the essential treatment.
OL with moderate/severe epithelial dysplasia or carcinoma in situ: surgical excision with or without grafting; for larger homogeneous lesions (>4 cm), wait and watch through rigid surveillance.
OL with moderate/severe epithelial dysplasia: complete removal should be offered.
OL with mild epithelial dysplasia: remove or follow depending on clinical appearance and location.
Treatment options include surgical excision, laser excision, cryotherapy and photodynamic therapy.
Follow-up every 3 months. Evidence level: extremely low. Strength of recommendation: strongly recommended.
Follow-up every 1–3 months. Evidence level: extremely low. Strength of recommendation: strongly recommended.

In general, ChatGPT showed a moderate base of knowledge on OPMDs, although this result may have been overestimated because the guidelines themselves are imprecise on certain aspects of these lesions that remain controversial (Stojanov & Woo, 2018). In total, 7 of the 12 (58%) responses to the primary questions and 7 of the 12 (58%) responses to the secondary questions included correct and incorrect/outdated data or were completely incorrect (scores 3 and 4). Regarding the primary questions on the classification of OPMDs, ChatGPT's responses did not include proliferative verrucous leukoplakia (PVL), oral lichenoid lesion (OLL) or oral graft-versus-host disease (OGVHD), unlike the guidelines (Warnakulasuriya et al., 2021). In contrast, ChatGPT considered that chronic hyperplastic candidosis and exophytic verrucous hyperplasia/oral verrucous hyperplasia were OPMDs.
This result could be due to a significant limitation in ChatGPT: the pace of updating, given that the data employed for its training is dated from before 2022 (OpenAI). In response to the corresponding secondary questions, ChatGPT considered PVL and OLL as OPMDs; when we insisted on OGVHD, the chat admitted that "… while there is some debate on the matter, OGVHD may be considered an OPMD." This result shows that ChatGPT uses natural language processing techniques, which allow it to understand the meaning and intent of words and phrases used in a conversation, as well as the context of the discussion, generating responses that not only are relevant for the subject at hand but also follow the flow of the conversation (OpenAI). Regarding the results of the diagnosis and evaluation of OPMDs, ChatGPT insisted on the use of adjuncts to identify OPMDs or oral malignant disorders in adults with clinically evident, seemingly innocuous, or suspicious lesions, or other symptoms. This proposal conflicts with the guidelines of the American Dental Association that indicate that "The panel does not recommend cytologic adjuncts, autofluorescence, tissue reflectance, or vital staining adjuncts for the evaluation of OPMDs…," although the level of evidence of these recommendations is low and the strength of the recommendation is conditional (Lingen et al., 2017). In the results of a conversation on the malignant transformation of OPMDs, ChatGPT considered tobacco smoking, heavy alcohol consumption, human papillomavirus and immunodeficiency as factors identified as high risk for the malignant transformation of oral leukoplakia (OL). ChatGPT also did not consider invasive Candida albicans as a risk factor, made no reference to OL in non-smokers and, paradoxically, attributed greater risk to men than to women, confirming significant differences with respect to the guidelines of the Indian Oral Cancer Task Force and the Indian Dental Association (Birur et al., 2022). 
In terms of the management of OPMDs, we found significant differences between ChatGPT and the proposals of the Society of Oral Medicine and Chinese Stomatological Association (Chen et al., 2021), for example regarding the importance of palpating lesions and the frequency of follow-up appointments. All of these discrepancies, in addition to emphasizing the need to update ChatGPT's contents, demonstrate geographical and cultural differences (e.g., "Areca nut cessation is an essential treatment" in India) and possibly a selection bias in the information sources that should be investigated more deeply. When asking about the references from which ChatGPT obtained the responses, we detected a number of inaccuracies in the citations regarding the authorship, the article titles and the names of the scientific journals, including some nonexistent journals. Although ChatGPT can be a useful tool in the health setting, it has numerous potential limitations such as ethical and legal problems, risk of bias, plagiarism, lack of originality, inaccurate contents, limited knowledge, incorrect citations, cybersecurity problems and the risk of infodemics (Sallam, 2023). ChatGPT's reported limitations include topics such as interpretability, reproducibility and the management of uncertainty, which could have harmful consequences in the healthcare setting (Sallam, 2023). In particular, it has been suggested that the lack of reproducibility in ChatGPT's responses could represent a highly significant limitation in medical practice (Holzinger et al., 2023). In any case, and given that this is a recently introduced technology, a standardized methodology for assessing its reproducibility is, to our knowledge, still unavailable. This study was based on the methodology employed in previous studies (Yeo et al., 2023). One of ChatGPT's key characteristics is its ability to interact (i.e., maintain multi-turn conversations).
In this preliminary study, we considered the scenario of multi-turn conversations (primary and secondary questions). It has been suggested that the ability to maintain multi-turn conversations would allow the user to, for example, provide numerous pieces of evidence, request a binary decision rather than an uncertain position and clarify aspects of the response that might be unclear (Zuccon & Koopman, 2023). As in other health settings, ChatGPT can be a valuable resource in the field of Oral Medicine, particularly in clinical decision making and the optimization of clinical workflows; however, one has to be cautiously enthusiastic because the intrinsic value of the knowledge and experience of healthcare practitioners in research and clinical practice is, for now, irreplaceable (Stokel-Walker & Van Noorden, 2023). The incorporation of ChatGPT into the field of Oral Medicine has the potential to significantly accelerate the decision-making processes for the diagnosis, treatment and care strategies for patients. In the health setting, however, it is vitally important to recognize the value of the accumulated knowledge and experience of clinicians. The expertise of health practitioners acquired through years of rigorous training, research and clinical practice should not be underestimated. These professionals provide a deep understanding that transcends mere data and algorithms, encompassing nuanced judgment, contextual understanding and the innate ability to address complex and unpredictable situations. The current limitations of AI technology, including the likelihood of providing incorrect or misleading information, highlight the need for a collaborative approach. Accordingly, AI-inspired tools (such as ChatGPT), combined with the experience and knowledge of healthcare practitioners, have the potential to achieve more reliable and effective results for patients in the Oral Medicine setting.

Author contributions

M. Diniz-Freitas: Conceptualization; writing – original draft; investigation; methodology. B. Rivas-Mundiña: Conceptualization; writing – review and editing; methodology. J. R. García-Iglesias: Investigation; methodology. E. García-Mato: Writing – original draft. P. Diz-Dios: Conceptualization; writing – review and editing; methodology.

Conflict of interest

None.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Reference(s)