Can ChatGPT pass the life support exams without entering the American Heart Association course?
2023; Elsevier BV; Volume: 185; Language: English
DOI: 10.1016/j.resuscitation.2023.109732
ISSN: 1873-1570
Authors: Nino Fijačko, Lucija Gosak, Gregor Štiglic, Christopher Picard, Matthew J. Douma
Topic(s): Cardiac Arrest and Resuscitation
Abstract: ChatGPT is a large language model developed by OpenAI [1], trained on a massive dataset of text from the internet. It can generate human-like responses to a variety of questions and prompts, in multiple languages and subject areas. To our knowledge, the performance of ChatGPT has not been examined in the life support and resuscitation space. In this study we tested the accuracy of ChatGPT's answers to the American Heart Association (AHA) Basic Life Support (BLS) and Advanced Cardiovascular Life Support (ACLS) exams.

We employed ChatGPT [1] (OpenAI, San Francisco; 9 January and 30 January 2023 versions) to answer the life support exams (AHA BLS Exams A and B from February 2016, 25 questions each; and AHA ACLS Exams A and B from March 2016, 50 questions each). Because ChatGPT's training data does not extend beyond 2021, we selected older versions of the exams. Questions based on interpretation of images were not included because ChatGPT does not support such input. For scenario-based question series we used the same session, leveraging ChatGPT's within-session memory retention, whereas each stand-alone question was posed in a new session [2,3]. Each answer provided by ChatGPT was compared to the exam answer key provided by the American Heart Association. The threshold for passing each exam is 84% [4]. In addition to overall performance, we also asked ChatGPT to estimate the "level of correctness" (LOC) of each of its answers.

In total, 96 stand-alone and 30 scenario-based questions were used to test ChatGPT's performance. ChatGPT achieved 68% (17/25) and 64% (16/25) accuracy on the two 25-question AHA BLS exams, and 68.4% (26/38) and 76.3% (29/38) accuracy on the two 38-question AHA ACLS exams; from each ACLS exam, 12 questions had been removed because they required electrocardiogram interpretation. ChatGPT provided a reference for 21.5% (25/116) of its answers; the AHA and the American College of Cardiology (84%; 21/25) were the most commonly referenced sources. The overall LOC across all exams was 89.5% (95% CI: 87.4–91.6%), with the BLS A exam scoring highest at 93.8% (95% CI: 90.6–97.0%) (Fig. 1). In this study, ChatGPT did not reach the passing threshold for any of the exams. Our results are similar to those of studies that tested ChatGPT's performance on the United States Medical Licensing Examination (USMLE) [2,5].
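To make the grading and interval estimates above concrete, the following is a minimal sketch, not the authors' code: the helper functions `exam_accuracy` and `loc_confidence_interval`, the answer strings, and the LOC values are all hypothetical. It scores model answers against an answer key, checks the 84% pass threshold, and computes a normal-approximation 95% CI for the mean LOC, which is one plausible way to obtain intervals like those reported here.

```python
# A minimal sketch, not the authors' code: hypothetical helpers illustrating
# how exam answers could be scored against a key and how a normal-approximation
# 95% CI for the self-reported "level of correctness" (LOC) could be computed.
from math import sqrt
from statistics import mean, stdev

PASS_THRESHOLD = 0.84  # AHA pass mark cited in the text

def exam_accuracy(model_answers: list[str], answer_key: list[str]) -> float:
    """Fraction of questions where the model's choice matches the key."""
    correct = sum(a == k for a, k in zip(model_answers, answer_key))
    return correct / len(answer_key)

def loc_confidence_interval(loc_scores: list[float]) -> tuple[float, float]:
    """Normal-approximation 95% CI for the mean LOC (an assumption about
    how such intervals could be derived, not the authors' stated method)."""
    m = mean(loc_scores)
    se = stdev(loc_scores) / sqrt(len(loc_scores))
    return m - 1.96 * se, m + 1.96 * se

# Made-up example data; the real exams had 25 or 38 usable questions each.
key = ["A", "C", "B", "D", "A"]
answers = ["A", "C", "D", "D", "A"]
acc = exam_accuracy(answers, key)
print(f"accuracy = {acc:.0%}, passed = {acc >= PASS_THRESHOLD}")

loc = [0.92, 0.88, 0.95, 0.90, 0.85]  # hypothetical per-answer LOC values
low, high = loc_confidence_interval(loc)
print(f"mean LOC = {mean(loc):.1%} (95% CI: {low:.1%}-{high:.1%})")
```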
We observed that for scenario-based questions ChatGPT provided not only the answer, as it did for stand-alone questions, but also insightful explanations supporting the given answer. In comparison with similar artificial-intelligence-based systems [6-8], the answers provided by ChatGPT were on average highly relevant and accurate, and showed markedly better congruence with resuscitation guidelines than in a previous study [8]. Although the references ChatGPT provided for its answers were very general, the rationale it gave was often substantially more detailed than the rationale provided in the ACLS exam key. In conclusion, despite the overestimated LOC, ChatGPT has shown promising potential as a reference and self-learning tool for preparing for the life support exams.

Nino Fijačko is a member of the ERC BLS Science and Education Committee and the ILCOR Education, Implementation and Teams Task Force. Christopher Picard holds equity in Cavenwell AI (Ottawa, Ontario, Canada). Matthew Douma, Lucija Gosak and Gregor Štiglic declare that they have no conflict of interest.
References
1. OpenAI. ChatGPT [blog]. Accessed 1 February 2023, at: https://openai.com/blog/chatgpt/.
2. Kung TH, Cheatham M, Medinilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. medRxiv. 2022.
3. Antaki F, Touma S, Milad D, et al. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. medRxiv. 2023.
4. Heart and Stroke Foundation of Canada. Instructor resource for resuscitation programs in Canada. Accessed 1 February 2023, at: https://resuscitation.heartandstroke.ca/.
5. Liévin V, Hother CE, Winther O. Can large language models reason about medical questions? arXiv preprint arXiv:2207.08143. 2023.
6. Alagha EC, Helbing RR. Evaluating the quality of voice assistants' responses to consumer health questions about vaccines: an exploratory comparison of Alexa, Google Assistant and Siri. BMJ Health & Care Informatics. 2019;26:e100075.
7. Miner AS, Milstein A, Schueller S, et al. Smartphone-based conversational agents and responses to questions about mental health, interpersonal violence, and physical health. JAMA Internal Medicine. 2016;176:619-625.
8. Picard C, Smith KE, Picard K, et al. Can Alexa, Cortana, Google Assistant and Siri save your life? A mixed-methods analysis of virtual digital assistants and their responses to first aid and basic life support queries. BMJ Innovations. 2019;6:1.