Letter; Open access; Peer reviewed

Performance of Three Large Language Models on Dermatology Board Examinations

2023; Elsevier BV; Volume: 144; Issue: 2; Language: English

10.1016/j.jid.2023.06.208

ISSN

1523-1747

Authors

Fatima N. Mirza, Rachel Lim, Sara Yumeen, Samer Wahood, Bashar Zaidat, Asghar Shah, Oliver Y. Tang, John Kawaoka, Su-Jean Seo, Christopher DiMarco, Jennie J. Muglia, Hayley Goldbach, Oliver J. Wisco, Abrar A. Qureshi, Tiffany J. Libby

Topic(s)

Cutaneous Melanoma Detection and Management

Abstract

As artificial intelligence (AI) continually advances, its potential role in clinical decision making has been increasingly explored. Growing attention has surrounded stress testing the readiness of AI models for clinical utility by quantifying the performance of large language models (LLMs), such as ChatGPT (OpenAI, San Francisco, CA) and its successor GPT-4 (OpenAI), on standardized medical examinations. However, LLMs have predictable limitations, such as older models struggling with higher-order questions (Ali et al., 2023; Ali et al., 2021). Nevertheless, the possibilities for AI as a clinical aide are only continuing to emerge, particularly for a specialty as reliant on visual data as dermatology (Liu et al., 2020; OpenAI, 2023).

Related commentary: "How Foundation Models Are Shaking the Foundation of Medical Knowledge," Journal of Investigative Dermatology, Vol. 144, Issue 2. Preview: The recent letter to the editor by Mirza et al (2023), published in The Journal of Investigative Dermatology, showcases the successful application of publicly available chatbots (GPT-3.5, GPT-4, and Google Bard) to the dermatology board examination. Their results not only complement those of many similar studies in other medical and nonmedical disciplines but also demonstrate the rapid, ongoing advances in artificial intelligence (AI) (Kung et al, 2023). They highlight the swift development in the field, as evidenced by the finding that GPT-4 significantly outscored GPT-3.5, and illustrate the applicability of AI to the specialty of dermatology, shedding light on some of the promises and limitations of AI models.

References

Ali R., Tang O.Y., Connolly I.D., Fridley J.S., Shin J.H., Zadnik Sullivan P.L., et al. Performance of ChatGPT, GPT-4, and Google Bard on a neurosurgery oral boards preparation question bank. Neurosurgery. 2023; (e-pub ahead of print) (accessed April 14, 2023). https://doi.org/10.1227/neu.0000000000002551

Ali R., Syed S., Sastry R.A., Abdulrazeq H., Shao B., Roye G.D., et al. Toward more accurate documentation in neurosurgical care. Neurosurg Focus. 2021; 51: E11.

Liu Y., Jain A., Eng C., Way D.H., Lee K., Bui P., et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020; 26: 900-908.

OpenAI. GPT-4 technical report. 2023. https://cdn.openai.com/papers/gpt-4.pdf (accessed April 14, 2023).