Article | Open access | Peer reviewed

ChatGPT Answers Common Patient Questions About Colonoscopy

2023; Elsevier BV; Volume: 165; Issue: 2; Language: English

10.1053/j.gastro.2023.04.033

ISSN

1528-0012

Authors

Tsung‐Chun Lee, Kyle Staller, Vlaicu Botoman, Mythili P. Pathipati, Sanskriti Varma, Braden Kuo

Topic(s)

Artificial Intelligence in Healthcare and Education

Abstract

See editorial on page 336.

ChatGPT (OpenAI) is a 175 billion–parameter large language model (LLM) artificial intelligence (AI) that was released in November 2022. ChatGPT is based on the generative pretrained transformer (GPT) 3.5 natural language processing technology and provides a conversational text response to a given prompt [1: OpenAI. https://openai.com/blog/chatgpt. Accessed February 8, 2023]. One potential application of ChatGPT is answering patients' medical questions. With more than 70 million procedures annually in the United States [2: Ladabaum U, et al. Gastroenterology. 2019;157:137–148], screening colonoscopies are frequently the subject of questions in gastroenterology. In this study, we examine the quality of ChatGPT-generated answers to common questions (CQs) about colonoscopy. We retrieved 8 CQs and answers about colonoscopy from the publicly available webpages of 3 randomly selected hospitals on the top-20 list of the US News & World Report Best Hospitals for Gastroenterology and GI Surgery (Supplementary Methods). We entered these questions as prompts into ChatGPT (January 30, 2023, version) twice on the same day and recorded the ChatGPT-generated answers as AI1 and AI2, respectively. We compared the text similarity among all answers using plagiarism detection software (Supplementary Table 1). To objectively assess the quality of ChatGPT-generated answers, 4 gastroenterologists (2 senior gastroenterologists and 2 fellows) rated 36 pairs of CQs and answers, displayed in random order, on the following quality indicators using a 7-point Likert scale: (1) ease of understanding, (2) scientific adequacy, and (3) satisfaction with the answer (Table 1).
Raters were also asked to judge whether each answer was AI generated.

Table 1. Quality Indicators (Ease of Understanding, Scientific Adequacy, Satisfaction) for Answers From AI and From Non-AI Sources

| Common question (CQ) | Source of answers | Ease of understanding, mean | P | Scientific adequacy, mean | P | Satisfaction, mean | P |
|---|---|---|---|---|---|---|---|
| CQ1: What is a colonoscopy? | AI | 6.4 | .54 | 5.9 | .94 | 5.6 | .78 |
| | Non-AI | 5.7 | | 5.7 | | 5.7 | |
| CQ2: Why is a colonoscopy performed? | AI | 5.9 | .20 | 5.8 | .65 | 5.6 | .39 |
| | Non-AI | 4.8 | | 5.5 | | 4.9 | |
| CQ3: How to prepare for a colonoscopy? | AI | 5.9 | .59 | 6.1 | .72 | 5.8 | 1 |
| | Non-AI | 5.8 | | 5.9 | | 5.6 | |
| CQ4: What to expect during the colonoscopy procedure? | AI | 5.9 | .78 | 5.6 | .60 | 5.3 | .66 |
| | Non-AI | 5.6 | | 6.0 | | 5.5 | |
| CQ5: What to expect after the colonoscopy procedure? | AI | 6.3 | .05 | 5.9 | .16 | 6.1 | .12 |
| | Non-AI | 5.3 | | 5.1 | | 5.0 | |
| CQ6: What to do after a negative colonoscopy result? | AI | 6.4 | .44 | 5.9 | .28 | 6.1 | .77 |
| | Non-AI | 5.8 | | 6.3 | | 5.8 | |
| CQ7: What to do after a positive colonoscopy result? | AI | 5.0 | .87 | 5.4 | .11 | 4.9 | .83 |
| | Non-AI | 4.6 | | 6.0 | | 4.8 | |
| CQ8: What to expect about complications? | AI | 6.1 | .40 | 6.5 | .01 | 6.3 | .02 |
| | Non-AI | 5.6 | | 5.4 | | 4.8 | |

NOTE. Four physicians provided ratings on a 7-point Likert scale (7 = strongly agree, 4 = neutral, 1 = strongly disagree) for the statements "The answers are easy to understand." (ease of understanding), "The answers are scientifically adequate." (scientific adequacy), and "I am satisfied with the answers." (satisfaction). Statistical analysis was by Mann-Whitney U test. After adjustment for multiple comparisons, the Bonferroni-corrected α = 0.05/56 = 0.00089; ∗: P < .00089 was regarded as significant.
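The significance threshold used in Table 1 follows directly from the Bonferroni rule. The Python sketch below is only an illustration and is not the authors' SAS analysis. It transcribes the AI-vs-non-AI P values from Table 1 and checks them against α = 0.05/56. The divisor of 56 is taken from the text as stated rather than re-derived. The names `TABLE1_P`, `bonferroni_alpha`, and `significant_cells` are hypothetical.

```python
# Sketch: check Table 1 P values against a Bonferroni-corrected threshold.
# Values are transcribed from Table 1; the 56-comparison divisor is taken
# from the text as stated, not re-derived.

FAMILY_ALPHA = 0.05
N_COMPARISONS = 56
INDICATORS = ("ease of understanding", "scientific adequacy", "satisfaction")

# Hypothetical name: Mann-Whitney U P values (AI vs non-AI) per CQ, in INDICATORS order.
TABLE1_P = {
    "CQ1": (0.54, 0.94, 0.78),
    "CQ2": (0.20, 0.65, 0.39),
    "CQ3": (0.59, 0.72, 1.0),
    "CQ4": (0.78, 0.60, 0.66),
    "CQ5": (0.05, 0.16, 0.12),
    "CQ6": (0.44, 0.28, 0.77),
    "CQ7": (0.87, 0.11, 0.83),
    "CQ8": (0.40, 0.01, 0.02),
}


def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold that bounds the family-wise error rate at family_alpha."""
    return family_alpha / n_comparisons


def significant_cells(p_values: dict, alpha: float) -> list:
    """Return (CQ, indicator) pairs whose P value is strictly below alpha."""
    return [
        (cq, INDICATORS[i])
        for cq, ps in p_values.items()
        for i, p in enumerate(ps)
        if p < alpha
    ]


alpha = bonferroni_alpha(FAMILY_ALPHA, N_COMPARISONS)
print(f"Bonferroni-corrected alpha: {alpha:.5f}")
print("Cells below corrected alpha:", significant_cells(TABLE1_P, alpha))
print("Cells below uncorrected 0.05:", significant_cells(TABLE1_P, FAMILY_ALPHA))
```

At the corrected threshold, no cell qualifies. This is consistent with the nonsignificant comparisons reported in the text. At an uncorrected 0.05, only the CQ8 scientific adequacy (.01) and satisfaction (.02) cells fall below the threshold. The CQ5 ease-of-understanding value (.05) sits exactly at the boundary after rounding, so this check cannot classify it reliably.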
We found that ChatGPT answers had extremely low text similarity (0%–16%) to answers on hospital webpages. By contrast, the text similarity between the 2 ChatGPT answers ranged from 28% to 77%, except for CQ7 (0%) (Supplementary Table 1). Gastroenterologists rated ChatGPT answers similarly to non-AI answers in ease of understanding (AI, 5.0–6.4 vs non-AI, 4.8–5.8; all P > .00089 after Bonferroni adjustment for 56 comparisons), with the AI mean score higher than the non-AI mean score for all 8 CQs. Scientific adequacy scores were also similar (AI, 5.4–6.5 vs non-AI, 5.1–6.3; nonsignificant), with the AI mean score higher than the non-AI mean score for 5 of 8 CQs (63%). AI and non-AI answers received similar satisfaction ratings (AI, 4.9–6.3 vs non-AI, 4.8–5.8; nonsignificant) (Table 1). The raters identified AI-generated answers with only 48% accuracy, 41% sensitivity, and 54% specificity. Three raters had an accuracy of less than 50%, and 1 (a fellow) had 81% accuracy (Supplementary Figure 1 and Supplementary Table 2).

To our knowledge, this study is the first to demonstrate that a contemporary LLM-derived conversational AI program can provide easy-to-understand, scientifically adequate, and generally satisfactory answers to CQs about colonoscopy, as judged by gastroenterologists. One surprising finding was how poorly the other 3 raters detected AI-generated answers (sensitivity of 6%, 25%, and 44% for senior 1, senior 2, and fellow 1, respectively). Heuristic feedback from the outperforming fellow revealed that "ChatGPT answers tended to be lengthy, used many colons (':') in the long list of possibilities it gave, and tended to be more of a list rather than a narrative paragraph in response." In contrast, answers from hospital webpages were "more like verbal responses to a patient as opposed to something more encyclopedic."
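The "higher mean" statements can be checked directly against Table 1. This Python sketch uses means transcribed from Table 1 (the variable and function names are hypothetical). For each indicator, it counts the CQs for which the AI mean exceeds the non-AI mean. It compares rounded means only and says nothing about statistical significance.

```python
# Table 1 means transcribed as (AI, non-AI) pairs, in CQ1..CQ8 order.
TABLE1_MEANS = {
    "ease of understanding": [
        (6.4, 5.7), (5.9, 4.8), (5.9, 5.8), (5.9, 5.6),
        (6.3, 5.3), (6.4, 5.8), (5.0, 4.6), (6.1, 5.6),
    ],
    "scientific adequacy": [
        (5.9, 5.7), (5.8, 5.5), (6.1, 5.9), (5.6, 6.0),
        (5.9, 5.1), (5.9, 6.3), (5.4, 6.0), (6.5, 5.4),
    ],
    "satisfaction": [
        (5.6, 5.7), (5.6, 4.9), (5.8, 5.6), (5.3, 5.5),
        (6.1, 5.0), (6.1, 5.8), (4.9, 4.8), (6.3, 4.8),
    ],
}


def ai_higher_count(pairs):
    """Return (number of CQs where the AI mean is strictly higher, total CQs)."""
    wins = sum(1 for ai_mean, non_ai_mean in pairs if ai_mean > non_ai_mean)
    return wins, len(pairs)


for indicator, pairs in TABLE1_MEANS.items():
    wins, total = ai_higher_count(pairs)
    print(f"{indicator}: AI higher for {wins}/{total} CQs ({100 * wins / total:.1f}%)")
```

The counts are as follows:

- Ease of understanding: 8 of 8 CQs.
- Scientific adequacy: 5 of 8 CQs. This is 62.5%, which the text reports as 63%.
- Satisfaction: 6 of 8 CQs.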
This study suggests a potential role for conversational AI programs in optimizing communication between patients and health care providers, especially for high-volume procedures such as colonoscopy. Despite similar ratings, there was little overlap or plagiarism between the AI and non-AI answers or between the 2 AI answers (Supplementary Tables 1 and 3). This suggests an inherent plagiarism-avoiding design in LLMs and their capability to create unique answers to the same question. The cumulative number of publications about ChatGPT in PubMed grew more than 10-fold, from 20 on February 3, 2023, to 246 on April 14, 2023 (Supplementary Figure 2). Topics include the following:

- board examinations [3: Gilson A, et al. JMIR Med Educ. 2023;9:e45312];
- authorship and editorial policies [4: Stokel-Walker C, et al. Nature. 2023;614:214–216];
- medical education [5: Mbakwe AB, et al. PLOS Digit Health. 2023;2:e0000205];
- clinical decision support [6: Baumgartner C. Clin Transl Med. 2023;13:e1206];
- an LLM assessment framework [7: Howard A, et al. Lancet Infect Dis. 2023;23:405–406].

Although LLMs (ChatGPT, BioGPT, Bard, and others) are early in the adoption curve [8: Rogers EM. Diffusion of Innovations. 5th ed. New York: Simon and Schuster; 2003], they may represent a transformative innovation in how medical information (MI) is created by physicians and consumed by patients. This applies especially in the current era of shared decision making and the consumerization of health care. Patients have been actively consuming MI through multiple channels and contacting providers through electronic patient portals at an exponentially increasing rate. This can benefit patients but simultaneously places a heavy burden on providers and staff.
We envision that AI-generated MI, with appropriate provider oversight, accreditation, and periodic surveillance, could improve the efficiency of care and free providers for more cognitively intensive patient communications. Nevertheless, potential pitfalls must be addressed.

- Evidence base: ChatGPT-generated MI is currently not constructed on the basis of clinical evidence. It is created by an LLM trained on diverse Internet texts with reinforcement learning from human feedback [1: OpenAI. https://openai.com/blog/chatgpt. Accessed February 8, 2023].
- Sensitivity to prompts: LLM outputs may be sensitive and vulnerable to prompt engineering, that is, manipulation through subtle changes in the input prompts. In addition, their performance may be in "a state of constant change" [9: Lee P, et al. N Engl J Med. 2023;388:1233–1239]. Thus, a large gap remains, in both technology and format, before LLMs can be used in responsible clinical care [10: Sackett DL, et al. BMJ. 1996;312:71–72].
- Implicit bias: clinical utility might differ between patients with and without resources.
- Readability: analyses using validated reading-level metrics (Flesch-Kincaid Grade Level and Gunning Fog Index) revealed that the AI-generated answers were written at significantly higher grade reading levels than the hospital webpages (P < .001). Both exceeded the recommended eighth-grade threshold by a wide margin (Supplementary Table 4).

This study has several limitations. First, we did not include patient raters, the group to whom these answers will ultimately be provided. For this study, we aimed to initially critique AI-generated MI through the lens of medical professionals. Future research should explore responses to a broader sample of questions and clinical conditions and should include patient raters.
Second, the numbers of both hospital webpages and raters were small, which limits generalizability. Finally, the webpages of randomly selected top-tier hospitals may not be comprehensive.

This study shows that a conversational AI program can generate credible MI in response to common patient questions. With dedicated domain training, there is meaningful potential to optimize clinical communication with patients.

Author Contributions

- Tsung-Chun Lee, MD (Conceptualization: Equal; Data curation: Equal; Formal analysis: Equal; Investigation: Equal; Methodology: Equal; Project administration: Equal; Writing – original draft: Equal; Writing – review & editing: Equal).
- Kyle Staller, MD (Data curation: Equal; Formal analysis: Equal; Methodology: Equal; Writing – review & editing: Equal).
- Vlaicu Botoman, MD (Data curation: Equal; Writing – review & editing: Equal).
- Mythili P. Pathipati, MD (Data curation: Equal; Writing – review & editing: Equal).
- Sanskriti Varma, MD (Data curation: Equal; Writing – review & editing: Equal).
- Braden Kuo, MD (Conceptualization: Lead; Data curation: Lead; Formal analysis: Lead; Investigation: Lead; Methodology: Lead; Project administration: Lead; Resources: Lead; Supervision: Lead; Writing – original draft: Lead; Writing – review & editing: Lead).

Supplementary Methods

From the top-20 list of the US News & World Report Best Hospitals for Gastroenterology and GI Surgery [1: US News and World Report. https://health.usnews.com/best-hospitals/rankings/gastroenterology-and-gi-surgery. Accessed February 8, 2023], we randomly selected 3 university-affiliated teaching medical centers: 1 on the East Coast, 1 in the Midwest, and 1 on the West Coast. We retrieved 8 CQs and answers about colonoscopy from the publicly available webpages of these 3 hospitals. These 8 CQs are frequently asked by patients. We used these questions as prompts for ChatGPT (January 30, 2023, version) and recorded its answers.
To assess the consistency of the artificial intelligence (AI) answers, we entered the same prompt twice on the same day and recorded the answers as AI1 and AI2, respectively. We compared the text similarity between AI1 and AI2, and between AI1 and the hospital webpages (non-AI), using plagiarism detection software [2: Copyleaks. https://copyleaks.com. Accessed February 15, 2023]. The results are shown in Supplementary Table 1.

To objectively assess the quality of ChatGPT-generated answers, 4 gastroenterologists (2 senior gastroenterologists and 2 fellows) rated the answers. All 4 raters were blinded to the sources of the answers. The 36 pairs of CQs and answers were displayed in random order. Raters scored each pair on the following quality indicators using a 7-point Likert scale: (1) ease of understanding, (2) scientific adequacy, and (3) satisfaction with the answer (Table 1). Raters were also asked to judge whether each answer was AI generated. Raters' performance in identifying AI-generated answers is shown in Supplementary Figure 1 and Supplementary Table 2. Supplementary Table 3 shows example answers from AI and non-AI sources to CQ1 ("What is a colonoscopy?") and CQ2 ("Why is a colonoscopy performed?").

Medical information provided to patients should be readable by an eighth grader [3: Murphy B, et al. Surgeon. 2022;20:E366–E370]. We measured the reading levels of all answers with 2 objective readability indexes:

- the Flesch-Kincaid Grade Level [4: Kincaid JP, et al. http://stars.library.ucf.edu/istlibrary/56. Accessed April 12, 2023];
- the Gunning Fog Index [5: Avra TD, et al. J Vasc Surg. 2022;76:1728–1732].

The results are shown in Supplementary Table 4.
Both indexes are well-recognized objective measures in which an index value of x corresponds to an xth-grade reading level [3]. Medical information given to patients should ideally have an index of 8 or less. Measurements were performed with an online readability tool (https://readable.com, accessed April 12, 2023). We searched the PubMed database with the keyword "ChatGPT" to obtain the list of publications involving ChatGPT [6: National Library of Medicine. https://pubmed.ncbi.nlm.nih.gov. Accessed April 12, 2023]. The chronology of ChatGPT publications in PubMed is shown in Supplementary Figure 2.

Data are shown as mean or mean (standard deviation). Quality indicators for answers from AI vs non-AI sources were compared using the Mann-Whitney U test. To adjust for multiple comparisons, the Bonferroni-corrected α was calculated as 0.05 divided by 56 comparisons, that is, 0.00089. Therefore, P < .00089 was regarded as significant when comparing the 3 quality indicators between answers from AI and non-AI sources. Performance in detecting AI-generated answers was calculated for each rater and for groups of raters. It was expressed as sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. In the reading level analyses, we used 2-sided Student t tests to compare the reading levels of answers from each source with an eighth-grade reading level. We used Mann-Whitney U tests to compare the reading levels of answers from AI and non-AI sources.
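The detection metrics described above all come from a 2 × 2 confusion matrix, with "AI generated" as the positive class. The Python sketch below is illustrative. The per-rater counts are not reported in the source. They were back-calculated from the Supplementary Table 2 percentages, assuming each rater judged 16 AI answers (AI1 and AI2 for 8 CQs) and 20 hospital answers (36 in total). The class name `Confusion` and the dictionary `RATERS` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix with 'AI generated' as the positive class."""
    tp: int  # AI answer correctly called AI
    fn: int  # AI answer called non-AI
    tn: int  # hospital answer correctly called non-AI
    fp: int  # hospital answer called AI

    def __add__(self, other: "Confusion") -> "Confusion":
        # Pooling raters means summing their cell counts.
        return Confusion(self.tp + other.tp, self.fn + other.fn,
                         self.tn + other.tn, self.fp + other.fp)

    def metrics(self) -> dict:
        """Return the five detection metrics as percentages."""
        total = self.tp + self.fn + self.tn + self.fp
        return {
            "sensitivity": 100 * self.tp / (self.tp + self.fn),
            "specificity": 100 * self.tn / (self.tn + self.fp),
            "ppv": 100 * self.tp / (self.tp + self.fp),
            "npv": 100 * self.tn / (self.tn + self.fn),
            "accuracy": 100 * (self.tp + self.tn) / total,
        }


# Back-calculated (hypothetical) counts: 16 AI and 20 hospital answers per rater.
RATERS = {
    "Senior1": Confusion(tp=1, fn=15, tn=11, fp=9),
    "Senior2": Confusion(tp=4, fn=12, tn=12, fp=8),
    "Fellow1": Confusion(tp=7, fn=9, tn=5, fp=15),
    "Fellow2": Confusion(tp=14, fn=2, tn=15, fp=5),
}

pooled = sum(RATERS.values(), Confusion(0, 0, 0, 0))

for name, cm in {**RATERS, "All raters": pooled}.items():
    print(name, {k: round(v, 2) for k, v in cm.metrics().items()})
```

Summing the four matrices before computing percentages reproduces the pooled row of Supplementary Table 2 to within rounding. For example, PPV is 26/63 ≈ 41.27%, whereas averaging the four per-rater PPVs would give about 37.2%. This is consistent with the group metrics having been pooled across raters rather than averaged.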
Statistical analyses were performed using SAS version 14 (SAS Institute).

Supplementary Figure 2. Numbers of accumulated publications related to ChatGPT in the PubMed database (accessed April 14, 2023).

Supplementary Table 1. Comparison of Answers to 8 Common Questions About Colonoscopy

| CQ | AI1 vs AI2: words | Match, % | AI1 vs Hospital1: words | Match, % | AI1 vs Hospital2: words | Match, % | AI1 vs Hospital3: words | Match, % |
|---|---|---|---|---|---|---|---|---|
| CQ1: What is a colonoscopy? | 59 vs 136 | 28 | 59 vs 112 | 0 | 59 vs 89 | 0 | 59 vs 247 | 0 |
| CQ2: Why is a colonoscopy performed? | 133 vs 215 | 36 | 133 vs 129 | 0 | 133 vs 161 | 16 | 133 vs 167 | 0 |
| CQ3: How to prepare for a colonoscopy? | 223 vs 219 | 37 | 223 vs 168 | 0 | 223 vs 272 | 3 | 223 vs 299 | 0 |
| CQ4: What to expect during the colonoscopy procedure? | 201 vs 257 | 39 | 201 vs 132 | 0 | 201 vs 205 | 0 | 201 vs 344 | 0 |
| CQ5: What to expect after the colonoscopy procedure? | 178 vs 238 | 60 | 178 vs 41 | 0 | 178 vs 156 | 3 | 178 vs 202 | 0 |
| CQ6: What to do after a negative colonoscopy result? | 198 vs 78 | 44 | 198 vs NA | NA | 198 vs 161 | 0 | 198 vs NA | NA |
| CQ7: What to do after a positive colonoscopy result? | 189 vs 144 | 0 | 189 vs 72 | 0 | 189 vs 259 | 0 | 189 vs NA | NA |
| CQ8: What to expect about complications? | 219 vs 165 | 77 | 219 vs NA | NA | 219 vs 73 | 0 | 219 vs 82 | 0 |

NOTE. The answers designated AI1 and AI2 were obtained by entering the 8 CQs as prompts into ChatGPT on the same day. The answers designated Hospital1–3 were retrieved from the webpages of 3 top-tier gastroenterology hospitals in the United States. Text similarity was compared with Copyleaks [2] and is presented as the match percentage. AI answers shared 28%–77% similarity with each other, except for CQ7, and had extremely low text similarity (0%–16%) to answers from the hospital webpages. NA, not applicable.
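Copyleaks' matching algorithm is proprietary and is not described in the source, so the numbers in Supplementary Table 1 cannot be reproduced locally. The Python sketch below shows one simple way to compute a word-level "match percentage". It uses the standard-library `difflib.SequenceMatcher` to find the shared runs of words between two answers. It then reports the share of the first answer's words that fall in runs of at least `min_run` words. The function names and the `min_run=3` default are hypothetical choices, not Copyleaks settings.

```python
import difflib
import re


def tokenize(text: str) -> list:
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_percentage(source: str, candidate: str, min_run: int = 3) -> float:
    """Percentage of words in `source` covered by shared word runs of length >= min_run."""
    a, b = tokenize(source), tokenize(candidate)
    if not a:
        return 0.0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() returns non-overlapping (i, j, size) runs, ending with a size-0 sentinel.
    matched = sum(block.size for block in matcher.get_matching_blocks() if block.size >= min_run)
    return 100.0 * matched / len(a)


ai_text = "A colonoscopy is a procedure that allows a doctor to examine the colon."
web_text = "Colonoscopy is a procedure that lets your provider check the inside of your colon."
print(f"{match_percentage(ai_text, web_text):.1f}% of the AI answer matched")
```

Requiring a minimum run length keeps common short phrases such as "of the" from inflating the score. Because `SequenceMatcher` finds runs that appear in the same order in both texts, reordered passages count less than they might under a commercial plagiarism checker.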
Supplementary Table 2. Performance of 4 Interpreters in Detecting Answers Generated by AI

| Raters | Sensitivity, % | Specificity, % | PPV, % | NPV, % | Accuracy, % |
|---|---|---|---|---|---|
| All interpreters (n = 4) | 40.63 | 53.75 | 41.27 | 53.09 | 47.92 |
| Senior gastroenterologists (n = 2) | 15.63 | 57.50 | 22.73 | 46.00 | 38.89 |
| Fellows (n = 2) | 65.63 | 50.00 | 51.22 | 64.52 | 56.94 |
| Senior1 | 6.25 | 55.00 | 10.00 | 42.31 | 33.33 |
| Senior2 | 25.00 | 60.00 | 33.33 | 50.00 | 44.44 |
| Fellow1 | 43.75 | 25.00 | 31.82 | 35.71 | 33.33 |
| Fellow2 | 87.50 | 75.00 | 73.68 | 88.24 | 80.56 |

NOTE. PPV, positive predictive value; NPV, negative predictive value.

Supplementary Table 3. Example Answers From AI and Non-AI (Hospital Webpages)

CQ1: What is a colonoscopy?

AI1 answer: A colonoscopy is a medical procedure in which a doctor uses a long, flexible, lighted tube called a colonoscope to examine the inside of the rectum and colon (large intestine) for any abnormalities such as polyps, tumors, or signs of colorectal cancer. The procedure is performed while the patient is sedated and typically lasts 30 minutes to an hour.

AI2 answer: A colonoscopy is a diagnostic procedure that allows a doctor to examine the inside of the large intestine (colon) and rectum. The procedure is performed using a thin, flexible tube called a colonoscope, which is equipped with a light and a camera. The doctor inserts the colonoscope through the anus and advances it into the colon to visualize the inside of the intestine. The procedure allows the doctor to identify any abnormal growths, such as polyps or cancers, as well as to assess the overall health of the colon. A colonoscopy is an important screening tool for colon cancer and can help detect the disease in its early stages, when it is most treatable. The procedure typically takes 30 minutes to an hour to complete and is performed while the patient is sedated to minimize discomfort.

Hospital1 answer: A colonoscopy is a safe, private and usually painless procedure that takes about 30 minutes to complete. A colonoscopy may be done in a hospital outpatient department, clinic or in a doctor's office. During the procedure, a general surgeon or gastroenterologist (a doctor who specializes in identifying and treating gastrointestinal diseases) uses a colonoscope to see the entire colon. A colonoscope is a long, flexible tube. If a polyp is found, the doctor may remove it during your colonoscopy. During your colonoscopy, the doctor may use the colonoscope to take a biopsy – a small tissue sample – if anything looks abnormal. The tissue sample will be sent to the lab for further testing.

Hospital2 answer: A colonoscopy (koe-lun-OS-kuh-pee) is an exam used to look for changes — such as swollen, irritated tissues, polyps or cancer — in the large intestine (colon) and rectum. During a colonoscopy, a long, flexible tube (colonoscope) is inserted into the rectum. A tiny video camera at the tip of the tube allows the doctor to view the inside of the entire colon. If necessary, polyps or other types of abnormal tissue can be removed through the scope during a colonoscopy. Tissue samples (biopsies) can be taken during a colonoscopy as well.

Hospital3 answer: Colonoscopy is a procedure that lets your healthcare provider check the inside of your entire large intestine or colon. The procedure is done using a long, flexible tube (colonoscope). The tube has a light and tiny camera on one end. It is put in your rectum and moved into your colon. In addition to letting your provider see the inside of your colon, the tube can be used to:
- Clean the lining of your colon with a water jet (irrigate)
- Remove any liquid stool with a suction device
- Inject air in your bowel to make it easier to see inside
- Work inside your bowel with surgical tools
During a colonoscopy, your provider may remove tissue or abnormal growths (polyps) for further exam. He or she may also be able to treat problems that are found. The large intestine or colon is the last part of your digestive system. It absorbs water to change waste from liquid to solid stool. The large intestine is about 5 feet long in adults. It has 4 sections:
- Ascending colon. This extends upward on the right side of your belly.
- Transverse colon. This extends from the ascending colon across your body to the left side.
- Descending colon. This extends from the transverse colon downward on your left side.
- Sigmoid colon. This is named because of its S-shape. It extends from the descending colon to your rectum.
The rectum joins the anus. This is the opening where stool passes out of your body.

CQ2: Why is a colonoscopy performed?

AI1 answer: A colonoscopy is performed for several reasons, including:
- Screening for colorectal cancer: A colonoscopy is an effective way to screen for colorectal cancer and is often recommended for people over the age of 50 or for those with a family history of the disease.
- Investigation of digestive symptoms: If a person is experiencing symptoms such as abdominal pain, rectal bleeding, or changes in bowel habits, a colonoscopy may be performed to identify the underlying cause.
- Follow-up after abnormal results: If a person has had an abnormal result on a stool test or X-ray, a colonoscopy may be performed for further evaluation.
- Monitoring of pre-existing conditions: For individuals with conditions such as inflammatory bowel disease (IBD), a colonoscopy may be performed regularly to monitor the progression of the disease and check for any complications.

AI2 answer: A colonoscopy is performed to screen for and diagnose conditions affecting the large intestine (colon) and rectum. Some common reasons for having a colonoscopy include:
- Screening for colon cancer: A colonoscopy is an important screening tool for colon cancer, which is one of the leading causes of cancer deaths in the United States. The procedure can help detect the disease in its early stages, when it is most treatable.
- Evaluating symptoms: A colonoscopy may be performed to evaluate symptoms such as abdominal pain, diarrhea, constipation, rectal bleeding, and unexplained weight loss.
- Monitoring polyps: If a polyp is found during a colonoscopy, the doctor may remove it during the procedure. Regular follow-up colonoscopies may be recommended to monitor for new polyps or to check for the return of removed polyps.
- Inflammatory bowel disease: A colonoscopy can help diagnose and monitor conditions such as Crohn's disease and ulcerative colitis.
- Irritable bowel syndrome: A colonoscopy may be performed to rule out other causes of symptoms and to help diagnose irritable bowel syndrome.
- Other conditions: A colonoscopy may also be performed to evaluate other conditions affecting the large intestine, such as diverticulitis, colitis, and strictures.
It's important to discuss the reasons for having a colonoscopy with your doctor to determine the best course of action for your individual circumstances.

Hospital1 answer: Physicians use a colonoscopy, the primary and most effective screening technique, to screen patients for colorectal cancer. A colonoscopy is a safe, private and usually painless procedure that takes about 30 minutes to complete. A colonoscopy may be done in a hospital outpatient department, clinic or in a doctor's office. During the procedure, a general surgeon or gastroenterologist (a doctor who specializes in identifying and treating gastrointestinal diseases) uses a colonoscope to see the entire colon. A colonoscope is a long, flexible tube. If a polyp is found, the doctor may remove it during your colonoscopy. During your colonoscopy, the doctor may use the colonoscope to take a biopsy – a small tissue sample – if anything looks abnormal. The tissue sample will be sent to the lab for further testing.

Hospital2 answer: Your doctor may recommend a colonoscopy to:
- Investigate intestinal signs and symptoms. A colonoscopy can help your doctor explore possible causes of abdominal pain, rectal bleeding, chronic diarrhea and other intestinal problems.
- Screen for colon cancer. If you're age 45 or older and at average risk of colon cancer — you have no colon cancer risk factors other than age — your doctor may recommend a colonoscopy every 10 years. If you have other risk factors, your doctor may recommend a screen sooner. Colonoscopy is one of a few options for colon cancer screening. Talk with your doctor about the best options for you.
- Look for more polyps. If you have had polyps before, your doctor may recommend a follow-up colonoscopy to look for and remove any additional polyps. This is done to reduce your risk of colon cancer.
- Treat an issue. Sometimes, a colonoscopy may be done for treatment purposes, such as placing a stent or removing an object in your colon.

Hospital3 answer: Colonoscopy can help your provider look for problems in your colon. These include:
- Any early signs of cancer
- Red or swollen (inflamed) tissue
- Open sores (ulcers)
- Bleeding
Colonoscopy is also used to screen for colorectal cancer. Screening means looking for cancer in people who don't have any symptoms of the disease. A colonoscopy may be used to check and if needed treat things such as:
- Colon polyps
- Tumors
- Ulceration
- Redness or swelling (inflammation)
- Pouches (diverticula) along the colon wall
- Narrowed areas (strictures) of the colon
- Any objects that might be in the colon
It may also be used to find the cause of unexplained, long-term (chronic) diarrhea or bleeding in the GI (gastrointestinal) tract. It can also be used to check the colon after cancer treatment. Colonoscopy may be used when other tests (such as a barium enema, CT colography, tests for blood in stool, stool DNA tests, or sigmoidoscopy) show the need for more testing. Your healthcare provider may have other reasons to advise a colonoscopy.
Supplementary Table 4. Comparison of Reading Levels of Answers From AI and Non-AI (Hospital Webpages)

| Source of answers | Words | Flesch-Kincaid Grade Level | P value | Gunning Fog Index | P value |
|---|---|---|---|---|---|
| Versus eighth-grade reading level (a) | | | | | |
| AI | 175.1 (58.2) | 13.1 (2.2) | <.001 (b) | 15.7 (2.2) | <.001 (b) |
| Non-AI | 168.6 (81.6) | 9.0 (2.1) | .02 | 11.4 (2.3) | <.001 (b) |
| AI vs non-AI (c) | | | <.001 (b) | | <.001 (b) |
| Versus eighth-grade reading level (a) | | | | | |
| AI1 | 168.9 (57.7) | 13.3 (2.8) | <.001 (b) | 16.1 (3.0) | <.001 (b) |
| AI2 | 181.4 (61.9) | 12.9 (1.5) | <.001 (b) | 15.4 (1.2) | <.001 (b) |
| Hospital1 | 108.8 (45.4) | 10.0 (2.0) | <.001 (b) | 12.8 (2.0) | <.001 (b) |
| Hospital2 | 169.4 (68.2) | 9.9 (1.4) | .001 (b) | 11.8 (1.5) | <.001 (b) |
| Hospital3 | 227.3 (91.2) | 7.0 (1.5) | .06 | 9.6 (2.5) | .08 |

NOTE. Data are shown as mean (standard deviation). The Flesch-Kincaid Grade Level and Gunning Fog Index are well-recognized objective measures of the reading level of a text, in which an index value of x corresponds to an xth-grade reading level [3–5]. Measurements were performed with an online readability tool (https://readable.com, accessed April 12, 2023).
(a) Reading levels of the answers were compared with the eighth-grade reading level using 2-sided Student t tests.
(b) Bonferroni-corrected α for multiple comparisons = 0.05/16 comparisons = 0.0031; P < .0031 was regarded as significant.
(c) Reading levels of answers from AI vs non-AI sources were compared using Mann-Whitney U tests.
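Both readability indexes are simple published formulas computed from sentence, word, and syllable counts:

- Flesch-Kincaid Grade Level = 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59
- Gunning Fog Index = 0.4 × [(words/sentences) + 100 × (complex words/words)], where a complex word has 3 or more syllables.

The Python sketch below is an illustration only. Its tokenizer and vowel-group syllable counter are rough heuristics of our own. The source does not describe how readable.com segments sentences or counts syllables, so this code will not exactly reproduce Supplementary Table 4. The sketch also simplifies the Gunning Fog "complex word" rule, which in its full form excludes proper nouns and some suffixed forms. All function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a typical silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple:
    """Return (sentences, words, syllables, complex_words) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)
    return len(sentences), len(words), sum(syllables), complex_words


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(sentences: int, words: int, complex_words: int) -> float:
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


sample = ("A colonoscopy is a diagnostic procedure that allows a doctor to examine "
          "the inside of the large intestine (colon) and rectum.")
n_sent, n_words, n_syll, n_complex = text_stats(sample)
print(f"FKGL {flesch_kincaid_grade(n_sent, n_words, n_syll):.1f}, "
      f"Fog {gunning_fog(n_sent, n_words, n_complex):.1f} (eighth-grade target: 8)")
```

Both formulas grow with sentence length and with the share of polysyllabic words. This is consistent with the study's observation that the longer, list-heavy ChatGPT answers scored above the hospital pages.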
Linked articles

- Evolving Landscape of Large Language Models: An Evaluation of ChatGPT and Bard in Answering Patient Queries on Colonoscopy. Gastroenterology, Vol. 166, Issue 1. Preview: The popularity and implementation of artificial intelligence (AI)-enabled large language models (LLMs) powering chatbots are promising in health care, especially for patient queries and communication [1]. Lee et al [2] evaluated the performance of ChatGPT (version January 2023) in answering 8 common patient questions related to colonoscopy and compared it with responses available on hospital webpages. The study concluded that the ChatGPT answers were similar to non-AI answers in ease of understanding and scientific adequacy.

- Comment on "ChatGPT Answers Common Patient Questions About Colonoscopy." Gastroenterology, Vol. 166, Issue 1. Preview: I read with great interest the recent article in Gastroenterology by Lee et al [1] that aimed to evaluate the ability of ChatGPT to provide satisfactory answers to common patient questions about colonoscopy. As artificial intelligence systems like ChatGPT become more prevalent in health care, it is crucial that we critically evaluate their capabilities and limitations. The authors' findings that ChatGPT can provide responses comparable with traditional medical resources in understandability, adequacy, and satisfaction ratings are important initial validations.

- ChatGPT and Patient Questions About Colonoscopy: Comment. Gastroenterology, Vol. 166, Issue 1. Preview: In their article "ChatGPT Answers Common Patient Questions About Colonoscopy," [1] Lee et al examined the effectiveness of ChatGPT-generated responses to frequently asked questions about colonoscopy. According to the authors, conversational artificial intelligence (AI) software can produce reliable medical information in response to typical patient inquiries. With focused domain training, they noted, clinical communication with patients could be significantly improved.

- Beyond Clinical Accuracy: Considerations for the Use of Generative Artificial Intelligence Models in Gastrointestinal Care. Gastroenterology, Vol. 165, Issue 2. Preview: As the volume and complexity of health care data continue to grow, the field of gastroenterology has embraced the use of computational tools to identify, extract, and synthesize relevant information [1]. Rapidly expanding from the use of medical record data only, machine learning and artificial intelligence (AI) techniques now routinely integrate data from procedural images and free-text documents (eg, clinical notes, academic articles, and online resources) [2,3]. Clinically, this has manifested in predictive models and risk stratification tools to improve prognosis, diagnosis, treatment, and patient management.
