Artificial Intelligence and Human Life: Five Lessons for Radiology from the 737 MAX Disasters
2020; Radiological Society of North America; Volume 2, Issue 2; Language: English
DOI: 10.1148/ryai.2020190111
ISSN: 2638-6100
Topic(s): Medical Imaging and Analysis
Editorial, Free Access. Radiology: Artificial Intelligence, Vol. 2, No. 2.

John Mongan, Marc Kohli

Author affiliations: From the Department of Radiology and Biomedical Imaging, University of California, San Francisco, 505 Parnassus Ave, San Francisco, CA 94143. Address correspondence to J.M. (e-mail: [email protected]).

Published online: Mar 18, 2020. https://doi.org/10.1148/ryai.2020190111

Recent advances in machine learning, a subset of artificial intelligence (AI), have led to a surge in efforts to automate cognitive processes in medicine, particularly in radiology. Automation through AI has the potential to benefit patients through decreased cost, increased efficiency, and reduced errors, but it also introduces new risks and dangers.

Medicine often looks to the airline industry for inspiration to improve patient safety and reduce errors (1). In addition to drawing on the successes of aviation, medicine should learn from aviation's failures. This is particularly true with respect to AI and automated systems, which are currently more broadly adopted in aviation than in medicine (2). The recent tragic losses of two Boeing 737 MAX airliners provide lessons on how AI systems should and should not be implemented in medicine generally and in radiology specifically.

Automated systems designed to improve safety may create dangers or cause harm when they malfunction. The effects of an artificially intelligent system are determined by the implementation of the system, not by the designers' intent. There is a natural tendency to assume that systems intended to improve safety will do just that, and that their worst-case failure will be the absence of the additional safety the system is supposed to provide. The Boeing 737 MAX illustrates that such is not the case. The Maneuvering Characteristics Augmentation System (MCAS) that appears to have caused both crashes was designed as a safety system to mitigate a known risk introduced by the 737 MAX redesign. Modern high-efficiency turbofan engines are much larger in diameter than those used in the 1960s, so mounting them in the original configuration would risk dragging them on the ground. The 737 MAX redesign made the newer, larger engines fit by positioning them farther forward and higher on the wing. Unfortunately, this new engine position can cause the nose of the plane to pitch up during some maneuvers (3).

Pitching an airplane's nose up is dangerous because it increases the angle of attack (the angle between the airflow and the direction the wing is pointed). When the angle of attack becomes too large, the wings lose lift and the plane stalls and begins to fall. To mitigate this propensity for dangerous stalls, the 737 MAX was designed with MCAS, which monitors an angle-of-attack sensor. When MCAS detects a high angle of attack, it forces the nose of the plane down, reducing the angle of attack. However, if the angle-of-attack sensor malfunctions and indicates a high angle of attack during normal flight, MCAS forces the nose down into the ground, which appears to have been the cause of both crashes. In the following paragraphs, we highlight five lessons about AI that radiology should learn from the 737 MAX disasters.
These lessons are also summarized in the Table (Lessons for Implementation of Artificial Intelligence).

The first lesson is that AI system failures and their downstream effects need to be considered independently of the intended purpose and proper function of the system. In particular, it should not be assumed that the worst-case failure of a system that includes AI is equivalent to the function of that system without AI. For instance, using AI for radiology worklist prioritization is generally considered low risk because the current state for most practices is effectively random prioritization. However, the worst-case failure for AI prioritization is not random but reversed prioritization (ie, high-acuity cases are ordered to the back of the list and are read last). This may seem unlikely, but it would happen if the algorithm and the implementation disagreed about whether the highest or the lowest numeric priority value represents the greatest acuity (a minimal code sketch of this failure mode appears below). Adding AI introduces new possibilities for failure; the risks of these failures must be identified and mitigated, which can be difficult to do prospectively.

Boeing seems to have underestimated the risk of MCAS. The 737 MAX has two independent angle-of-attack sensors, but MCAS uses only one for input (4). Indicators that pilots could use to identify malfunction of the sensor are available, but Boeing made these optional, additional-cost features. Neither aircraft that was lost was equipped with these options (5).

Such a failure illustrates that the output of an AI system is only as good as its inputs. The accuracy of the inputs to an AI system is just as important as the AI's accuracy in interpreting those inputs. MCAS generated the correct output based on the inputs it was given; the failures causing the crashes were in the input data. The second lesson is that implementation of an AI algorithm (connecting it to inputs and outputs) requires the same level of care as development of the algorithm, and testing should cover the fully integrated system, not just the isolated algorithm. Furthermore, AI systems should use all reasonably available inputs to cross-check that input data are valid, and they should provide easy visualization and understanding of the inputs the AI system receives, to facilitate detection of erroneous inputs. When an AI system detects inconsistent or potentially erroneous inputs, or uncertain or probably incorrect outputs, an alert should be clearly and reliably communicated to the people able to immediately address the issue. These should be basic, required aspects of AI systems, not options or add-ons.

The third lesson is that when AI systems are added to medical workflows, the people in those workflows must be made aware of the AI and must receive training on the expected function and anticipated dysfunction of the system. People working with AI can supervise and correct it only when they know of the AI's existence and the outputs it is producing. The flight crews of these two 737 MAX planes were not aware of the existence of MCAS on their aircraft. MCAS was largely absent from operations manuals, and neither American nor European regulators required pilots to have specific training on it. At a meeting with Boeing after the first of the two crashes, an American Airlines pilot said, "These guys didn't even know the damn system [MCAS] was on the airplane—nor did anybody else" (6). Notification and training are important even for systems expected or intended to be transparent to users, such as AI-based image reconstruction algorithms.
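To make the first lesson's worklist example concrete, the following minimal Python sketch (our hypothetical illustration, not code from any actual triage product; the Study class, acuity_score field, and function names are assumptions) shows how an algorithm that emits scores in which higher numbers mean greater acuity produces a fully reversed worklist when the integrating code assumes the opposite convention.

```python
from dataclasses import dataclass


@dataclass
class Study:
    accession: str
    acuity_score: float  # AI output: in this sketch, HIGHER means MORE urgent


def prioritize_worklist_buggy(studies: list[Study]) -> list[Study]:
    # Integration bug: the worklist code assumes the score is a rank in which
    # LOWER numbers mean greater urgency (a common queue convention), so it
    # sorts ascending. The result is not random ordering but a fully REVERSED
    # worklist: the most acute studies are read last.
    return sorted(studies, key=lambda s: s.acuity_score)


def prioritize_worklist_fixed(studies: list[Study]) -> list[Study]:
    # Correct integration: sort descending, because this model's convention is
    # "higher score = greater acuity". The convention should be part of the
    # documented, tested interface rather than an unstated assumption.
    return sorted(studies, key=lambda s: s.acuity_score, reverse=True)


if __name__ == "__main__":
    worklist = [
        Study("ACC001", acuity_score=0.05),  # routine follow-up
        Study("ACC002", acuity_score=0.97),  # suspected intracranial hemorrhage
        Study("ACC003", acuity_score=0.40),
    ]
    print([s.accession for s in prioritize_worklist_buggy(worklist)])  # hemorrhage read last
    print([s.accession for s in prioritize_worklist_fixed(worklist)])  # hemorrhage read first
```

Both versions run without error and return a plausibly ordered list, so only testing of the fully integrated system with cases of known acuity, as the second lesson recommends, would expose the reversal.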
The importance of training is underscored by the 737 MAX incidents: at the same meeting with American Airlines, a Boeing representative justified decisions to omit documentation or training on MCAS, saying, "I don't know that understanding this system would've changed the outcome on this. In a million miles, you're going to maybe fly this airplane, maybe once you're going to see this, ever" (6). If people are uninformed about AI in their workflow, the likelihood that failure of the system will be detected decreases and the risk associated with failures increases.

MCAS is a closed-loop system: the output of the automated system directly initiates an action without any human intervention. At present, most radiology AI provides triage, prioritization, or diagnostic decision support feedback to a human, but in the future closed-loop systems may be more common. Closed-loop systems carry an important additional risk: they cannot be ignored and must be inactivated to avoid their consequences. To mitigate this additional risk, closed-loop systems should clearly alert users when they initiate actions, should pair those alerts with a simple and rapid mechanism for disabling the system, and should remain disabled long enough for the failure to be addressed. This is the fourth lesson (a brief code sketch of such safeguards appears below).

None of these measures was present on the 737 MAX, and the pilots of the two affected planes were unable to stop MCAS from driving the aircraft into the ground even after the erroneous nose-down actions were noted. There was no notification that MCAS had activated and was forcing the nose down. In the first crash, the pilot temporarily disabled MCAS 24 times, but it repeatedly reactivated. Two additional settings were required for permanent deactivation, but the flight crew had not been trained in the deactivation procedure and was unable to identify the settings in time. Subsequent simulations showed that permanent deactivation of MCAS could have saved the plane, but it would have had to be completed within 40 seconds of the initial MCAS activation (7).

Government regulation should safeguard against dangerous systems in life-or-death settings such as airliners and health care. The U.S. Federal Aviation Administration (FAA) is tasked with certification of aviation safety. Beginning in the 1980s, faced with increasing airliner complexity and a limited budget, the FAA has increasingly delegated regulatory responsibility to manufacturers. A 2011 Office of Inspector General report concluded that 90% of regulatory compliance was delegated to a manufacturer for a new aircraft design (8). The U.S. Food and Drug Administration (FDA) recently proposed a framework for regulating AI in health care (9). In lieu of certifying individual AI applications, the FDA has proposed certifying software developers, with streamlined reviews of their individual applications, under a program called Pre-Cert. Pre-Cert is remarkably similar to the FAA's delegation model: Both processes rely on companies to place safety above profit when making design decisions. A report on Boeing's delegated regulatory review of MCAS showed that Boeing underreported the extent of control that MCAS exerted, mischaracterized MCAS failure risk, and failed to recognize the angle-of-attack sensor as a single point of failure. The report describes a rushed, profit-driven process that skipped review of critical documents (10).
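Returning briefly to the fourth lesson, the hypothetical Python sketch below (the ClosedLoopGuard class and its methods are our illustrative assumptions, not an existing API) shows one way a closed-loop radiology AI deployment could announce every automated action, offer a single-call disable mechanism, and keep the system disabled for a minimum lockout period while the failure is addressed.

```python
import time


class ClosedLoopGuard:
    """Hypothetical wrapper that gates closed-loop (automated) actions behind
    user notification and a rapid disable switch with a minimum lockout."""

    def __init__(self, notify, lockout_seconds: float = 3600.0):
        self._notify = notify              # callable that alerts the responsible people
        self._lockout_seconds = lockout_seconds
        self._disabled_until = 0.0

    def disable(self) -> None:
        # Simple, rapid mechanism for turning the system off; it stays off
        # long enough for the underlying failure to be investigated.
        self._disabled_until = time.monotonic() + self._lockout_seconds
        self._notify("Automated actions disabled by user; lockout in effect.")

    def run(self, description: str, action) -> bool:
        if time.monotonic() < self._disabled_until:
            self._notify(f"Automated action suppressed (system disabled): {description}")
            return False
        # Clearly announce that the system is initiating an action.
        self._notify(f"Automated action initiating: {description}")
        action()
        return True


if __name__ == "__main__":
    guard = ClosedLoopGuard(notify=print, lockout_seconds=600)
    guard.run("flag study ACC002 as critical and page the on-call team", lambda: None)
    guard.disable()
    guard.run("flag study ACC003 as critical", lambda: None)  # suppressed during lockout
```

The minimum lockout period in this sketch is aimed directly at the failure mode the Lion Air crew faced, in which the system repeatedly reactivated after being disabled.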
The fifth lesson is that regulation is necessary but may not be sufficient to protect patient safety, particularly when it is subject to the conflicts of interest inherent in delegated regulatory review.

Multiple failures of design and implementation increased the risk associated with MCAS and contributed to an AI system that was designed to improve safety instead causing the loss of 346 human lives. In retrospect, these errors may seem obvious, but they occurred in a mature field with a strong safety culture, and similar failures could easily recur in the developing area of AI in radiology. We have the opportunity to learn from these failures now, before there is widespread clinical implementation of AI in radiology. If we miss this chance, our future patients will be needlessly at risk for harm from the same mistakes that brought down these planes.

Disclosures of Conflicts of Interest: J.M. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: consultant for Siemens related to clinical decision support; institution receives grants from GE and Enlytic; author may receive potential royalties from GE (none paid as of yet); unpaid member of clinical advisory board for Nuance. Activities related to the present article: editorial board member of Radiology: Artificial Intelligence. M.K. Activities related to the present article: author receives consulting fees from Medical Sciences Consulting, Gilead, and Honor Health; Medical Sciences Consulting is an NLM-funded project service related to mobile diagnostic truck deployment, including operationalization of a chest radiograph classifier; for Gilead and Honor Health, author had speaking engagements on various topics, including AI ethics. Author retained complete editorial control for all of these engagements; author received support for travel related to committee activities, including AI topics, from RSNA and SIIM. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.

References
1. Helmreich RL. On error management: lessons from aviation. BMJ 2000;320(7237):781–785.
2. Wachter R. The Digital Doctor: Automation, Aviation and Medicine. Health Care Blog. 2015. https://thehealthcareblog.com/blog/2015/02/26/automation-aviation-and-medicine-will-technology-ever-replace-pilots/. Accessed April 27, 2019.
3. Vartabedian R. How a 50-year-old design came back to haunt Boeing with its troubled 737 Max jet. Los Angeles Times. March 15, 2019.
4. Gallagher S. Boeing downplayed 737 MAX software risks, self-certified much of plane's safety. Ars Technica. 2019. https://arstechnica.com/information-technology/2019/03/boeing-downplayed-737-max-software-risks-self-certified-much-of-planes-safety/. Accessed May 16, 2019.
5. Gallagher S. They didn't buy the DLC: feature that could've prevented 737 crashes was sold as an option. Ars Technica. 2019. https://arstechnica.com/information-technology/2019/03/boeing-sold-safety-feature-that-could-have-prevented-737-max-crashes-as-an-option/. Accessed May 16, 2019.
6. Newburger E. Audio recording reveals Boeing resisted angry calls from pilots for 737 Max fix in November. CNBC. 2019. https://www.cnbc.com/2019/05/15/boeing-reportedly-resisted-pilots-angry-calls-for-737-max-fix-last-fall.html. Accessed May 16, 2019.
7. Gallagher S. Lion Air 737 MAX crew had seconds to react, Boeing simulation finds. Ars Technica. 2019. https://arstechnica.com/information-technology/2019/03/simulations-show-lion-air-737-crew-had-little-time-to-prevent-disaster/. Accessed May 16, 2019.
8. FAA. FAA Needs to Strengthen Its Risk Assessment and Oversight Approach for Organization Designation Authorization and Risk-Based Resource Targeting Programs. Washington, DC: FAA, 2011.
9. FDA. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). Silver Spring, Md: FDA, 2019.
10. Gates D. Flawed analysis, failed oversight: How Boeing, FAA certified the suspect 737 MAX flight control system. The Seattle Times. March 21, 2019.

Article History
Received: June 26, 2019
Revision requested: July 16, 2019
Revision received: September 10, 2019
Accepted: September 18, 2019
Published online: March 18, 2020