Editorial (peer reviewed)

Improving Quality in Cardiothoracic Surgery: Exploiting the Untapped Potential of Machine Learning

2022; Elsevier BV; Volume: 114; Issue: 6; Language: English

DOI

10.1016/j.athoracsur.2022.06.058

ISSN

1552-6259

Authors

Agni Orfanoudaki, Joseph A. Dearani, David M. Shahian, Vinay Badhwar, Félix G. Fernández, Robert Habib, Michael E. Bowdish, Dimitris Bertsimas

Topic(s)

Advanced X-ray and CT Imaging

Abstract

"If you always do what you always did, you will always get what you always got."—Variously attributed to Albert Einstein, Henry Ford, and other innovators In its ongoing, multidecade quest to optimize patient outcomes in adult and congenital cardiothoracic surgery, The Society of Thoracic Surgeons (STS) has emphasized several key strategies, including quality measurement, feedback reports to providers, performance improvement initiatives, clinical research, and public reporting. The foundation for all these activities is the STS National Database, a gold standard clinical registry with more than 3 decades of operational experience, continuing evolution, nearly universal (≈ 97%) penetration among US adult cardiac programs, and annual program audits demonstrating 96% to 97% data accuracy.1Jacobs J.P. Shahian D.M. Grau-Sepulveda M. et al.Current penetration, completeness, and representativeness of The Society of Thoracic Surgeons Adult Cardiac Surgery Database.Ann Thorac Surg. 2022; 113: 1461-1468Abstract Full Text Full Text PDF PubMed Scopus (22) Google Scholar Building on the foundation of the STS Database, the STS Quality Program includes an industry-leading portfolio of state-of-the art quality metrics and a voluntary public reporting program that includes 84% to 90% of adult and congenital cardiac surgeons, with growing numbers in general thoracic surgery as well. In recognition of its contributions, the STS Database and Quality Program have received the John M. Eisenberg Award of The Joint Commission and the National Quality Forum for outstanding leadership in promoting quality and safety in cardiothoracic surgery.2Shahian D.M. Professional society leadership in health care quality: The Society of Thoracic Surgeons experience.Jt Comm J Qual Patient Saf. 2019; 45: 466-479Abstract Full Text Full Text PDF PubMed Scopus (11) Google Scholar Notwithstanding these many strengths, both core STS programs also face challenges if they are to continue to advance the care of adult and congenital cardiothoracic surgery patients. On the one hand, there is the need for more and higher-dimensionality variables for improved risk prediction, especially when using artificial intelligence (AI) and machine learning (ML) approaches that can identify previously unsuspected nonlinear relationships. On the other hand, many STS participants and their hospital administrators are dissatisfied with the time and resources needed to collect current variables and would not accept a further data collection burden. To continue their evolution and to exploit the full potential of the STS National Database and Quality Program, we must strive contemporaneously to increase the number and dimensionality of data variables collected, reduce manual data collection burden and costs through NLP approaches, and improve risk prediction models using AI and ML. With the availability of richer and more granular "big data," ML algorithms are poised to transform medicine and revolutionize health care delivery in the coming years. These algorithms will provide practitioners with new insights for disease diagnosis and treatment, thus forming a novel generation of clinical decision-making tools. For example, computer vision algorithms can accurately evaluate cardiac function and conduct risk stratification,3Ouyang D. He B. Ghorbani A. et al.Video-based AI for beat-to-beat assessment of cardiac function.Nature. 2020; 580: 252-256Crossref PubMed Scopus (315) Google Scholar,4Eng D. Chute C. Khandwala N. 
ML lies at the intersection of computer science and statistics. It is the scientific discipline that seeks to learn directly from data and experience by using computing algorithms. Although it has become popular only in the past decade, the origins of ML lie at the end of the 18th century, when linear regression was first used by Carl Friedrich Gauss and was formalized a few years later by Adrien-Marie Legendre. The widely celebrated neural networks, a specific class of ML algorithms that imitates brain structure, were first proposed in 1943 by Warren McCulloch and Walter Pitts. Although neural networks became a major research area for both neuroscientists and computer scientists through the 1970s and 1980s, this class of algorithms has been only modestly successful and has not demonstrated dramatic improvements in most health care applications other than imaging, potentially because of the limited number of observations and the low dimensionality (eg, complex characteristics reduced to binary indicators) of available features [6,7].

If the concepts of AI and ML have been recognized for so long, what accounts for the rapidly expanding interest and applications in the past decade? Two factors are responsible for the widespread development and adoption of this technology: computational power and data availability in electronic formats. The computational power of computer systems has improved exponentially in the past 50 years, and costs have decreased significantly. In addition, the application of graphics processing units and parallel programming to ML has accelerated the training and inference of these algorithms by several orders of magnitude, resulting in tremendously faster processing speeds and significantly lower computation costs. Thus, using relatively inexpensive, commercially available computers, ML modelers are now able to train very complex and highly sophisticated algorithms that were not considered feasible in the past.
A similar trend has been observed in the storage capacity of computer and information technology systems, allowing for the creation of large and comprehensive databases that include billions of data points. Although computational power is the engine of ML algorithms, data are their fuel. Algorithms learn from experience rather than from explicitly defined rules. The greater the available volume of validated data, in both the number of features and the number of instances, the better the algorithm will be at deriving meaningful and generalizable patterns. Larger data sets with more variables and greater granularity prevent the model from learning the "noise" and incorporating irrelevant information, thus avoiding the issue of overfitting (ie, failure of the model to predict accurately when applied to new data). Automating some of the data collection for the STS Database will provide more data with greater clinical detail while at the same time reducing the manual data collection burden.
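To make the preceding definition of overfitting concrete, the short sketch below trains the same model on progressively larger samples and reports the gap between training and held-out discrimination, the classic signature of a model that has memorized noise. It is purely illustrative: the data are synthetic, and the model and parameter choices are ours rather than anything used by STS.

```python
# Illustrative sketch (synthetic data only): the gap between training and
# held-out AUC, a signature of overfitting, narrows as more records
# become available for training.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=50,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

for n in (200, 2000, 10000):  # increasing training-set sizes
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train[:n], y_train[:n])
    auc_train = roc_auc_score(y_train[:n],
                              model.predict_proba(X_train[:n])[:, 1])
    auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"n={n:>6}  train AUC={auc_train:.3f}  "
          f"test AUC={auc_test:.3f}  gap={auc_train - auc_test:.3f}")
```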
In recent years, the amount of data generated by health care organizations has grown dramatically. Hospitals routinely produce and maintain electronic health records (EHRs), imaging reports, genomic information, claims, financial data, scheduling information, and many other forms of data in digitized form. The abundance of "big" health care data in the United States has been stimulated by the Health Information Technology for Economic and Clinical Health Act, which in 2009 began to provide financial incentives to hospitals and physician practices to install EHR systems. By consistently recording patients' trajectories within the health care system, EHR databases have proven to be a valuable asset for large-scale analysis that is naturally suited to ML algorithms.

Although they are a rich and abundant source of clinical information, EHR systems vary significantly across health care organizations, even when the same vendor manages them. Data elements are sometimes inaccurate or missing as a function of the clinical urgency and pressure faced by clinical personnel. Furthermore, much of the data stored in EHRs that could substantially enhance predictive modeling and clinical care remains unstructured, recorded in various free-form notes and reports. The data curation process from these systems poses unique challenges to ML models, including data extraction and transfer issues, missing values, and the need for clinical validation [8]. Finally, in the absence of a national EHR database, patient records remain in data silos. As a result, it has become expensive and administratively difficult to merge and analyze clinical data sets across organizations. These issues pose considerable limitations on the quality of the extracted data and prevent the scientific community from training and validating multiinstitutional ML models that cover diverse populations from distinct geographic areas.

National clinical registries such as the STS National Database provide alternative, robust data sources that do not share these limitations. Historically, each record in the STS National Database and in most other similar national registries has been manually curated by specialized data managers, ensuring high quality and clinical validity for the resulting records. One registry within the STS National Database, the STS Adult Cardiac Surgery Database (ACSD), now contains nearly 8 million cardiac surgical procedure records, each with several hundred data fields, collected from more than 1000 clinical centers [1]. Thus, it can serve as a unique platform for more expansive ML applications, leading to improved models for risk prediction and new risk factor identification.

However, the current structure of EHR systems and national registries requires manual compilation of patient profiles. Although this optimizes data accuracy, manual data collection remains a significant burden for many cardiothoracic surgery programs, which employ dedicated teams of data managers to populate ACSD records from EHRs. Recognizing the costs and burden of manual data curation, STS has responded to participant requests and reduced the number of variables that must be recorded for each patient entry. Thus, even though more data often improve risk model performance and other registry data applications (eg, research), STS has intentionally limited the number of data elements collected to reduce the burden of data collection. This prevents the full potential of registry data and ML from being realized.

Is there a way to minimize the data collection burden while at the same time collecting even more extensive data? We argue that the answer lies in a specific type of ML called natural language processing. NLP can process raw text and voice into structured features, accurately extracting meaningful information from diverse and highly complex sources. NLP techniques have experienced remarkable progress in the past decade. Increasingly sophisticated models, such as Generative Pre-trained Transformer 3 (GPT-3, developed by OpenAI) and Bidirectional Encoder Representations from Transformers (BERT, developed by Google AI), have now been trained on more than a billion distinct sources of text, thereby helping to manipulate human language effectively [9]. In a collaboration among representatives from the Massachusetts Institute of Technology, Oxford University, STS, and 4 major academic medical centers, we aim to combine these methodologies to create an algorithm-based system that automates the STS registry curation process, further advancing the innovative leadership of the STS National Database. The proposed ML models will directly learn features from both structured and unstructured information available in hospital EHRs and subsequently translate them into structured STS registry data that conform to the stringent accuracy standards established by STS. The resulting ML tool could directly reduce registry-related operational costs for database participants, improve the quality and efficiency of their registry operations, free resources for other quality improvement activities, and establish a paradigm for other national registries.
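As a purely illustrative sketch of the kind of extraction such a system performs (this is not the STS/MIT tool, whose methods are not described here), the snippet below uses an off-the-shelf zero-shot classifier to map a free-text note onto a binary registry-style field. The field name, candidate labels, note text, and model choice are all our assumptions; a production system would require clinical validation of every extracted value.

```python
# Hedged sketch of NLP-based registry field extraction, NOT the actual
# STS curation system. The field name and labels below are hypothetical.
from transformers import pipeline  # pip install transformers torch

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

note = ("67 y/o male with HTN and prior MI in 2018, "
        "now admitted for elective CABG.")

# Map a free-text note to a hypothetical binary registry field.
result = classifier(
    note,
    candidate_labels=["history of myocardial infarction",
                      "no history of myocardial infarction"],
)
prior_mi = result["labels"][0] == "history of myocardial infarction"
print(f"prior_mi = {prior_mi} (confidence {result['scores'][0]:.2f})")
```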
In addition, it will permit the expansion of national registries by automatically curating new variables and patient-level features that could be used directly as new inputs for risk prediction models and measures of quality at little or no additional cost. Once the feature extraction process is automated, inputting data into the STS registry will become more standardized, leading to more accurate and consistent data curation. These automated extraction algorithms will apply the same rules across all participating institutions to an even greater extent than is possible with manual approaches.

Experience over the past several decades has demonstrated that AI and ML approaches to cardiothoracic risk prediction have improved prediction accuracy only marginally [6,7], likely because of the limited number and dimensionality of available data elements, which provide far less information for analysis than applications where AI and ML have been highly successful, such as image analysis, where millions of pixels of data may be available. By facilitating the data curation process from the EHR and increasing the number and granularity of available STS variables, the proposed system will expand the applications of ML in cardiothoracic surgery. It will enable participating physicians and data scientists to leverage these rich data sources to improve the development and validation of risk scores for operative outcomes, such as mortality and morbidity. ML has the potential to achieve superior performance compared with traditional statistical models because it can uncover and exploit nonlinear relationships among risk factors that cannot be derived from typical linear regression models. In applying ML algorithms, patient-level characteristics and risk factors may gain or lose significance depending on their interactions with other variables. If provided with a greater number and range of data elements through standardized, automated extraction algorithms, AI and ML applications could then potentially yield prediction models that achieve better discrimination and calibration for rare end points, such as deep sternal wound infection, which in some cases occur in less than 1% of the patient population. Initial research efforts support this hypothesis and demonstrate that ensemble and tree-based algorithms can lead to significant improvements in risk estimation for coronary artery bypass graft surgery and for aortic and mitral valve replacement surgery [10-14].
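The sketch below illustrates, on synthetic data only, the kind of comparison such studies report: a gradient-boosted model versus a linear baseline on a rare (roughly 1%) endpoint, evaluated for both discrimination (area under the ROC curve) and calibration (Brier score). The data, model settings, and metrics are our illustrative choices, not those of the cited studies.

```python
# Hedged sketch: logistic regression vs gradient boosting on a synthetic,
# heavily imbalanced endpoint (~1% positive, mimicking a rare
# complication). Illustrates the evaluation, not any published STS model.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100000, n_features=40,
                           n_informative=15, weights=[0.99],
                           flip_y=0.002, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

models = [("logistic regression", LogisticRegression(max_iter=1000)),
          ("gradient boosting", HistGradientBoostingClassifier(random_state=1))]
for name, model in models:
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    # Discrimination (AUC) and calibration (Brier score; lower is better).
    print(f"{name:>19}:  AUC={roc_auc_score(y_te, prob):.3f}  "
          f"Brier={brier_score_loss(y_te, prob):.5f}")
```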
ML algorithms can effectively incorporate a greater number of risk factors than traditional statistical tools. Interpretable methods based on a tree architecture can personalize the risk estimation process and explicitly characterize high- or low-risk patient trajectories [15]. Even less transparent algorithms, such as neural networks or ensemble methods, can be explained using interpretability frameworks, such as Shapley Additive Explanations (SHAP), that identify the distinct risk contribution of individual variables at the patient level [16]. Therefore, ML can be used to improve the risk prediction process and to individualize the treatment and care of patients before and after cardiothoracic surgery. We can leverage this technology to assess the importance of new variables and uncover novel interactions that previously were not considered significant, thereby informing the future design of clinical trials and national registries.
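As a minimal sketch of patient-level attribution with SHAP, the snippet below decomposes one prediction of a tree model into additive per-feature contributions. The feature names are hypothetical stand-ins for clinical risk factors, and the data are synthetic.

```python
# Hedged sketch: per-patient risk attribution with SHAP on a tree model
# trained on synthetic data. Feature names are illustrative only.
import shap  # pip install shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

feature_names = ["age", "creatinine", "ejection_fraction", "diabetes",
                 "prior_cardiac_surgery"]  # hypothetical risk factors
X, y = make_classification(n_samples=5000, n_features=5, n_informative=4,
                           n_redundant=0, random_state=2)
model = GradientBoostingClassifier(random_state=2).fit(X, y)

# SHAP values decompose each individual prediction (in log-odds) into
# additive per-feature contributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])  # one patient
for name, value in zip(feature_names, shap_values[0]):
    print(f"{name:>22}: {value:+.3f}")
```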
ML algorithms provide the clinical community with the extraordinary opportunity to integrate different data types and sources of information into a single model, facilitating the identification of novel interactions. Computer vision is a particularly promising area of ML that allows the extraction of clinical information from raw images and videos, such as computed tomographic and magnetic resonance images. Such data modalities do not share the limitations of human curation and subjective evaluation that are present in tabular and text data. In addition, computer vision algorithms have repeatedly demonstrated an edge over human experts, primarily because of their ability to process and analyze very high volumes of data. Thus, future risk stratification tools will be based on holistic patient profiles that combine images, natural language, and tabular information [17]. Such approaches will not only boost the downstream predictive power of risk models but also highlight direct interactions among different physiologic and biologic factors that were previously unrecognized.

We summarize the future impact of ML on the operational workflow of cardiothoracic surgery programs participating in the STS registry in Figure 1. Each STS participating program will be able to feed its structured and unstructured EHR data (Figure 1, A) into feature extraction models that will populate the relevant STS registry fields (Figure 1, B). Once each hospital file is compiled, data managers will review its contents, thus ensuring the high quality standards required by STS. The resulting records will be centrally combined into the ACSD by STS. Various combinations of linear and nonlinear models based on traditional and novel ML approaches will be explored and used to optimize predictive modeling, in some instances by using so-called committee classifiers, which pool the predictions of several models (Figure 1, C). Ideally, the application of advanced algorithms not only will improve the discrimination and calibration performance of the STS risk functions but also will lead to novel medical insights, better patient outcomes, and superior quality of care for the participating institutions (Figure 1, D).

Despite numerous publications that highlight the potential opportunities and benefits of deploying ML in cardiac surgery, few studies have demonstrated substantial improvements in risk model predictive accuracy, clinical practice, or cost. This partly reflects the complexity and opacity of ML, which result in a lack of understanding and trust within the cardiac surgery community. Such concerns could be remedied by applying interpretable ML approaches that provide intuitive and actionable insights, making these systems less of a "black box." One of the greatest challenges that the ML community faces is developing models that are robust to the inaccuracies and missing information present in the underlying EHR systems, which form the basis of the national registries. The downstream quality of the derived models will depend on the flaws inherently present within hospital databases. Another potential pitfall is the bias that may lie within prospectively curated data sets and that may subsequently be amplified by the downstream algorithms. For this reason, it is essential to externally validate models in large populations that cover multiple centers with distinct patient demographic and socioeconomic characteristics, thereby ensuring that the proposed systems are generalizable.

In conclusion, although ML poses several structural and technical challenges, it remains a powerful tool that can reshape and advance quality assessment, clinical research, and performance improvement and ultimately drive superior patient outcomes in cardiothoracic surgery. Fully exploiting the potential of ML will require collaborations among hospital administrators, physicians, clinical researchers, data scientists, database managers, and the ML community. Only by effectively combining human intelligence and AI will it be possible to develop and validate models that enrich our understanding of cardiothoracic surgery and optimize and continuously improve patient outcomes.

Financial support was provided by The Society of Thoracic Surgeons.

References

1. Jacobs JP, Shahian DM, Grau-Sepulveda M, et al. Current penetration, completeness, and representativeness of The Society of Thoracic Surgeons Adult Cardiac Surgery Database. Ann Thorac Surg. 2022;113:1461-1468.
2. Shahian DM. Professional society leadership in health care quality: The Society of Thoracic Surgeons experience. Jt Comm J Qual Patient Saf. 2019;45:466-479.
3. Ouyang D, He B, Ghorbani A, et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature. 2020;580:252-256.
4. Eng D, Chute C, Khandwala N, et al. Automated coronary calcium scoring using deep learning with multicenter external validation. NPJ Digit Med. 2021;4:88.
5. Bellini V, Valente M, Del Rio P, Bignami E. Artificial intelligence in thoracic surgery: a narrative review. J Thorac Dis. 2021;13:6963-6975.
6. Lippmann RP, Shahian DM. Coronary artery bypass risk prediction using neural networks. Ann Thorac Surg. 1997;63:1635-1643.
7. Shahian DM, Lippmann RP. Commentary: machine learning and cardiac surgery risk prediction. J Thorac Cardiovasc Surg. 2022;163:2090-2092.
8. Bertsimas D, Wiberg H. Machine learning in oncology: methods, applications, and challenges. JCO Clin Cancer Inform. 2020;4:885-894.
9. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Guyon I, Von Luxburg U, Bengio S, eds. Advances in Neural Information Processing Systems 30 (NIPS 2017). Neural Information Processing Systems Foundation; 2017.
10. Mori M, Durant TJS, Huang C, et al. Toward dynamic risk prediction of outcomes after coronary artery bypass graft: improving risk prediction with intraoperative events using gradient boosting. Circ Cardiovasc Qual Outcomes. 2021;14:e007363.
11. Kilic A, Goyal A, Miller JK, Gleason TG, Dubrawski A. Performance of a machine learning algorithm in predicting outcomes of aortic valve replacement. Ann Thorac Surg. 2021;111:503-510.
12. Kilic A, Goyal A, Miller JK, et al. Predictive utility of a machine learning algorithm in estimating mortality risk in cardiac surgery. Ann Thorac Surg. 2020;109:1811-1819.
13. Orfanoudaki A, Giannoutsou A, Hashim S, Bertsimas D, Hagberg RC. Machine learning models for mitral valve replacement: a comparative analysis with the Society of Thoracic Surgeons risk score. J Card Surg. 2022;37:18-28.
14. Bertsimas D, Orfanoudaki A, Weiner RB. Personalized treatment for coronary artery disease patients: a machine learning approach. Health Care Manag Sci. 2020;23:482-506.
15. Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical risk is not linear: derivation and validation of a novel, user-friendly, and machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator. Ann Surg. 2018;268:574-583.
16. El Shawi R, Sherif Y, Al-Mallah M, Sakr S. Interpretability in healthcare: a comparative study of local machine learning interpretability techniques. Comput Intell. 2020;37:1633-1650.
17. Lee G, Nho K, Kang B, Sohn KA, Kim D. Predicting Alzheimer's disease progression using multi-modal deep learning approach. Sci Rep. 2019;9:1952.