Connecting the Dots
2016; Lippincott Williams & Wilkins; Volume: 134; Issue: 5; Language: English
10.1161/circulationaha.116.021892
ISSN: 1524-4539
Authors: Edward Lau, Karol E. Watson, Peipei Ping
Topic(s): Machine Learning in Healthcare
Abstract
Connecting the Dots: From Big Data to Healthy Heart
Research Article (Free Access)
Edward Lau, PhD, Karol E. Watson, MD, PhD, and Peipei Ping, PhD
From National Institutes of Health BD2K Center of Excellence in Biomedical Computing (E.L., K.E.W., P.P.) and Departments of Physiology (E.L., P.P.), Medicine/Cardiology (K.E.W., P.P.), and Bioinformatics (P.P.), University of California at Los Angeles.
Originally published 2 Aug 2016. https://doi.org/10.1161/CIRCULATIONAHA.116.021892. Circulation. 2016;134:362–364

Introduction
The rising capacity to measure extensive arrays of biological parameters has ushered in an era of biomedical big data. As massive datasets from large cohorts become the norm, the discipline of data science has emerged to tackle data-driven problems at the intersection of biomedical research and patient care. We introduce several sources of cardiovascular big data and discuss the importance of maximizing participation in data-driven knowledge production models.

What Are Big Data?
Every day, our world produces a staggering 2.5 quintillion (10¹⁸) bytes of data, including a steadily increasing amount of data from health care and biomedicine.
The whole-genome sequence of a patient can reach 100 gigabytes (10¹¹ bytes) in size, whereas a cardiology division may perform >1000 echocardiograms per month, totaling >200 gigabytes of data. The term biomedical big data has been coined to describe healthcare and biomedical datasets that reach remarkable scale, volume, or complexity. Four biomedical big data sources are of particular interest to cardiovascular biomedicine:

Functional phenotypes: Demographics, hemodynamics, electrocardiography, echocardiograms, and imaging data are pouring in from large cohorts, such as those among the ≈11 500 cardiac-related studies listed on clinicaltrials.gov. Popular personal fitness-tracking devices likewise have created a deluge of mobile health data (eg, heart rate, physical activities, lifestyle) awaiting exploitation. The ability to extract features from phenotypic data and to identify complex interrelationships offers tremendous potential to enhance diagnoses and to improve care.

Molecular profiles: Large-scale omics data on genes, transcripts, proteins, and metabolites can now be acquired in large studies, in the clinic, or even commercially, and may be integrated with functional data to allow a better understanding of disease pathogenesis. For example, the National Center for Biotechnology Information Database of Genotypes and Phenotypes alone lists >2000 molecular and functional datasets from >250 cardiovascular studies, including the Framingham Heart Study, the Jackson Heart Study, and other National Institutes of Health–sponsored cohort studies.
Collection of even more omics data from cardiovascular cohorts is being promoted by funding agencies, for example, through the National Heart, Lung, and Blood Institute X01 funding mechanism for "Omics Phenotypes of Heart, Lung, and Blood Disorders."

Medical records: Patient electronic medical records abound with physicians' notes, billing codes, laboratory test results, and other valuable information on disease, treatment, and epidemiology that may be mined for association studies and for predictive modeling of prognosis and drug responses. Billing code data have been the most readily analyzable because of their codified structure. Ongoing efforts to translate less structured but more information-rich physician notes for computation will open new analytic avenues.

Literature knowledge: PubMed boasts a treasure trove of >2.2 million cardiovascular-related articles from 1809 to 2016, and a new publication is estimated to appear every ≈2.7 minutes. This volume of data overwhelms the capacity of human readers to keep abreast of biomedical knowledge. Biomedical articles are written in natural languages, which are syntactically complex and do not come in predefined structures that allow them to be easily parsed by computer programs. Methods that allow biomedical corpora to be read and computed by software (ie, rendering them machine readable) will unleash tremendous power to mine knowledge on genes, diseases, and drugs.

What Benefits Can Data Science Offer?
Bigger data can bring richer phenotypic measurements, truer representation of populations, and more granular information on disease susceptibility and treatment responsiveness, which are the prerequisites for precision medicine.¹ As an illustrative example, investigators from the Electronic Medical Records and Genomics network developed natural language processing algorithms to extract patient electrocardiographic features and clinical data contained in physician notes from >5000 electronic medical records across 5 participating hospitals.
Informaticians then integrated the data with genomic variants in the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) cohorts to identify genetic loci controlling certain electrocardiographic variables such as QRS intervals. Subsequent phenome-wide association analysis identified a handful of polymorphic variants that physicians may use to assess and predict the risk of patients developing arrhythmias.²

This byte-to-bedside, data-to-knowledge paradigm is now being explored globally to enhance clinical decision making and to power biomedical discoveries. However, data alone, whether big or small, do not automatically lead to answers. Success requires the confluence of large-scale phenotypic and molecular data and the ability to computationally integrate them across cohorts. As the rate-limiting step in knowledge production shifts from data generation to interpretation, computational and analytic advances often become indispensable for extracting value from data. Take the challenge of electronic medical record extraction above: If the data volume were small, human data entry and curation specialists could manually extract data from electronic medical records and put them into databases. With millions of patient records, however, it is not feasible to simply scale up the workforce needed for data entry, a predicament that has spurred informatics innovations in text-mining and crowd-sourcing alternatives. Opportunities abound for data science advances to address biomedical questions and to expand knowledge.³

What Roles Do Data Commons Play?
Data commons are shared virtual environments where data users can come together and interact with the building blocks of data science (data and tools) to complete data-driven analyses that suit their professional interests (Figure).
Although implementations vary, the data science sandboxes provided by commons allow data generators to share data and allow data consumers to locate the data, to access the tools required to decode them, to perform analyses that address biomedical questions, and to distribute the results. A hypothetical commons may contain entry-restricted servers where cohort data can be securely deposited; operational servers where data may be selected and deidentified for consumption; online informatics platforms comprising resource registries and search engines, both of which allow consumers to locate and access digital objects, that is, datasets and tools, through direct queries (discovery indexes); and unified portal user interfaces or computing environments.

Commons are important because they provide critical technical and regulatory infrastructures that promote universal participation in data-driven knowledge production.⁴ A corollary of the data-driven model is that the original generators of a dataset are no longer the only individuals invested in (or even best equipped for) analyzing the data and drawing conclusions. Contrary to the traditional paradigm in which data are generated for the purpose of testing a specific hypothesis, data consumers today may initiate new and original investigations from preexisting data by asking new questions or by applying new analytic approaches. To ensure long-term utility and productivity, data must not only be deposited for reuse but also be made broadly accessible and discoverable by users with very different perspectives, career interests, and professional expertise.

The reconceptualization of data from a single-use throwaway to a permanent, reusable resource is a paradigm shift that calls for rethinking knowledge generation and management models.
Participation from all stakeholders will ensure far-reaching implications for cardiovascular biomedicine, empowering precision medicine and realizing the benefit of big data for all.

Figure. Data-driven knowledge production via commons. Data commons are shared virtual environments where users can interact with resources for data-driven knowledge production. In the workflow depicted here, data from cardiovascular cohorts are first deposited onto secure servers. After deidentification processes, they can be made available in the commons environment. Data consumers interacting with the commons can access both data of interest and necessary software tools to perform analyses and to gain new insights.

Sources of Funding
Drs Lau, Watson, and Ping are supported by National Institutes of Health grant award U54 number GM114833.

Disclosures
None.

Footnotes
The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.
Circulation is available at http://circ.ahajournals.org.
Correspondence to: Peipei Ping, 675 Charles E. Young Dr S, MRL 1-619, University of California at Los Angeles, Los Angeles, CA 90095. E-mail [email protected]

References
1. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015; 372:793–795. doi: 10.1056/NEJMp1500523.
2. Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Manolio TA, Li R, Masys DR, Haines JL, Roden DM; Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) QRS Group. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013; 127:1377–1385. doi: 10.1161/CIRCULATIONAHA.112.000604.
3. Deo RC.
Machine learning in medicine. Circulation. 2015; 132:1920–1930. doi: 10.1161/CIRCULATIONAHA.115.001593.
4. Contreras JL, Reichman JH. DATA ACCESS: sharing by design: data and decentralized commons. Science. 2015; 350:1312–1314. doi: 10.1126/science.aaa7485.

Cited By
Xiao H, Ali S, Zhang Z, Sarfraz M, Zhang F, Faisal M and García-Magariño I (2021) Big Data, Extracting Insights, Comprehension, and Analytics in Cardiology: An Overview, Journal of Healthcare Engineering, 10.1155/2021/6635463, 2021, (1-14), Online publication date: 30-Jan-2021.
Deng W, McMullin D, Inglessis-Azuaje I, Locascio J, Palacios I, Buonanno F, Lo E and Ning M (2021) Effect of Patent Foramen Ovale Closure After Stroke on Circulatory Biomarkers, Neurology, 10.1212/WNL.0000000000012188, 97:2, (e203-e214), Online publication date: 13-Jul-2021.
Chang A (2020) Clinician Cognition and Artificial Intelligence in Medicine, Intelligence-Based Medicine, 10.1016/B978-0-12-823337-5.00007-X, (193-266).
Alves da Cruz M, Ricci-Vitor A, Bonini Borges G, Fernanda da Silva P, Ribeiro F and Marques Vanderlei L (2020) Acute Hemodynamic Effects of Virtual Reality–Based Therapy in Patients of Cardiovascular Rehabilitation: A Cluster Randomized Crossover Trial, Archives of Physical Medicine and Rehabilitation, 10.1016/j.apmr.2019.12.006, 101:4, (642-649), Online publication date: 1-Apr-2020.
Nazir S, Nawaz Khan M, Anwar S, Adnan A, Asadi S, Shahzad S and Ali S Big Data Visualization in Cardiology—A Systematic Review and Future Directions, IEEE Access, 10.1109/ACCESS.2019.2936133, 7, (115945-115958)
Nanchen D (2018) Resting heart rate: what is normal?, Heart, 10.1136/heartjnl-2017-312731, 104:13, (1048-1049), Online publication date: 1-Jul-2018.
Bakir M, Jackson N, Han S, Bui A, Chang E, Liem D, Ardehali A, Ardehali R, Baas A, Press M, Cruz D, Deng M, DePasquale E, Fonarow G, Khuu T, Kwon M, Kubak B, Nsair A, Phung J, Reed E, Schaenman J, Shemin R, Zhang Q, Tseng C and Cadeiras M (2018) Clinical phenomapping and outcomes after heart transplantation, The Journal of Heart and Lung Transplantation, 10.1016/j.healun.2018.03.006, 37:8, (956-966), Online publication date: 1-Aug-2018.
Bonderman D (2017) Artificial intelligence in cardiology, Wiener klinische Wochenschrift, 10.1007/s00508-017-1275-y, 129:23-24, (866-868), Online publication date: 1-Dec-2017.
Lam M, Lau E, Ng D, Wang D and Ping P (2016) Cardiovascular proteomics in the era of big data: experimental and computational advances, Clinical Proteomics, 10.1186/s12014-016-9124-y, 13:1, Online publication date: 1-Dec-2016.
Krishnamurthi N, Francis J, Fihn S, Meyer C, Whooley M and Fukumoto Y (2018) Leading causes of cardiovascular hospitalization in 8.45 million US veterans, PLOS ONE, 10.1371/journal.pone.0193996, 13:3, (e0193996)
Raposo A, Moliterno A, Silva J, Fabri R, Freire A and Pacagnelli F (2022) Comparação da resposta hemodinâmica entre terapia convencional e realidade virtual em pacientes com insuficiência cardíaca internados na unidade de emergência, Fisioterapia e Pesquisa, 10.1590/1809-2950/21008729012022pt, 29:1, (61-67)
Raposo A, Moliterno A, Silva J, Fabri R, Freire A and Pacagnelli F (2022) Comparison of hemodynamic responses between conventional and virtual reality therapies in patients with heart failure admitted to an emergency room, Fisioterapia e Pesquisa, 10.1590/1809-2950/21008729012022en, 29:1, (61-67)

Article Information
© 2016 American Heart Association, Inc.
PMID: 27481999
Keywords: bioinformatics, data mining, dataset
Subjects: Information Technology
Reference(s)