Interactive Browser-Based Genomics Data Visualization Tools for Translational and Clinical Laboratory Applications
2019; Elsevier BV; Volume: 21; Issue: 6; Language: English
10.1016/j.jmoldx.2019.06.005
ISSN 1943-7811
Authors: Thomas M. Pearce, Marina N. Nikiforova, Somak Roy
Topic(s): Gene expression and cancer classification
Abstract: Visualization-driven data exploration is a highly effective modality for interpreting and discovering insights from high-throughput genomics data sets; however, it is vastly underutilized in routine workflows in clinical and translational settings. We have developed three open-source, browser-based, interactive genomics data visualization widgets that can be used as intuitive stand-alone applications or integrated with existing web-based laboratory information solutions. The widgets were developed in JavaScript using the D3.js library. These widgets run in any modern web browser across desktop and mobile devices for easy accessibility but are designed for client-side data processing to address data privacy concerns. jsProteinMapper plots the location of a variant of interest relative to the protein domains and multiple variant databases, assisting with clinical interpretation of sequence variants. jsComut generates a highly interactive and customizable comutation plot for visual exploration of genomic data sets with clinicopathologic annotations to reveal unique molecular profiles and clinical correlates. jsCodonWheel is an interactive version of the ubiquitous circular codon-to-amino acid translation table, which lets users quickly map nucleotide changes onto resulting amino acid differences. These open-source visualization tools may improve some of the key laboratory workflows that involve the review of large-scale genomics data sets in a high-volume setting. The intuitive and responsive user interface, highly customizable visualizations, and easy integration with existing web-based laboratory software are significant highlights of these tools.

Interpreting complex molecular test results to provide clinical insight is a critical service provided by molecular pathology professionals in the health care setting. With the advent of next-generation sequencing, the quantity and complexity of the genomic data that clinical laboratories generate are ever increasing.
Correspondingly, software tools are increasingly important for an efficient workflow, as they can enable quick cross-reference to relevant information from external databases, provide a linkage with other clinical data in the electronic medical record, streamline and automate routine tasks, and more [ref. 1: Roy S, LaFramboise WA, Nikiforov YE, Nikiforova MN, Routbort MJ, Pfeifer J, Nagarajan R, Carter AB, Pantanowitz L. Next-generation sequencing informatics: challenges and strategies for implementation in a clinical environment. Arch Pathol Lab Med. 2016;140:958-975]. Data visualization software is particularly well suited to improving the interpretative process in the clinical molecular laboratory. In the context of high-dimensional genomic data sets, paired with rich clinical and phenotypic information, graphical visualizations can be a powerful way to summarize test results to enable efficient exploration of the data that ideally lead the user to unravel meaningful insights faster than traditional analytic approaches [ref. 2: Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, Cerami E, Sander C, Schultz N. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6:pl1] [ref. 3: Schroeder MP, Gonzalez-Perez A, Lopez-Bigas N. Visualizing multidimensional cancer genomics data. Genome Med. 2013;5:9].

Software for visualizing medical data, including genomic data, comes in many flavors. Many vendors provide applications intended specifically for use with the equipment purchased by the laboratory or hospital system. Electronic medical records likewise often provide at least rudimentary visualization tools, such as charts or graphical views of test results. For molecular pathology, however, the existing solutions are often far less than ideal.
This is, in part, attributable to the rapid expansion of next-generation sequencing-based assays for clinical use and, correspondingly, the rapid evolution of the types of data requiring interpretation; visualization tools designed specifically for next-generation sequencing data are still evolving. Fortunately, an expanding ecosystem of open-source software can provide a framework for the development of novel analytic and interpretive tools for use in the molecular pathology laboratory setting. For a variety of reasons, including powerful modern web browsers, a rich open-source community supporting data visualization, and a wide array of web-based genomic databases and tools, web technology is at the center of this ecosystem.

Herein, we describe three novel, open-source, browser-based interactive tools for interpreting molecular pathology data. These tools showcase the utility of the browser for data visualization, demonstrate use cases for web technology that do not compromise data security, and have proved useful in both clinical and research settings at the authors' institution. The first application, jsProteinMapper, displays genomic variants in relation to protein domain structure and configurable graphs of previously reported variants for context. The second, jsComut, is an interactive comutation plot, useful for exploring cohort data and generating publication-quality scalable vector graphics (SVG) files. The third, jsCodonWheel, is an interactive mapping of codon sequences into amino acids, which streamlines certain clinical workflow elements while providing side-by-side comparative information about the coded amino acids. Web pages with demonstrations of these widgets are publicly hosted using GitHub Pages (https://pearcetm.github.io, last accessed April 15, 2019).
The source code for all three widgets is available in a public GitHub repository (https://github.com/pearcetm, last accessed April 15, 2019) under the Apache license version 2 (https://www.apache.org/licenses/LICENSE-2.0, last accessed April 15, 2019).

The tools described herein are based on the elements of modern web technology, including HTML5, CSS3, and JavaScript; and they operate within a web browser. The source files for each application operate equivalently whether they are hosted on a server and accessed over the web or downloaded to a local file system and loaded into the browser using the file location as the address. JavaScript libraries and open-source frameworks used for building the widgets are summarized in Table 1. Additional attributions are provided in comments in the source code when code snippets were obtained for specific purposes.

Table 1. JavaScript Libraries and Framework Dependencies for jsComut, jsProteinMapper, and jsCodonWheel

JavaScript library | URL | Description
jQuery | https://jquery.com, version 3.1+ (last accessed August 5, 2018) | Primarily used for accessing and manipulating the document object model, event handling, and helper functions.
D3.js | https://d3js.org, version 4.0+ (last accessed August 5, 2018) | Data-driven documents. Primarily used for binding data to the document and generating graphical visualization elements using SVG elements.
d3-legend | https://github.com/susielu/d3-legend (last accessed August 5, 2018) | Used for generating color-based legends for the jsComut widget. Licensed for open-source use under the Apache license version 2.0.
d3-tip | https://github.com/Caged/d3-tip (last accessed August 5, 2018) | Used for generating tooltips for SVG elements for the jsComut and jsProteinMapper widgets. Licensed for open-source use under the MIT license.
jscolor | http://jscolor.com (last accessed August 5, 2018) | JavaScript color picker. Used for providing a color selection for SVG elements for the jsComut widget. Licensed for open-source use under the GNU GPL license version 3.

GNU, GNU's Not Unix; GPL, General Public License; MIT, Massachusetts Institute of Technology; SVG, scalable vector graphic.

The widgets were developed and tested using Google Chrome version 43+, Mozilla Firefox version 38+, and Internet Explorer version 11+. Normalization of cross-browser implementation differences in the underlying JavaScript engines was primarily handled by using the JavaScript frameworks jQuery and D3.js. Additional polyfills were added on an as-needed basis and can be found in the source code.

The source code files for each widget are organized in a similar manner. The primary JavaScript code is contained in a single JavaScript (.js) file. The functionality of each widget is encapsulated in a JavaScript object that exposes an application programming interface (API) while hiding internal implementation details, thereby avoiding pollution of the browser's global namespace and enhancing integration into other web applications. Utility libraries are included in an additional JavaScript file. Overridable visual styles are defined in Cascading Style Sheet (.css) files, allowing for easy customization and extension. Demonstrations of each widget are provided via an HTML file and a small JavaScript snippet that together show how to use the widget.

During the design and implementation phases, regular feedback was solicited from end users. Rigorous testing, including by nontechnical clinical users, was performed during the development of the applications to understand the requirements for optimal user experience and common use cases for exploring richly annotated genomics data sets. Features were iteratively incorporated and improved in response to user feedback and suggestions throughout the development process.
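The encapsulation approach described above (one object that exposes an API while keeping its internals private) can be illustrated with a closure-based factory. This is a minimal sketch, not the widgets' actual code: the name `createCounterWidget`, its options, and its methods are hypothetical and were chosen only to show the pattern.

```javascript
// Hypothetical widget factory: `createCounterWidget` and its methods are
// illustrative only and are not part of the published widgets' APIs.
function createCounterWidget(initialOptions) {
  // Private state: reachable only through the closure, never via globals.
  const options = Object.assign({ label: "variants", step: 1 }, initialOptions);
  let count = 0;

  // Private helper, hidden from callers.
  function render() {
    return options.label + ": " + count;
  }

  // Public API: the only surface a host application can touch.
  return {
    increment: function () {
      count += options.step;
      return render();
    },
    reset: function () {
      count = 0;
      return render();
    },
    describe: render,
  };
}

const widget = createCounterWidget({ label: "selected variants" });
console.log(widget.increment()); // "selected variants: 1"
console.log(widget.count);       // undefined: internal state is not exposed
```

Because the only name introduced into the enclosing scope is the factory itself, several instances can coexist on one page without interfering with each other or with the host application.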
To test and validate the interface for the widgets, publicly available data sets from the following sources were used: for jsProteinMapper, mutations and their frequency across previously reported tumors in the BRAF, PIK3CA, and TP53 genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) database version 84 (used with permission from COSMIC); and for jsComut, the GlioSeq validation data set from Nikiforova et al [ref. 4: Nikiforova MN, Wald AI, Melan MA, Roy S, Zhong S, Hamilton RL, Lieberman FS, Drappatz J, Amankulor NM, Pollack IF, Nikiforov YE, Horbinski C. Targeted next-generation sequencing panel (GlioSeq) provides comprehensive genetic profiling of central nervous system tumors. Neuro Oncol. 2016;18:379-387] and The Cancer Genome Atlas urothelial carcinoma data [ref. 5: Robertson AG, Kim J, Al-Ahmadie H, Bellmunt J, Guo G, Cherniack AD, et al. Comprehensive molecular characterization of muscle-invasive bladder cancer. Cell. 2018;174:1033].

Three web applications consisting of interactive visualization tools for interpreting genomic data were developed. Below, we describe how to use these tools from the perspective of both developers and end users and provide visual examples of the visualization widgets in action. These descriptions are not intended to be exhaustive user manuals, either for developers or users; rather, the goal is to provide an overview of the applications and to direct interested readers to the relevant interactive online demonstration pages and open-source code repositories. The software described below is provided as is and without any warranty, and the user assumes all associated risks. The software does not constitute medical advice and should not supplant appropriate medical judgment.

The jsProteinMapper widget visualizes the position of a variant of interest relative to structural landmarks in a protein sequence.
To try the widget, please visit https://pearcetm.github.io/jsproteinmapper. The widget fetches protein structure data, including the position of functional domains and active sites from the online Pfam database (http://pfam.xfam.org, last accessed September 30, 2019), and plots the position of a variant of interest relative to these structures to provide intuition about possible functional consequences of the variant in question (Figure 1). The location of the queried variant is visually indicated by a vertical red bar that spans across all the displayed tracks (Figure 1). In addition, it can display histogram-style graphs (variant tracks) of previously reported sequence variants, providing additional layers of clinical context. To interactively explore the data, a mouse or touchscreen can be used to zoom in and out, pan left and right through the protein structure, and show/hide additional details. By default, the widget displays a schematic of the entire protein, with the N-terminus on the left and the C-terminus on the right. However, this wide view can make it hard to see certain details around locations of interest. Similarly, variant tracks are initially scaled to show the full range of the data, which may not be the appropriate scale for potentially relevant low-frequency variants. The widget, therefore, provides several ways to zoom in and out and pan side to side to show the data most effectively. The scroll wheel (mouse, anywhere on the widget) or pinch gesture (touchscreen, on the protein structure) controls the zoom on the horizontal axis to get a closer look. Clicking (or touching) and dragging pans the whole structure left or right. A double click on the protein structure will zoom in on the horizontal axis. A double click on a variant track will zoom in on the vertical axis of that track, to show lower-frequency events more clearly. The zoom level automatically resets if zoomed in too far. 
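The variant-track behavior described above, where a full-range view can hide low-frequency variants until the vertical axis is zoomed, can be sketched in plain JavaScript. This is an illustration under stated assumptions rather than the widget's implementation: the record shape, the function names `buildTrack` and `barHeight`, and the counts are all hypothetical toy values, not COSMIC data.

```javascript
// Toy input (hypothetical shape and made-up counts): one record per variant.
const toyVariants = [
  { position: 10, change: "A10T", count: 950 },
  { position: 10, change: "A10V", count: 40 },
  { position: 25, change: "G25D", count: 10 },
];

// Sum counts per amino acid position, keeping the per-change breakdown
// that a tooltip could display.
function buildTrack(variants) {
  const byPosition = new Map();
  for (const v of variants) {
    if (!byPosition.has(v.position)) {
      byPosition.set(v.position, { position: v.position, total: 0, breakdown: {} });
    }
    const bin = byPosition.get(v.position);
    bin.total += v.count;
    bin.breakdown[v.change] = (bin.breakdown[v.change] || 0) + v.count;
  }
  return Array.from(byPosition.values()).sort((a, b) => a.position - b.position);
}

// Map a count to a bar height in pixels. `yMax` is the top of the visible
// vertical range; lowering it zooms in, and taller bars are clipped.
function barHeight(total, yMax, trackHeightPx) {
  return (Math.min(total, yMax) / yMax) * trackHeightPx;
}

const track = buildTrack(toyVariants);
const fullRange = Math.max(...track.map((bin) => bin.total)); // 990

// Full-range view: the low-frequency position is about one pixel tall.
console.log(barHeight(10, fullRange, 100).toFixed(1)); // "1.0"
// Zoomed view (yMax = 20): the same position now fills half the track.
console.log(barHeight(10, 20, 100).toFixed(1)); // "50.0"
```

Clipping the tallest bars at the top of the visible range is one simple way to handle zoomed views; other choices, such as a logarithmic axis, trade exact proportionality for visibility of rare events.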
In the desktop environment, the mutation tracks can additionally be zoomed in and out using the mouse wheel while holding the shift key. On a touch-enabled device, the mutation tracks can be zoomed using pinch gestures on the track of interest. Shift, click, and drag allows zooming directly to an area of interest. The selected area to be zoomed in on is indicated by a rectangle during dragging, and the zoom occurs on releasing the mouse button. Control panels contain clickable buttons for changing the zoom level of individual variant tracks (y axis) and the protein schematic (x axis), as well as for resetting the axes to the original view. Histograms of variant frequency can be plotted on a linear or logarithmic scale, selected on a per-track basis. A settings window allows the control panels to be shown or hidden on the basis of user preference. To prevent the graphic from becoming overly cluttered, a pop-up tooltip is used to present detailed information associated with various visual elements. For example, additional text describing the functional regions and active sites of the protein is initially hidden but can be revealed by user interactions. Similarly, a variant track displays the frequency of reported variants by amino acid position in a histogram-style graph; details of the breakdown of variants at each particular position are displayed in the tooltip. On a touch-enabled device, the tooltip is activated by a tap gesture. In a desktop environment, the tooltip can be activated by either moving the mouse over the element (hover mode) or clicking the element (click mode). Users can select their preferred mode of tooltip activation using the settings window described above. jsProteinMapper is designed to be integrated into clinical variant reporting software, rather than as a stand-alone web application. 
Therefore, the protein structure data, position, type of the variant of interest, and data for the variant tracks can be set programmatically using the widget's API. To embed the widget, simply include the appropriate JavaScript and Cascading Style Sheet files into your web application, instantiate the widget, and attach it to an appropriate document object model element. The appearance of the widget can be extensively customized using a combination of the JavaScript API and CSS styles. The widget also provides several helper functions consisting of default implementations of common tasks relating to processing the protein and variant data. Details regarding the API functions and further information about embedding the widget into a web application and user interface functionality can be found at https://github.com/pearcetm/jsproteinmapper (last accessed September 30, 2019). jsComut is an interactive comutation plot visualization for exploring cohort-level data (Figure 2). To try the widget, please visit https://pearcetm.github.io/jscomut. This type of visualization fundamentally represents data using a grid layout, with the horizontal position (column) of each element defined by the sample identifier and the vertical position (row) by the variable being measured. Each column represents a single subject, whereas each row represents the test results of a single gene or the value of a particular demographic trait. The application can also be used to generate SVG files, which can be manipulated using various SVG-editing applications, such as Adobe Illustrator and Inkscape (free and open sourced; https://inkscape.org/about, last accessed April 2, 2019), and embedded in presentations or publications. The widget accepts two types of data: genomic variants and demographic/clinical information. Data can be added to the visualization in two ways: text files can be loaded from the user's local file system, or data can be provided programmatically. 
Of importance, even if data are loaded from a file, the widget does not depend on sending those data over the Internet to a web server—all processing is done on the client side. This means that the user does not give up control of the data and, thus, eliminates security concerns about transmitting patient identifiers or proprietary data over the Internet and/or with third-party servers. Please refer to Discussion for further details of data security and privacy. Details about the format of data to be loaded can be found at https://github.com/pearcetm/jscomut (last accessed September 30, 2019). Once data are loaded into the widget, a host of configuration options is available that allow the user to customize the appearance of the graphic. Layout options include the size and spacing of the visual elements and whether to display certain elements, such as the sample identifiers. Color, which is used to visually encode the types of observed genomic variants as well as demographic categories, is fully customizable via the options interface. Individual rows, including genes or demographic categories, can be shown or hidden via the options interface as well. The power of jsComut lies in the interactivity it provides—the ability of the user to reorder the rows and columns of the data to explore relationships between multiple genes and demographic categories of data. The initial arrangement of genes and samples in the comutation plot grid is based on two-dimensional matrix sorting. First, the gene list, from the input data, is ranked in a descending order of the number of events. This is followed by recursive sorting of samples based on the presence or absence of alterations in each gene, in order, from the ranked gene list. To explore the data further and customize the layout, any data point on the grid, gene names, or sample names can be clicked, dragged, and dropped, which can reorder one or both of the axes. 
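The initial ordering described above (genes ranked by descending event count, then samples ordered by presence or absence of alterations in each ranked gene) can be sketched as follows. This is a hedged illustration, not jsComut's source: the input shape, the function names `rankGenes` and `sortSamples`, and the gene and sample labels are hypothetical, the input lists only positive findings, and the recursive partitioning is written as an equivalent lexicographic comparator.

```javascript
// Toy input (hypothetical shape): one record per observed alteration.
const toyEvents = [
  { sample: "S1", gene: "GENE_B" },
  { sample: "S2", gene: "GENE_A" },
  { sample: "S2", gene: "GENE_B" },
  { sample: "S3", gene: "GENE_A" },
  { sample: "S4", gene: "GENE_A" },
  { sample: "S4", gene: "GENE_C" },
];

// Rank genes by descending number of events; ties break alphabetically.
function rankGenes(events) {
  const counts = new Map();
  for (const e of events) counts.set(e.gene, (counts.get(e.gene) || 0) + 1);
  return Array.from(counts.keys()).sort(
    (a, b) => counts.get(b) - counts.get(a) || a.localeCompare(b)
  );
}

// Order samples so that, gene by gene in ranked order, altered samples
// precede unaltered ones. Comparing gene by gene is equivalent to
// recursively splitting each group on the next gene in the list.
function sortSamples(events, samples, rankedGenes) {
  const altered = new Set(events.map((e) => e.sample + "\u0000" + e.gene));
  const has = (s, g) => altered.has(s + "\u0000" + g);
  return samples.slice().sort((a, b) => {
    for (const g of rankedGenes) {
      const diff = Number(has(b, g)) - Number(has(a, g));
      if (diff !== 0) return diff; // altered samples come first
    }
    return a.localeCompare(b); // deterministic tie-break
  });
}

const geneOrder = rankGenes(toyEvents);
console.log(geneOrder); // [ 'GENE_A', 'GENE_B', 'GENE_C' ]
console.log(sortSamples(toyEvents, ["S1", "S2", "S3", "S4"], geneOrder));
// [ 'S2', 'S4', 'S3', 'S1' ]
```

The result is the familiar staircase pattern of a comutation plot: samples altered in the most frequently altered gene cluster on the left, and each subsequent gene subdivides the groups formed by the genes ranked above it.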
In addition, the entire grid can be sorted by the value of an individual row, by clicking on the row label. Finally, the widget itself provides a mechanism to automatically sort the data by the frequency of alteration of genes, by presence/absence of alterations on a per-gene basis, or both. Using these types of interactions, users can discover and display meaningful patterns in the data. Finally, once the widget has been configured and organized as desired, the configuration options and/or the data can be saved to the local file system as a JSON-encoded text file. Using this mechanism, the state of the visualization can be restored or shared, without requiring storage of sensitive and/or proprietary data on a third-party web server. jsComut can be used in multiple ways, including as a stand-alone web application with user-defined data and configuration or as an embedded widget with programmatic control over the data and configuration options. The widget provides a JavaScript API for loading data, defining the configuration, and activating the automated sorting functions. For more information, see https://github.com/pearcetm/jscomut (last accessed September 30, 2019). jsCodonWheel is an interactive version of the popular Codon Wheel graphic, which translates nucleotide changes at the codon level into amino acid changes at a translational level (Figure 3). To try the widget, please visit https://pearcetm.github.io/jscodonwheel. A codon can be entered by typing in a three-base code, typing in the one- or three-letter amino acid code, or clicking on the codon wheel itself. Once a codon has been entered, the encoded amino acid name, abbreviations, structure, and select biophysical properties are displayed. Two codons/amino acids can be selected at once: a reference/wild type and a variant. 
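The lookup that a codon wheel performs, together with the reference-versus-variant comparison described above, can be sketched with the standard genetic code. This is a minimal illustration and not the widget's own code: the function names `translateCodon` and `classifyChange` and the category labels are assumptions introduced here, and the biophysical properties that the widget displays are omitted.

```javascript
// Standard genetic code, ordered with bases T, C, A, G at each of the
// three codon positions ("*" marks a stop codon).
const BASES = "TCAG";
const AMINO_ACIDS =
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Translate one codon (DNA or RNA letters, any case) to a one-letter
// amino acid code.
function translateCodon(codon) {
  const dna = String(codon).toUpperCase().replace(/U/g, "T");
  if (!/^[TCAG]{3}$/.test(dna)) {
    throw new Error("Expected a three-base codon, got: " + codon);
  }
  const index =
    16 * BASES.indexOf(dna[0]) + 4 * BASES.indexOf(dna[1]) + BASES.indexOf(dna[2]);
  return AMINO_ACIDS[index];
}

// Compare a reference codon with a variant codon. The category labels are
// illustrative choices for this sketch.
function classifyChange(referenceCodon, variantCodon) {
  const from = translateCodon(referenceCodon);
  const to = translateCodon(variantCodon);
  let type = "missense";
  if (from === to) type = "synonymous";
  else if (to === "*") type = "nonsense";
  else if (from === "*") type = "stop-lost";
  return { from: from, to: to, type: type };
}

console.log(translateCodon("ATG")); // "M"
console.log(classifyChange("GTG", "GAG")); // { from: 'V', to: 'E', type: 'missense' }
console.log(classifyChange("TGG", "TGA")); // { from: 'W', to: '*', type: 'nonsense' }
```

Encoding the table as a 64-character string indexed by base position mirrors the ring structure of the wheel: the first base selects a block of 16 codons, the second a block of 4, and the third a single amino acid.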
The path through the rings of the codon wheel is visually encoded using the background color of the relevant sections of the wheel, and comparative information about the amino acids is displayed side by side to assist in the interpretation of the amino acid change. The codon wheel can be used as a stand-alone web application or embedded into a larger web application. The layout and appearance can be customized by overriding the provided CSS, and the biophysical data to display about amino acids can be customized by providing an initialization option with the desired information to display. For additional details, please see the documentation and source code located at https://github.com/pearcetm/jscodonwheel (last accessed September 30, 2019).

Performance tests were conducted to understand the impact of clinically relevant data sets on the responsiveness of the widgets and the resulting user experience. The stress tests were run on two workstations: a MacBook Pro (mid-2012 model; Apple Inc., Cupertino, CA) with four CPU cores and 8 GB RAM running macOS version 10.14.4 and a Hewlett Packard desktop PC (Hewlett Packard, Palo Alto, CA) with four CPU cores and 8 GB RAM running Windows 10. For jsProteinMapper, the performance was evaluated using 29,576, 26,746, and 9421 variants in the BRAF, TP53, and PIK3CA genes, respectively, downloaded from the COSMIC version 84 database. The average time to load the data and render the initial graphics was <2 seconds per gene, and no noticeable response latency was identified during user interactions, leading to minimal (if any) negative impact on user experience.

For jsComut performance testing, mutation data from two different data sets were used. The first, a brain tumor sequencing study from our institution [ref. 4], included 50 patient samples each tested for 23 genes and two associated demographic fields, with a total of 1150 genetic events (including negative findings). The time to load the data and render the visualization was <3 seconds, and rerendering during interactive exploration of this data set had no negative impact on user experience. To further test the application, the most recent cohort of urothelial carcinoma samples from The Cancer Genome Atlas [ref. 5] (413 samples in the 23 significantly mutated genes and 9039 events, including negative findings) was used. This much larger data set took approximately 35 seconds for the widget to load and parse the data file and render the visualization. Although the mouse hover events (to display additional variant information) remained responsive, pan/zoom and drag-and-drop interactions were impacted by a 1- to 2-second delay in updating the visual display. Further investigation of these performance limitations revealed that the bulk of the initial load time was spent by the browser converting the input text into JavaScript data structures, suggesting that optimizing the data format and text processing steps may be useful in future versions of the widget. The slightly increased visual response time during user interaction with the widget is due to an inherent limitation of using SVG elements for rendering the visualization.
With an increasing number of genetic alteration events, the number of SVG elements rendered in the browser increases significantly, which, depending on the available system memory and CPU cores, may negatively impact response time during user interactions. The HTML5 canvas element offers an alternative means of rendering data that avoids the performance costs incurred by large numbers of SVG elements; a canvas-based implementation may be incorporated into future versions of the jsComut widget, with the goal of improving the user experience during interactive data exploration.

Data visualization is increasingly used in genomic medicine to highlight actionable findings in large data sets. Although visual tools for data exploration and interpretation are not yet widely used for genomics and clinical data sets, there is increasing awareness of the value of these techniques in the genomic medicine community because of the challenges of interpreting high-dimensional data sets and the expanding use of data visualizations in other fields. The JavaScript widgets described in this article were developed to provide a useful feature set for a wide range of users and integration options for developers. In implementing these applications, we leveraged the power of modern web browsers, which offer several notable advantages for data visualization tools. First, web browsers are ubiquitous and accessible by anyone with a desktop, laptop, smartphone, or any other mobile device, unlike native (platform-specific) applications. Second, modern web browsers provide powerful data processing and user interface functionalities necessary for rendering the elements of a visualization. Third, the web-based open-source ecosystem is broad and powerful, both for data visualization frameworks and access to publicly accessible genomics databases (eg, cBioPortal [ref. 2], PECAN [ref. 6: Zhou X, Edmonson MN, Wilkinson MR, Patel A, Wu G, Liu Y, Li Y, Zhang Z, Rusch MC, Parker M, Becksfort J, Downing JR, Zhang J. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat Genet. 2016;48:4-6], ClinVar [ref. 7: Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980-D985], CIViC [ref. 8: Griffith M, Spies NC, Krysiak K, McMichael JF, Coffman AC, Danos AM, et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet. 2017;49:170-174], University of California, Santa Cruz, genome browser [ref. 9: Casper J, Zweig AS, Villarreal C, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Karolchik D, Hinrichs AS, Haeussler M, Guruvadoo L, Navarro Gonzalez J, Gibson D, Fiddes IT, Eisenhart C, Diekhans M, Clawson H, B
Reference(s)