Computational Tools for the Interactive Exploration of Proteomic and Structural Data

Article, open access, peer reviewed

2010; Elsevier BV; Volume: 9; Issue: 8; Language: English

10.1074/mcp.r000007-mcp201

ISSN

1535-9484

Authors

John H. Morris, Elaine C. Meng, Thomas E. Ferrin

Topic(s)

Genomics and Phylogenetic Studies

Abstract

Linking proteomics and structural data is critical to our understanding of cellular processes, and interactive exploration of these complementary data sets can be extremely valuable for developing or confirming hypotheses in silico. However, few computational tools facilitate linking these types of data interactively. In addition, the tools that do exist are neither well understood nor widely used by the proteomics or structural biology communities. We briefly describe several relevant tools, and then, using three scenarios, we present in depth two tools for the integrated exploration of proteomics and structural data.

A 3-D enhanced version of this article is available. The text is identical to this version but includes interactive figures. Viewing the enhanced version requires a browser plug-in. http://www.thesgc.org/iSee/MCP/9/8/e1.html

Structural biology and proteomics provide complementary views of cellular processes. Structural biology is primarily concerned with the structures of biological macromolecules and complexes and the physicochemical interactions they support.
Proteomics, on the other hand, tends to take a broader view of how proteins communicate and function within the cell, often encompassing large numbers of proteins that operate in pathways or addressing how groups of proteins work together as a function of time and/or subcellular location. (The term “proteomics,” as used here, includes studies of not only the presence and abundance of proteins under various conditions but also their interactions and their functions, both individually and as parts of larger, more complex systems.) Understanding the molecular interactions between proteins at the atomic level is of obvious utility, yet it is equally critical to understand the broader context of how pathways function and change with differing levels of expression and copy number and how they are controlled by inhibition, activation, and feedback loops. Given the complementary nature of these approaches, it would seem natural for there to be in silico tools that support the interactive exploration of structural biology within the context of the proteome, and of the results of proteomics experiments from a structural perspective. However, although several studies that link proteomics to structure have been published (1. Kühner S. van Noort V. Betts M.J. Leo-Macias A. Batisse C. Rode M. Yamada T. Maier T. Bader S. Beltran-Alvarez P. Castaño-Diez D. Chen W.H. Devos D. Güell M. Norambuena T. Racke I. Rybin V. Schmidt A. Yus E. Aebersold R. Herrmann R. Böttcher B. Frangakis A.S. Russell R.B. Serrano L. Bork P. Gavin A.C. Proteome organization in a genome-reduced bacterium. Science. 2009; 326: 1235-1240; 2. Zhang Y. Thiele I. Weekes D. Li Z. Jaroszewski L. Ginalski K. Deacon A.M. Wooley J. Lesley S.A. Wilson I.A. Palsson B. Osterman A. Godzik A. Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science. 2009; 325: 1544-1549; 3. Kim P.M. Lu L.J. Xia Y.
Gerstein M.B. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006; 314: 1938-1941; 4. Huang Y.J. Hang D. Lu L.J. Tong L. Gerstein M.B. Montelione G.T. Targeting the human cancer pathway protein interaction network by structural genomics. Mol. Cell. Proteomics. 2008; 7: 2048-2060; 5. Han B.G. Dong M. Liu H. Camp L. Geller J. Singer M. Hazen T.C. Choi M. Witkowska H.E. Ball D.A. Typke D. Downing K.H. Shatsky M. Brenner S.E. Chandonia J.M. Biggin M.D. Glaeser R.M. Survey of large protein complexes in D. vulgaris reveals great structural diversity. Proc. Natl. Acad. Sci. U.S.A. 2009; 106: 16580-16585), there are few existing tools for the interactive, integrated exploration of these complementary types of data. The structural biology and proteomics communities each have a set of commonly used interactive visualization programs, and it would be useful to investigate how these tools could work together and how they could be more tightly linked.

Network visualization and analysis tools are commonly used to interact with proteomics data. This makes good sense: proteomics data are often associated with pathways or protein interactions, and both of these are easily visualized as networks. Even types of data not normally viewed as networks (e.g., microarray results) are often painted onto signaling, metabolic, or other pathways or protein interaction networks for visualization and analysis. A number of visualization and analysis tools are used for protein networks. The most commonly used tool is certainly Cytoscape (6. Shannon P. Markiel A. Ozier O. Baliga N.S. Wang J.T. Ramage D. Amin N. Schwikowski B. Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res.
2003; 13: 2498-2504; 7. Cline M.S. Smoot M. Cerami E. Kuchinsky A. Landys N. Workman C. Christmas R. Avila-Campilo I. Creech M. Gross B. Hanspers K. Isserlin R. Kelley R. Killcoyne S. Lotia S. Maere S. Morris J. Ono K. Pavlovic V. Pico A.R. Vailaya A. Wang P.L. Adler A. Conklin B.R. Hood L. Kuiper M. Sander C. Schmulevich I. Schwikowski B. Warner G.J. Ideker T. Bader G.D. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2007; 2: 2366-2382), but others such as VisANT (8. Hu Z. Mellor J. Wu J. Yamada T. Holloway D. Delisi C. VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. 2005; 33: W352-W357), Osprey (9. Breitkreutz B.J. Stark C. Tyers M. Osprey: a network visualization system. Genome Biol. 2002; 3 (PREPRINT0012)), BioLayout Express3D (10. Freeman T.C. Goldovsky L. Brosch M. van Dongen S. Mazière P. Grocock R.J. Freilich S. Thornton J. Enright A.J. Construction, visualisation, and clustering of transcription networks from microarray expression data. PLoS Comput. Biol. 2007; 3: 2032-2042), Arena3D (11. Pavlopoulos G.A. O'Donoghue S.I. Satagopam V.P. Soldatos T.G. Pafilis E. Schneider R. Arena3D: visualization of biological networks in 3D. BMC Syst. Biol. 2008; 2: 104), and PATIKA (12. Demir E. Babur O. Dogrusoz U. Gursoy A. Nisanci G. Cetin-Atalay R. Ozturk M. PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics. 2002; 18: 996-1003) are also cited. In the commercial space, Pathway Studio is commonly used (13. Nikitin A. Egorov S. Daraselia N. Mazo I. Pathway studio–the analysis and navigation of molecular networks. Bioinformatics.
2003; 19: 2155-2157). For a useful review of the various biological network analysis tools, see Pavlopoulos et al. (14. Pavlopoulos G.A. Wegener A.L. Schneider R. A survey of visualization tools for biological network analysis. BioData Min. 2008; 1: 12).

Structural visualization and analysis have a very long and rich history, and a discussion of the various molecular visualization and analysis packages is beyond the scope of this article. The most common stand-alone molecular visualization packages are PyMOL (15. DeLano W.L. The PyMOL Molecular Graphics System. DeLano Scientific LLC, San Carlos, CA, 2002), VMD (16. Humphrey W. Dalke A. Schulten K. VMD: visual molecular dynamics. J. Mol. Graph. 1996; 14 (27–28): 33-38), and UCSF Chimera (17. Pettersen E.F. Goddard T.D. Huang C.C. Couch G.S. Greenblatt D.M. Meng E.C. Ferrin T.E. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 2004; 25: 1605-1612). Jmol (18. Murray-Rust P. Rzepa H.S. Williamson M.J. Willighagen E.L. Chemical markup, XML, and the World Wide Web. 5. Applications of chemical metadata in RSS aggregators. J. Chem. Inf. Comput. Sci. 2004; 44: 462-469) and Rasmol (19. Sayle R.A. Milner-White E.J. RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 1995; 20: 374) are often used as web add-ons for structural visualization, and the Research Collaboratory for Structural Bioinformatics and the National Center for Biotechnology Information both have their own viewers (Protein Workshop (20. Moreland J.L. Gramada A. Buzko O.V. Zhang Q. Bourne P.E. The Molecular Biology Toolkit (MBT): a modular platform for developing molecular visualization applications. BMC Bioinformatics.
2005; 6: 21) and Cn3D (21. Wang Y. Geer L.Y. Chappey C. Kans J.A. Bryant S.H. Cn3D: sequence and structure views for Entrez. Trends Biochem. Sci. 2000; 25: 300-302), respectively). In the commercial space, Sybyl from Tripos and Discovery Studio from Accelrys are widely used.

To date, there are very few tools that provide any kind of interactive linkage between proteomics data sets and the structures of proteins. STRING (22. Jensen L.J. Kuhn M. Stark M. Chaffron S. Creevey C. Muller J. Doerks T. Julien P. Roth A. Simonovic M. Bork P. von Mering C. STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009; 37: D412-D416) is a web service that provides the user with a protein-protein interaction network. The user may click on a node to reveal more information about the protein, including a static image of the three-dimensional structure if known. Clicking the image takes the user to the European Molecular Biology Laboratory-European Bioinformatics Institute web entry for that structure, which allows interactive visualization with Jmol. A different approach is taken by structureViz (23. Morris J.H. Huang C.C. Babbitt P.C. Ferrin T.E. structureViz: linking Cytoscape and UCSF Chimera. Bioinformatics. 2007; 23: 2345-2347), a plug-in to Cytoscape that loads the structures for network nodes designated by the user into UCSF Chimera for interactive three-dimensional visualization and analysis. Interaction is bidirectional so that selecting a structure in Chimera will select the appropriate node in Cytoscape.

As discussed above, a variety of computational tools are available for research in proteomics and structural biology, and it is beyond the scope of this article to provide a detailed comparison between them.
Some of these tools may be used together to “drill down” from the proteomics, network-oriented view to a structural view. One pair of tools that may be used together in this manner is Cytoscape (6. Shannon P. Markiel A. Ozier O. Baliga N.S. Wang J.T. Ramage D. Amin N. Schwikowski B. Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13: 2498-2504; 7. Cline M.S. Smoot M. Cerami E. Kuchinsky A. Landys N. Workman C. Christmas R. Avila-Campilo I. Creech M. Gross B. Hanspers K. Isserlin R. Kelley R. Killcoyne S. Lotia S. Maere S. Morris J. Ono K. Pavlovic V. Pico A.R. Vailaya A. Wang P.L. Adler A. Conklin B.R. Hood L. Kuiper M. Sander C. Schmulevich I. Schwikowski B. Warner G.J. Ideker T. Bader G.D. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2007; 2: 2366-2382), an open source package widely used to visualize and analyze networks, and Chimera (17. Pettersen E.F. Goddard T.D. Huang C.C. Couch G.S. Greenblatt D.M. Meng E.C. Ferrin T.E. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 2004; 25: 1605-1612), a well-supported and widely distributed academic molecular visualization and analysis package. We explore how Cytoscape and Chimera might be used together by presenting three example research scenarios. Our focus is on the computational tools rather than on the specific data; the scenarios are based on previously published studies, and the results are not meant to represent novel findings. Both Chimera and Cytoscape are also relatively sophisticated tools with many features that may require some effort to fully master.
Our intent is not to illustrate all of the features available in these tools but rather to provide examples of how they can be applied to gain insight into scientific problems. Lastly, it is difficult to convey the interactive nature of these tools using static images. To give a basic idea of the dynamic nature of the systems under study, we have provided two animations as supplemental Movies 1 and 2.

The abbreviations used are: GBM, glioblastoma multiforme; TCGA, The Cancer Genome Atlas; PDB, Protein Data Bank; IDH, isocitrate dehydrogenase; CDKN2A, cyclin-dependent kinase inhibitor 2A; CDK4, cyclin-dependent kinase 4; PDGFRA, platelet-derived growth factor receptor, α polypeptide; ERBB3, v-erb-b2 erythroblastic leukemia viral oncogene homolog 3, also known as HER3; TP53, tumor protein 53; PTEN, phosphatase and tensin homolog; EGFR, epidermal growth factor receptor; NF1, neurofibromin 1; FGFR1, basic fibroblast growth factor receptor 1; HIF-1α, hypoxia-inducible factor 1, α subunit; 2HG, 2-hydroxyglutarate; GO, gene ontology; RPB1–12, RNA polymerase subunits 1–12; SEA, Similarity Ensemble Approach; PE, purification enrichment; MCL, Markov cluster.

The first two scenarios deal with glioblastoma multiforme (GBM), the most common and aggressive brain tumor in humans (24. Holland E.C. Glioblastoma multiforme: the terminator. Proc. Natl. Acad. Sci. U.S.A. 2000; 97: 6242-6244; 25. Furnari F.B. Fenton T. Bachoo R.M. Mukasa A. Stommel J.M. Stegh A. Hahn W.C. Ligon K.L. Louis D.N. Brennan C. Chin L. DePinho R.A. Cavenee W.K. Malignant astrocytic glioma: genetics, biology, and paths to treatment. Genes Dev. 2007; 21: 2683-2710). Glioblastoma multiforme was also the first of the cancer types to undergo comprehensive genomic characterization by The Cancer Genome Atlas (TCGA) project (26. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008; 455: 1061-1068). The TCGA glioblastoma data set consists of three types of data: copy number variation of 17,789 genes for 206 glioblastoma cases, mRNA expression for the same 17,789 genes and 206 cases, and mutation data for 601 sequenced genes for 91 of the cases. In scenario 1, we use Cytoscape to explore a curated signaling pathway obtained from the TCGA data portal. From the GBM mutation data mapped onto the pathway, we choose one mutation of interest and drill down into Chimera to view the possible structural implications. Scenario 2 focuses on isocitrate dehydrogenase 1 (IDH1), a metabolic enzyme that has been found mutated in glioblastoma (27. Parsons D.W. Jones S. Zhang X. Lin J.C. Leary R.J. Angenendt P. Mankoo P. Carter H. Siu I.M. Gallia G.L. Olivi A. McLendon R. Rasheed B.A. Keir S. Nikolskaya T. Nikolsky Y. Busam D.A. Tekleab H. Diaz Jr., L.A. Hartigan J. Smith D.R. Strausberg R.L. Marie S.K. Shinjo S.M. Yan H. Riggins G.J. Bigner D.D. Karchin R. Papadopoulos N. Parmigiani G. Vogelstein B. Velculescu V.E. Kinzler K.W. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008; 321: 1807-1812). Using networks, we explore the function of wild-type and mutant IDH1 to hypothesize how the mutation might relate to glioblastoma.
Finally, in scenario 3, we look at a protein-protein interaction data set from the budding yeast Saccharomyces cerevisiae and examine how these data can support the modeling of large protein complexes (see the articles by Lasker et al. (75. Lasker K. Phillips J.L. Russel D. Velazquez-Muriel J. Schneidman-Duhovny D. Tjioe E. Webb B. Schlessinger A. Sali A. Integrative structure modeling of macromolecular assemblies from proteomics data. Mol. Cell. Proteomics. 2010; 9: 1689-1702) and Förster et al. (76. Förster F. Lasker K. Nickell S. Sali A. Baumeister W. Toward an integrated structural model of the 26 S proteasome. Mol. Cell. Proteomics. 2010; 9: 1666-1677) in this issue). This data set has been annotated with available structural data, and we show how this information might be used in conjunction with automated fitting within Chimera.

There are a number of repositories of curated pathways that can be loaded into Cytoscape, including the Kyoto Encyclopedia of Genes and Genomes (28. Kanehisa M. Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28: 27-30; 29. Aoki K.F. Kanehisa M. Using the KEGG database resource. Curr. Protoc. Bioinformatics. 2005; (Chapter 1, Unit 1.12); 30. Aoki-Kinoshita K.F. Kanehisa M. Gene annotation and pathway mapping in KEGG. Methods Mol. Biol. 2007; 396: 71-91), Reactome (31. Matthews L. Gopinath G. Gillespie M. Caudy M. Croft D. de Bono B. Garapati P. Hemish J. Hermjakob H. Jassal B. Kanapin A. Lewis S. Mahajan S. May B. Schmidt E. Vastrik I. Wu G. Birney E. Stein L. D'Eustachio P. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009; 37: D619-D622), Pathway Commons, BioCyc (32. Karp P.D. Ouzounis C.A.
Moore-Kochlacs C. Goldovsky L. Kaipa P. Ahrén D. Tsoka S. Darzentas N. Kunin V. López-Bigas N. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 2005; 33: 6083-6089), the NCI-Nature Pathway Interaction Database (33. Schaefer C.F. Anthony K. Krupa S. Buchoff J. Day M. Hannay T. Buetow K.H. PID: the Pathway Interaction Database. Nucleic Acids Res. 2009; 37: D674-D679), and WikiPathways (34. Pico A.R. Kelder T. van Iersel M.P. Hanspers K. Conklin B.R. Evelo C. WikiPathways: pathway editing for the people. PLoS Biol. 2008; 6: e184; 35. Kelder T. Pico A.R. Hanspers K. van Iersel M.P. Evelo C. Conklin B.R. Mining biological pathways using WikiPathways web services. PLoS One. 2009; 4: e6447). In addition, there are many repositories of protein-protein interaction data sets such as the Human Protein Reference Database (36. Prasad T.S. Kandasamy K. Pandey A. Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology. Methods Mol. Biol. 2009; 577: 67-79), Pathway Commons, and STRING (22. Jensen L.J. Kuhn M. Stark M. Chaffron S. Creevey C. Muller J. Doerks T. Julien P. Roth A. Simonovic M. Bork P. von Mering C. STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009; 37: D412-D416) that may be used to augment existing pathways with additional interaction partners.

This scenario uses a curated signaling pathway provided on the TCGA data portal. This pathway represents the most frequently altered genes in glioblastoma based on the TCGA phase I data (26. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature.
2008; 455: 1061-1068). In addition to the curated pathway, the TCGA data portal provides downloads of the three data sets that can be used to annotate the pathway: expression, copy number variation, and mutations. Supplemental Fig. 1 shows a screenshot of Cytoscape with the TCGA-curated pathway for glioblastoma loaded and provides a description of the user interface for Cytoscape. A Cytoscape session file with the TCGA pathway is included as supplemental Data 1.

First we explore the expression profile of each of the genes across all of the tumor patients. The differential regulation of gene expression has been associated with a large number of diseases (37. Zhang F. Gu W. Hurles M.E. Lupski J.R. Copy number variation in human health, disease, and evolution. Annu. Rev. Genomics Hum. Genet. 2009; 10: 451-481; 38. Horan M.P. Application of serial analysis of gene expression to the study of human genetic disease. Hum. Genet. 2009; 126: 605-614) and can implicate specific genes. The usual mechanism to view differential gene expression across multiple genes and conditions is to hierarchically cluster the data and view the results as a heat map with dendrograms representing the clusters for both genes and conditions (39. Eisen M.B. Spellman P.T. Brown P.O. Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998; 95: 14863-14868), where each tumor represents a different condition in this example. After annotating the TCGA pathway with the mRNA expression results, we can use the Cytoscape clusterMaker plug-in to perform the clustering (Fig. 1). As described in the TCGA 2008 report, this clustering does not lead to any obvious conclusions; that is, none of the genes are overexpressed or underexpressed in all (or even most) of the tumors.
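The hierarchical clustering step described above can be sketched in a few lines of Python. This is only an illustration of the general idea (agglomerative clustering with average linkage and Euclidean distance); the article does not state which distance metric or linkage was used in clusterMaker, so both choices are assumptions, and the function names and the toy expression values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(rows):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are row indices; a dendrogram is drawn from
    exactly this information.
    """
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(rows[p], rows[q])
                            for p in clusters[i] for q in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log-ratio values: 3 genes (rows) x 3 tumors (columns).
toy_expression = [
    [2.0, 2.1, -1.0],
    [1.9, 2.0, -1.1],
    [-2.0, -1.8, 0.5],
]
toy_merges = average_linkage(toy_expression)
```

Clustering the tumors rather than the genes would use the same function on the transposed matrix, which is how a heat map obtains dendrograms on both axes.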
On the other hand, looking at the clustering of tumors, we can see two broad categories: those overexpressing CDKN2A or CDK4 and those underexpressing PDGFRA/ERBB3 and CDKN2A. Although these groups are discernable, there remain certain inconsistencies within the groups that prevent a clear categorization of the tumors. To view differential mRNA expression in the context of the pathway, we can animate the coloring of the nodes in the pathway using the “Map colors to network” capability of the clusterMaker plug-in (supplemental Movie 1), and it can be seen that for each tumor some sets of genes are either over- or underexpressed, but there is no readily discernable pattern.

The lack of expression patterns might lead to an exploration of copy number variation or mutations. Like the mRNA expression data, copy number variations can be analyzed by clustering (Fig. 2), although a more detailed analysis, including structural data where available, may be required for the individual mutations. For the mutation analysis, we used the same TCGA pathway, annotated it with the known structures for each gene product from the Protein Data Bank (PDB) (40. Bernstein F.C. Koetzle T.F. Williams G.J. Meyer Jr., E.F. Brice M.D. Rodgers J.R. Kennard O. Shimanouchi T. Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 1977; 112: 535-542; 41. Berman H.M. Westbrook J. Feng Z. Gilliland G. Bhat T.N. Weissig H. Shindyalov I.N. Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000; 28: 235-242), and imported the mutation data from the TCGA data portal. We added to the imported data an additional column to represent the percentage of tumors that were mutated for each gene. Fig. 3 is an export from Cytoscape showing each gene colored by the percentage of sequenced tumors showing mutations for that gene.
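The derived column described above (percentage of sequenced tumors mutated per gene) is a simple aggregation. The sketch below shows one way it could be computed, assuming the mutation data are available as (tumor ID, gene) records; that record layout, the function name, and the toy values are hypothetical and are not taken from the TCGA files.

```python
from collections import defaultdict

def percent_tumors_mutated(mutations, n_sequenced):
    """Map each gene to the percentage of sequenced tumors carrying
    at least one mutation in that gene."""
    if n_sequenced <= 0:
        raise ValueError("n_sequenced must be positive")
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in mutations:
        # A set counts each tumor once per gene, even with several mutations.
        tumors_by_gene[gene].add(tumor_id)
    return {gene: 100.0 * len(tumors) / n_sequenced
            for gene, tumors in tumors_by_gene.items()}

# Hypothetical toy records (tumor ID, gene); not TCGA data.
toy_mutations = [
    ("T1", "TP53"), ("T1", "TP53"), ("T2", "TP53"),
    ("T2", "PTEN"), ("T3", "EGFR"),
]
toy_percent = percent_tumors_mutated(toy_mutations, n_sequenced=4)
```

The denominator is the number of tumors that were sequenced (91 in the data set described here), not the total number of cases, so it is passed in explicitly rather than inferred from the records.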
Among the most strongly colored genes are TP53, PTEN, EGFR, and NF1, which have been identified previously as mutated in many tumors. Interestingly, the most frequently mutated gene, TP53, is only mutated in 34% of the tumors, and the tyrosine phosphatase PTEN is only modified in 30.7% of the tumors. One of the less frequently mutated proteins, basic fibroblast growth factor receptor 1 (FGFR1), has been well studied, and partial structures of the protein are available in the PDB. Although this protein is less frequently mutated than some of the other genes, the nature of the mutation and availability of structures provide an interesting example for our structural analysis.

FGFR1 is a receptor tyrosine kinase. Like other receptor tyrosine kinases, it comprises an extracellular ligand-binding part, a single transmembrane-spanning segment, and an intracellular part that includes a protein-tyrosine kinase domain. Ligands such as fibroblast growth factor that activate FGFR1 cause receptor dimerization and autophosphorylation across the dimer interface. Autophosphorylation shifts the kinase domain into an active state. The activated receptor goes on to bind and phosphorylate several downstream partners (42. Turner N. Grose R. Fibroblast growth factor signalling: from development to cancer. Nat. Rev. Cancer. 2010; 10: 116-129). FGFR1 signaling is involved in growth and proliferation, and overactivity has been associated with various cancers. Other glioblastoma-associated mutations in FGFR1 have been discussed previously (43. Rand V. Huang J. Stockwell T. Ferriera S. Buzko O. Levy S. Busam D. Li K. Edwards J.B. Eberhart C. Murphy K.M. Tsiamouri A. Beeson K. Simpson A.J. Venter J.C. Riggins G.J. Strausberg R.L. Sequence survey of receptor tyrosine kinases reveals mutations in glioblastomas. Proc. Natl. Acad. Sci. U.S.A. 2005; 102: 14344-14349; 44. Lew E.D. Furdui C.M. Anderson K.S.
Schlessinger J. The precise sequence of FGF receptor autophosphorylation is kinetically driven and is disrupted by oncogenic mutations. Sci. Signal. 2009; 2: ra6), so here we only address the mutation K656E found in the TCGA study. The structureViz Cytoscape plug-in can be used to load structures into Chimera for analysis. The two structures of interest are the structure of the FGFR1 kinase domain in the inactive state (for example, PDB code 3c4f (45. Tsai J. Lee J.T. Wang W. Zhang J. Cho H. Mamo S. Bremer R. Gillette S. Kong J. Haass N.K. Sproesser K. Li L. Smalley K.S. Fong D. Zhu Y.L. Marimuthu A. Nguyen H. Lam B. Liu J. Cheung I. Rice J. Suzuki Y. Luu C. Settachatgul C. Shellooe R. Cantwell J. Kim S.H. Schlessinger J. Zhang K.Y. West B.L. Powell B. Habets G. Zhang C. Ibrahim P.N. Hirth P. Artis D.R. Herlyn M. Bollag G. Discovery of a selective inhibitor of oncogenic B-Raf kinase with potent antimelanoma activity. Proc. Natl. Acad. Sci. U.S.A. 2008; 105: 3041-3046)) and the phosphorylated, activated state (for example, PDB code 3gqi (46. Bae J.H. Lew E.D. Yuzawa S. Tomé F. Lax I. Schlessinger J. The selectivity of receptor tyrosine kinase signaling is controlled by a secondary SH2 domain binding site. Cell. 2009; 138: 514-524)) (Fig. 4). Multiple tyrosine residues in FGFR1 are autophosphorylated, and the phosphorylations occur in a stereotyped order with successive stages incrementally increasing kinase activity (44. Lew E.D. Furdui C.M. Anderson K.S. Schlessinger J. The precise sequence of FGF receptor autophosphorylation is kinetically driven and is disrupted by oncogenic mutations. Sci. Signal. 2009; 2: ra6).
Remarkably, given their adjacency in sequence, Tyr-653 is the first to be phosphorylated, but Tyr-654 is phosphorylated last, after several other tyrosines (44. Lew E.D. Furdui C.M. Anderson K.S. Schlessinger J. The precise sequence of FGF receptor autophosphorylation is kinetically driven and is disrupted by oncogenic mutations. Sci. Signal. 2009; 2: ra6). Tyr-653, Tyr-654, and the glioblastoma mutation site Lys-656 are all within the “activation loop,” which undergoes a large conformational change. The two structures of the FGFR1 kinase domain can be superimposed in Chimera to show the magnitude of the conformational change (Fig. 4). In the activated conformation, Lys-656 is hydrogen-bonded (H-bonded) to phospho-Tyr-654 as indicated by the red dashed line in Fig. 5. Mutation of this lysine to glutamate, which is negatively charged, could mimic or interfere with phos
