Integrating 3D structural information into systems biology
2021; Elsevier BV; Volume: 296; Language: English
DOI: 10.1016/j.jbc.2021.100562
ISSN: 1083-351X
Authors: Diana Murray, Donald Petrey, Barry Honig
Topic(s): Computational Drug Discovery Methods
Abstract: Systems biology is a data-heavy field that focuses on systems-wide depictions of biological phenomena necessarily sacrificing a detailed characterization of individual components. As an example, genome-wide protein interaction networks are widely used in systems biology and continuously extended and refined as new sources of evidence become available. Despite the vast amount of information about individual protein structures and protein complexes that has accumulated in the past 50 years in the Protein Data Bank, the data, computational tools, and language of structural biology are not an integral part of systems biology. However, increasing effort has been devoted to this integration, and the related literature is reviewed here. Relationships between proteins that are detected via structural similarity offer a rich source of information not available from sequence similarity, and homology modeling can be used to leverage Protein Data Bank structures to produce 3D models for a significant fraction of many proteomes. A number of structure-informed genomic and cross-species (i.e., virus–host) interactomes will be described, and the unique information they provide will be illustrated with a number of examples. Tissue- and tumor-specific interactomes have also been developed through computational strategies that exploit patient information and through genetic interactions available from increasingly sensitive screens. Strategies to integrate structural information with these alternate data sources will be described. Finally, efforts to link protein structure space with chemical compound space offer novel sources of information in drug design, off-target identification, and the identification of targets for compounds found to be effective in phenotypic screens.

The growth of protein structure information has stimulated a parallel growth in computational tools that predict protein structure and function. These tools provide fundamental insights into the physical principles that underlie the behavior of biological macromolecules.
For example, molecular dynamics simulations allow realistic descriptions of conformational heterogeneity; Poisson–Boltzmann calculations have revealed how electrostatic interactions play a central role in biological functions; and the forces that determine the stability of the native folded state are now well understood. Advances such as these have been transformative and are part of the language and intellectual foundation of modern structural biology.

A parallel set of computational methods falls under the rubric of "structural genomics," which includes the goal of structurally characterizing enough members of sequence families so as to enable the construction of homology models for the others. A key development has been the computational identification of geometric relationships among protein structures. Since structural similarity can identify functional relationships even in the absence of statistically significant sequence similarity, structural alignment has become a powerful tool to detect evolutionary relationships between proteins that cannot be detected from sequence alone. We have used the term Structural Blast (1. Dey F. Cliff Zhang Q. Petrey D. Honig B. Toward a "structural BLAST": Using structural relationships to infer function. Protein Sci. 2013; 22: 359-366) to denote the use of structural alignment to identify relationships between proteins, in analogy to the widely used BLAST suite of programs for sequence alignment (2. Altschul S.F. Gish W. Miller W. Myers E.W. Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990; 215: 403-410). Figure 1 provides two examples of functional relationships that can be detected this way: protein–protein interaction (PPI) and protein–compound interaction. Figure 1A illustrates the structural alignment of four protein domains for which BLAST fails to detect any sequence relationship.
Figure 1B shows the experimentally determined complex between the pleckstrin homology (PH) domain from phospholipase C-gamma-2 (yellow) and the small GTPase Rac2 (gray). Structural alignment of the Ezrin F3 lobe (red) with the PH domain produces a model for the complex between Ezrin and Rac2 (red–gray). Similarly, Figure 1C shows the experimentally determined complex between the PH domain from mouse Beta-II spectrin (green) and inositol 1,4,5-trisphosphate (sticks). Structural alignment of the Tiam-2 PH domain (blue) with the Beta-II spectrin PH domain produces a model for the complex between Tiam-2 and inositol 1,4,5-trisphosphate (blue and sticks). These examples illustrate the basis of many of the methods highlighted later, which enable the use of structural information on a genomic scale.

The Protein Data Bank (PDB) (3. Berman H.M. Westbrook J. Feng Z. Gilliland G. Bhat T.N. Weissig H. Shindyalov I.N. Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000; 28: 235-242) stands as a centerpiece of structural biology. It has created standards that impact the entire community, organized data in easily accessible form, and provided a battery of tools and links to other databases that have revealed multiple ways in which 3D structural information can be exploited for the detailed annotation of protein function and interactions. Indeed, much of the research that is discussed here would not have been possible without extensive use of the PDB and its many auxiliary resources.

There are areas of biomedical research where protein structure is still underutilized. Specifically, cellular systems biology, with its heavy emphasis on the study of pathways and networks, has made only limited use of 3D information. In networks, PPIs are typically described as nodes (proteins) connected by edges (interactions), without reference to the structures of the proteins involved or the nature of the interactions.
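To make the node-and-edge description concrete, the following minimal Python sketch shows one way an interaction network could carry structural annotation on its edges. It is an illustration under stated assumptions, not a format used by any database discussed here: the class names, the `complex_model` field, and the placeholder `PROTEIN_X` are all hypothetical, and the gene symbols are used only as labels for the Figure 1B proteins.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Interaction:
    """One edge of the network, with optional structural annotation."""
    protein_a: str
    protein_b: str
    evidence: str                        # e.g. "experimental" or "predicted"
    complex_model: Optional[str] = None  # hypothetical label for a 3D model of the complex


class StructuralInteractome:
    """Undirected PPI graph whose edges can carry structural information."""

    def __init__(self) -> None:
        self._edges: Dict[FrozenSet[str], Interaction] = {}

    def add(self, interaction: Interaction) -> None:
        # A frozenset key makes the edge undirected: (A, B) and (B, A) coincide.
        key = frozenset((interaction.protein_a, interaction.protein_b))
        self._edges[key] = interaction

    def neighbors(self, protein: str) -> List[str]:
        """Return the sorted interaction partners of a protein."""
        partners = set()
        for key in self._edges:
            if protein in key:
                others = key - {protein}
                # An empty remainder means a self-interaction (homodimer).
                partners.update(others if others else {protein})
        return sorted(partners)

    def structural_coverage(self) -> float:
        """Fraction of edges that have a 3D model of the complex."""
        if not self._edges:
            return 0.0
        modeled = sum(1 for e in self._edges.values() if e.complex_model is not None)
        return modeled / len(self._edges)


network = StructuralInteractome()
network.add(Interaction("PLCG2", "RAC2", "experimental", complex_model="experimental structure"))
network.add(Interaction("EZR", "RAC2", "predicted", complex_model="template-based model"))
network.add(Interaction("RAC2", "PROTEIN_X", "experimental"))  # hypothetical edge with no structure

print(network.neighbors("RAC2"))
print(round(network.structural_coverage(), 2))
```

In a plain network only the keys of `_edges` would exist; the point of the sketch is that the same edge can also record whether a structure or model of the complex is available.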
With 20,000 human protein-coding genes and potentially millions of PPIs, it is not possible to obtain experimental structures for every node and edge in the interactome. Computational methods to interrogate these interactions can complement the available experimental evidence, enabling more meaningful insights from systems biology approaches. This article summarizes some of the advances in structural systems biology and points to strategies through which structural information can be integrated with the vast quantities of data emerging from high-throughput (HT) genomic technologies and patient records (summarized in Table 1).

There are a number of computational methodologies that are central to this integration. First, the ability to construct homology models for most proteins in a given genome implies that, in principle, structure can be used on a genome-wide scale. Homology models dramatically enhance structural genomics efforts; for example, while there are structures available for about 5000 human proteins in the PDB, there are homology models for at least one domain of about 18,000 human proteins in databases such as ModBase (4. Pieper U. Webb B.M. Dong G.Q. Schneidman-Duhovny D. Fan H. Kim S.J. Khuri N. Spill Y.G. Weinkam P. Hammel M. Tainer J.A. Nilges M. Sali A. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2014; 42: D336-346) and SwissModel (5. Waterhouse A. Bertoni M. Bienert S. Studer G. Tauriello G. Gumienny R. Heer F.T. de Beer T.A.P. Rempfer C. Bordoli L. Lepore R. Schwede T. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 2018; 46: W296-W303).

Table 1. Intersections between structural biology and systems biology

Systems level | Insight from computational structural biology
Protein | Models of protein domains (4, 5); delineation of intrinsically disordered regions (97); prediction of interaction surfaces (38, 94); context of missense mutations (60, 98)
PPIs (33, 34, 35) | Determination of direct versus indirect; domain-level models of protein regions involved; atomic-level detail of interfaces
Pathways/networks | Molecular mechanisms for information flow; molecular depiction of complexes and series of PPIs; pathway/submodule crosstalk; hypothesis generation for effects of perturbations; rational targeting to alter phenotypic outcome (75); integration with subcellular localization (99)
Tissue/tumor | Integration with context-specific data (27); differential pathways/networks (100); models for protein-mediated cell–cell interactions (101)

References cited in Table 1:
27. Broyde J. Simpson D.R. Murray D. Paull E.O. Chu B.W. Tagore S. Jones S.J. Griffin A.T. Giorgi F.M. Lachmann A. Jackson P. Sweet-Cordero E.A. Honig B. Califano A. Oncoprotein-specific molecular interaction maps (SigMaps) for cancer network analyses. Nat. Biotechnol. 2021; 39: 215-224
33. Mosca R. Ceol A. Aloy P. Interactome3D: Adding structural details to protein networks. Nat. Methods. 2013; 10: 47-53
34. Meyer M.J. Beltran J.F. Liang S. Fragoza R. Rumack A. Liang J. Wei X. Yu H. Interactome INSIDER: A structural interactome browser for genomic studies. Nat. Methods. 2018; 15: 107-114
35. Garzon J.I. Deng L. Murray D. Shapira S. Petrey D. Honig B. A computational interactome and functional annotation for the human proteome. Elife. 2016; 5: e18715
38. Hwang H. Petrey D. Honig B. A hybrid method for protein-protein interface prediction. Protein Sci. 2016; 25: 159-165
60. Bailey M.H. Tokheim C. Porta-Pardo E. Sengupta S. Bertrand D. et al. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018; 173: 371-385.e318
75. Xie L. Ge X. Tan H. Xie L. Zhang Y. Hart T. Yang X. Bourne P.E. Towards structural systems pharmacology to study complex diseases and personalized medicine. PLoS Comput. Biol. 2014; 10: e1003554
94. Hwang H. Dey F. Petrey D. Honig B. Structure-based prediction of ligand-protein interactions on a genome-wide scale. Proc. Natl. Acad. Sci. U. S. A. 2017; 114: 13685-13690
97. Oldfield C.J. Dunker A.K. Intrinsically disordered proteins and intrinsically disordered protein regions. Annu. Rev. Biochem. 2014; 83: 553-584
98. Porta-Pardo E. Valencia A. Godzik A. Understanding oncogenicity of cancer driver genes and mutations in the cancer genomics era. FEBS Lett. 2020; 594: 4233-4246
99. Lundberg E. Borner G.H.H. Spatial proteomics: A powerful discovery tool for cell biology. Nat. Rev. Mol. Cell Biol. 2019; 20: 285-302
100. Ideker T. Krogan N.J. Differential network biology. Mol. Syst. Biol. 2012; 8: 565
101. Honig B. Shapiro L. Adhesion protein structure, molecular affinities, and principles of cell-cell recognition. Cell. 2020; 181: 520-535

A second methodology has been the use of Structural Blast, as illustrated in Figure 1.
The structure-based identification of a large number of functional relationships combined with extensive structural coverage of multiple genomes with homology models enables the prediction of PPIs on a genomic scale. Third, machine learning (ML) is crucial to the integration of structural and genomic data. ML not only facilitates the combination of data from multiple sources but also mitigates inaccuracies in structural models since training will determine the extent to which the models have predictive value. In this regard, it is important to emphasize that inferences yielded in systems biology are often statistical in nature, and structural information must be used in a way that conforms to this reality.

This article is not meant as a comprehensive review of the literature, and many substantial studies do not appear on the reference list. Rather, our goal is to convey our own perspective of the development of a new interdisciplinary field and highlight articles that provide useful examples along with access to a larger literature. Our perspective is also embodied in our own contributions, some of which are summarized later.

The discovery and analysis of PPI networks has become an important area of systems biology where a particular focus has been specific applications to human disease. In systems-based approaches, genes or proteins are identified as disease associated based on their topological location in interaction networks (6. Menche J. Sharma A. Kitsak M. Ghiassian S.D. Vidal M. Loscalzo J. Barabasi A.L. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015; 347: 1257601, 7. Huang J.K. Carlin D.E. Yu M.K. Zhang W. Kreisberg J.F. Tamayo P. Ideker T. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 2018; 6: 484-495, 8. Carter H. Hofree M. Ideker T. Genotype to phenotype via network analysis. Curr. Opin. Genet. Dev. 2013; 23: 611-621). A necessary step in the creation of a network is the identification of interactions among proteins, which may include formation of stable dimeric or multimeric complexes; transient engagements that in some cases may be of low affinity and in others may involve post-translational modification; and nonphysical interactions where, for example, one protein may regulate the expression of another in the absence of any physical contact between the two. It is necessary to keep these distinctions in mind when reading the PPI literature.

Given the centrality of PPIs in so many cellular processes, their experimental detection and computational prediction constitute a major research focus. Only HT experimental methods and highly efficient computational approaches are capable of detecting/predicting PPIs on a genomic scale. Complicating the challenge is the fact that physiological PPIs are context dependent: two proteins found to interact in an in vitro assay may well form a complex if expressed at appropriate levels but may never actually encounter one another in vivo.

There are many genome-wide PPI databases for human and different model organisms (9. Szklarczyk D. Jensen L.J. Protein-protein interaction databases. Methods Mol. Biol. 2015; 1278: 39-56). Some are based on HT methods, such as yeast two-hybrid (10. Rolland T. Tasan M. Charloteaux B. Pevzner S.J. Zhong Q. Sahni N. Yi S. Lemmens I. Fontanillo C. Mosca R. Kamburov A. Ghiassian S.D. Yang X. Ghamsari L. Balcha D. et al. A proteome-scale map of the human interactome network. Cell. 2014; 159: 1212-1226) and tandem affinity purification mass spectrometry (11. Huttlin E.L. Ting L. Bruckner R.J. Gebreab F. Gygi M.P. Szpyt J. Tam S. Zarraga G. Colby G. Baltier K. Dong R. Guarani V. Vaites L.P. Ordureau A. Rad R. et al. The BioPlex network: A systematic exploration of the human interactome. Cell. 2015; 162: 425-440), whereas others are based entirely on literature curation (e.g., BioGRID (12. Oughtred R. Stark C. Breitkreutz B.J. Rust J. Boucher L. Chang C. Kolas N. O'Donnell L. Leung G. McAdam R. Zhang F. Dolma S. Willems A. Coulombe-Huntington J. Chatr-Aryamontri A. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019; 47: D529-D541), IntAct (13. Orchard S. Ammari M. Aranda B. Breuza L. Briganti L. Broackes-Carter F. Campbell N.H. Chavali G. Chen C. del-Toro N. Duesbury M. Dumousseau M. Galeota E. Hinz U. Iannuccelli M. et al. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014; 42: D358-D363), MINT (14. Ceol A. Chatr Aryamontri A. Licata L. Peluso D. Briganti L. Perfetto L. Castagnoli L. Cesareni G. MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 2010; 38: D532-D539)). Databases such as HINT (15. Das J. Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 2012; 6: 92), HURI (16. Luck K. Kim D.K. Lambourne L. Spirohn K. Begg B.E. Bian W. Brignall R. Cafarelli T. Campos-Laborie F.J. Charloteaux B. Choi D. Cote A.G. Daley M. Deimling S. Desbuleux A. et al. A reference map of the human binary protein interactome. Nature. 2020; 580: 402-408), and APID (17. Alonso-Lopez D. Campos-Laborie F.J. Gutierrez M.A. Lambourne L. Calderwood M.A. Vidal M. De Las Rivas J. APID database: Redefining protein-protein interaction experimental evidences and binary interactomes. Database (Oxford). 2019; 2019. https://doi.org/10.1093/database/baz005) curate these resources to provide high-quality interactions and/or to extract only binary or physical associations. The widely used STRING database (18. Franceschini A. Szklarczyk D. Frankild S. Kuhn M. Simonovic M. Roth A. Lin J. Minguez P. Bork P. von Mering C. Jensen L.J. STRING v9.1: Protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013; 41: D808-D815) combines literature curation with predictions based primarily on sequence relationships.

With few exceptions, existing databases do not include context-specific information, such as the cell line, tissue, tumor type, disease condition, and others, in which the interactions are observed. Context-specific associations can be derived from methods based on the correlation of gene profiles across many conditions (e.g., cell lines or drug treatments) (19. McDermott U. Large-scale compound screens and pharmacogenomic interactions in cancer. Curr. Opin. Genet. Dev. 2019; 54: 12-16, 20. Rouillard A.D. Gundersen G.W. Fernandez N.F. Wang Z. Monteiro C.D. McDermott M.G. Ma'ayan A. The harmonizome: A collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford). 2016; 2016. https://doi.org/10.1093/database/baw100). These profiles are typically obtained from HT genomic screens of cancer cell lines or human tissue samples: Project Achilles for RNAi and CRISPR–Cas9 knockdowns (21. Cowley G.S. Weir B.A. Vazquez F. Tamayo P. Scott J.A. Rusin S. East-Seletsky A. Ali L.D. Gerath W.F. Pantel S.E. Lizotte P.H. Jiang G. Hsiao J. Tsherniak A. Dwinell E. et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data. 2014; 1: 140035, 22. Meyers R.M. Bryan J.G. McFarland J.M. Weir B.A. Sizemore A.E. Xu H. Dharia N.V. Montgomery P.G. Cowley G.S. Pantel S. Goodale A. Lee Y. Ali L.D. Jiang G. Lubonja R. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 2017; 49: 1779-1784); the Library of Integrated Network-Based Cellular Signatures (LINCS) (23. Stathias V. Turner J. Koleti A. Vidovic D. Cooper D. Fazel-Najafabadi M. Pilarczyk M. Terryn R. Chung C. Umeano A. Clarke D.J.B. Lachmann A. Evangelista J.E. Ma'ayan A. Medvedovic M. et al. LINCS data portal 2.0: Next generation access point for perturbation-response signatures. Nucleic Acids Res. 2020; 48: D431-D439) and the Cancer Dependency Map (CMap) (24. Tsherniak A. Vazquez F. Montgomery P.G. Weir B.A. Kryukov G. Cowley G.S. Gill S. Harrington W.F. Pantel S. Krill-Burger J.M. Meyers R.M. Ali L. Goodale A. Lee Y. Jiang G. et al. Defining a cancer dependency map. Cell. 2017; 170: 564-576.e516) for phenotypic drug screens; The Cancer Genome Atlas (TCGA) for tumor-specific genetic variation (25. Hutter C. Zenklusen J.C. The Cancer Genome Atlas: Creating lasting value beyond its data. Cell. 2018; 173: 283-285); and Genotype-Tissue Expression (GTEx) for nondiseased tissue-specific genetic variation (24. Tsherniak A. Vazquez F. Montgomery P.G. Weir B.A. Kryukov G. Cowley G.S. Gill S. Harrington W.F. Pantel S. Krill-Burger J.M. Meyers R.M. Ali L. Goodale A. Lee Y. Jiang G. et al. Defining a cancer dependency map. Cell. 2017; 170: 564-576.e516).
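The idea of deriving context-specific associations from correlated gene profiles can be sketched in a few lines of Python. This is a simplified illustration only, assuming Pearson correlation and a fixed cutoff; the function name, the gene labels, the threshold of 0.8, and the toy numbers are all made up for the example and do not reproduce the procedure of any resource cited above.

```python
import numpy as np


def coassociated_pairs(profiles, gene_names, threshold=0.8):
    """Return gene pairs whose profiles are strongly (anti)correlated.

    profiles: array of shape (n_genes, n_conditions), e.g. one dependency
    score per gene per cell line. The threshold is an arbitrary assumption.
    """
    corr = np.corrcoef(profiles)  # Pearson correlation between rows
    pairs = []
    for i in range(len(gene_names)):
        for j in range(i + 1, len(gene_names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((gene_names[i], gene_names[j], float(corr[i, j])))
    return pairs


# Toy data: three hypothetical genes measured across five conditions.
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # GENE_A
    [2.0, 4.0, 6.0, 8.0, 10.5],   # GENE_B tracks GENE_A closely
    [5.0, 1.0, 4.0, 2.0, 3.0],    # GENE_C is only weakly related to either
])

pairs = coassociated_pairs(profiles, gene_names)
for gene_1, gene_2, r in pairs:
    print(gene_1, gene_2, round(r, 3))
```

A correlation of this kind says nothing about whether the two gene products touch each other, which is why such associations are a natural complement to structure-based evidence for physical interaction.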
The Califano laboratory has pioneered the use of algorithms to predict tumor-specific regulatory interactions based on the analysis of large-scale molecular profile data taken, for example, from TCGA (26. Califano A. Alvarez M.J. The recurrent architecture of tumour initiation, progression and drug sensitivity. Nat. Rev. Cancer. 2017; 17: 116-130). As will be discussed later, the integration of patient-specific regulatory networks with predicted physical interactions between proteins enables the development of context-specific structure-informed protein interaction networks, thus providing mechanistic insights not available from resources mentioned previously (27. Broyde J. Simpson D.R. Murray D. Paull E.O. Chu B.W. Tagore S. Jones S.J. Griffin A.T. Giorgi F.M. Lachmann A. Jackson P. Sweet-Cordero E.A. Honig B. Califano A. Oncoprotein-specific molecular interaction maps (SigMaps) for cancer network analyses. Nat. Biotechnol. 2021; 39: 215-224).

PPI prediction can involve (a) predicting the structure of known complexes given the structures of interacting monomers; (b) predicting whether and how two proteins interact given their structures, which requires building a model of the putative complex and then scoring it; and (c) predicting whether two proteins interact given their sequence, which can be accomplished either by purely sequence-based methods, that is, sequence relationships to proteins in known complexes, or through some combination of methods (a) and (b).

There are two main computational approaches for method (a): docking and template-based modeling. Docking methods (28. Barradas-Bautista D. Rosell M. Pallara C. Fernandez-Recio J. Structural prediction of protein-protein interactions by docking: Application to biomedical problems. Adv. Protein Chem. Struct. Biol. 2018; 110: 203-249, 29. Vakser I.A. Protein-protein docking: From interaction to interactome. Biophys. J. 2014; 107: 1785-1793) are widely used but are not yet fast enough to be truly usable for genome-scale interactomes. Template modeling (30. Petrey D. Chen T.S. Deng L. Garzon J.I. Hwang H. Lasso G. Lee H. Silkov A. Honig B. Template-based prediction of protein function. Curr. Opin. Struct. Biol. 2015; 32: 33-38) involves superimposing the structures of two query proteins on structurally similar interacting proteins in a PDB complex (e.g., Fig. 1). Algorithms to find such structurally related proteins are currently quite efficient (31. Zhang Y. Skolnick J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005; 33: 2302-2309, 32. Yang A.S. Honig B. An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J. Mol. Biol. 2000; 301: 665-678).

The Interactome3D server was an early resource for the prediction of the structures of protein complexes for different organisms (33. Mosca R. Ceol A. Aloy P. Interactome3D: Adding structural details to protein networks. Nat. Methods. 2013; 10: 47-53). The current release lists binary interactions taken from experimental databases and, where possible, structural models for 18 organisms. Structures of complexes are obtained from either the PDB or template-based modeling with templates identified based on sequence relationships. For the human proteome, structural models are provided for ~15,000 binary complexes involving ~10,000 proteins; about half of the complexes are taken from the PDB.
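The superposition step at the heart of template-based modeling, described above as placing two query proteins on the interacting chains of a template complex, can be sketched with the standard Kabsch least-squares fit. The Python below is a minimal sketch, not the procedure of any specific server: it assumes the equivalent atoms (for example Cα positions) have already been paired one-to-one by a structural alignment program, and all function names and the synthetic coordinates are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Least-squares rotation R and translation t so that mobile @ R.T + t ~ target.

    Both arrays have shape (N, 3); row i of one is assumed equivalent to row i
    of the other (the pairing would come from a structural alignment).
    """
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so the result is a proper rotation.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def model_complex(query_a, query_b, template_a, template_b):
    """Place each query on its template chain; the placed pair is the model."""
    placed = []
    for query, template in ((query_a, template_a), (query_b, template_b)):
        rotation, translation = kabsch(query, template)
        placed.append(query @ rotation.T + translation)
    return placed


def rmsd(x, y):
    """Root-mean-square deviation between two paired coordinate sets."""
    return float(np.sqrt(((x - y) ** 2).sum(axis=1).mean()))


# Synthetic test: the queries are rigidly moved copies of the template chains.
rng = np.random.default_rng(0)
template_a = rng.normal(size=(30, 3))
template_b = rng.normal(size=(25, 3)) + np.array([10.0, 0.0, 0.0])
angle = np.deg2rad(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
query_a = template_a @ rot_z.T + np.array([5.0, -3.0, 2.0])
query_b = template_b @ rot_z.T + np.array([-7.0, 1.0, 4.0])

model_a, model_b = model_complex(query_a, query_b, template_a, template_b)
print(rmsd(model_a, template_a) < 1e-8, rmsd(model_b, template_b) < 1e-8)
```

With real proteins the query and template differ, so the fit is approximate and the resulting model still has to be scored, for instance by checking the modeled interface for clashes or complementarity.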
Overall, Interactome3D lists 125,000 experimentally observed binary PPIs for the human proteome with structural models for 12%.

Interactome INSIDER (34. Meyer M.J. Beltran J.F. Liang S. Fragoza R. Rumack A. Liang J. Wei X. Yu H. Interactome INSIDER: A structural interactome browser for genomic studies. Nat. Methods. 2018; 15: 107-114) also builds models for experimentally determined binary interactions. It is based in part on the Ensemble Classifier Learning Algorithm to predict Interface Residues (ECLAIR) framework, which combines features derived from individual proteins, such as surface properties, with pairwise PPI features obtained from docking and coevolution analysis. ECLAIR is trained on high-quality experimental data sets of PPIs (15. Das J. Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 2012; 6: 92). The current version contains over 120,000 predictions of structurally resolved interfaces for experimentally observed human PPIs. The high structural coverage of Interactome INSIDER is achieved by the use of docking, which avoids the necessity of a binary complex as a structural template; that is, only the structures of individual interacting proteins are needed.

The Predicting Protein-Protein Interactions (PrePPI) algorithm is fundamentally different from Interactome3D and Interactome INSIDER in that it makes structure-informed predictions of whether two proteins interact independent of whether they appear in experimental databases (35. Garzon J.I. Deng L. Murray D. Shapira S. Petrey D. Honig B. A computational interactome and functional annotation for the human proteome. Elife. 2016; 5: e18715, 36. Zhang Q.C. Petrey D. Deng L. Qiang L. Shi Y. Thu C.A. Bisikirska B. Lefebvre C. Accili D. Hunter T. Maniatis T. Califano A. Honig B. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature. 2012; 490: 556-560). Furthermore, PrePPI uses structure on a truly genome-wide scale, effectively screening most of the ∼200 million possible human PPIs. Like other methods, it begins with a database of ∼18,000 PDB structures and homology models for proteins and their constituent domains. PrePPI then uses structural alignment to establish relationships among protein structures: every one of the ∼18,000 query proteins