Large-scale De Novo Prediction of Physical Protein-Protein Association

Article, Open access, Peer-reviewed

2011; Elsevier BV; Volume: 10; Issue: 11; Language: English

10.1074/mcp.m111.010629

ISSN

1535-9484

Authors

Antigoni Elefsinioti, Ömer Saraç, Anna Hegele, Conrad Plake, Nina C. Hubner, Ina Poser, Mihail Sarov, Anthony A. Hyman, Matthias Mann, Michael Schroeder, Ulrich Stelzl, Andreas Beyer

Topic(s)

Computational Drug Discovery Methods

Abstract

Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs, predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. We thereby identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implicating TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org.

Accurate high-throughput detection of protein-protein interactions is one of the most challenging tasks in the postgenomic era. Availability of such data has become essential for studying biological pathways and molecular evolution, for assessing protein functions based on functional genetics screens, and for studying molecular mechanisms of diseases (1 [Beyer A., Bandyopadhyay S., Ideker T. Integrating physical and genetic maps: from genomes to interaction networks. Nat. Rev. Genet. 2007; 8: 699-710], 2 [Stumpf M.P. et al. Estimating the size of the human interactome. Proc. Natl. Acad. Sci. U.S.A. 2008; 105: 6959-6964], 3 [Gunsalus K.C. et al. Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature. 2005; 436: 861-865]). The size of the human physical interactome is predicted to be between 130,000 and 600,000 interactions (2, 4 [Bork P. et al. Protein interaction networks from yeast to human. Curr. Opin. Struct. Biol. 2004; 14: 292-299], 5 [Venkatesan K. et al. An empirical framework for binary interactome mapping. Nat. Methods. 2009; 6: 83-90]). High-throughput techniques, such as yeast two-hybrid (Y2H) (6 [Rual J.F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005; 437: 1173-1178], 7 [Stelzl U. et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005; 122: 957-968]) or affinity purification followed by mass spectrometry (8 [Hubner N.C. et al. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J. Cell Biol. 2010; 189: 739-754], 9 [Ewing R.M. et al. Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol. Syst. Biol. 2007; 3: 89]), are being used for the large-scale measurement of protein binding. However, those interactions, together with the protein-protein interactions measured through small-scale experiments (10 [Ramirez F. et al. Computational analysis of human protein interaction networks. Proteomics. 2007; 7: 2541-2552]), cover only 52,000 interactions, i.e. less than 25% of the predicted human interactome (11 [Stelzl U., Wanker E.E. The value of high quality protein-protein interaction networks for systems biology. Curr. Opin. Chem. Biol. 2006; 10: 551-558]). Computational prediction of protein interactions can fill this gap until the human interactome has been fully explored using experimental techniques (12 [Pitre S. et al. Computational methods for predicting protein-protein interactions. Adv. Biochem. Eng. Biotechnol. 2008; 110: 247-267]). In addition, computational prediction can help guide experimental screening, thereby significantly shortening the time needed to reach (nearly) complete coverage of an interactome (13 [Schwartz A.S. et al. Cost-effective strategies for completing the interactome. Nat. Methods. 2009; 6: 55-61]).

The abbreviations used are: ALS, amyotrophic lateral sclerosis; AP-MS, affinity purification-mass spectrometry; CNS, central nervous system; CORUM, Comprehensive Resource of Mammalian protein complexes; GAD, Genetic Association Database; GWAS, genome-wide association studies; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; OMIM, Online Mendelian Inheritance in Man; QUBIC, Quantitative BAC InteraCtomics; STRING, Search Tool for the Retrieval of Interacting Genes; Y2H, yeast two-hybrid.

It is important to distinguish databases that assemble data and report experimentally tested interactions from others that actually predict previously unreported interactions. We call the second type of interactions 'de novo' predictions, as these interactions have no experimental evidence from assays directly testing for binding (although there might be indirect experimental evidence, e.g. co-expression or common knock-out phenotypes).
The class of databases making such de novo predictions can again be subdivided into two subtypes: those predicting functional interactions (14 [McDermott J. et al. BIOVERSE: enhancements to the framework for structural, functional and contextual modeling of proteins and proteomes. Nucleic Acids Res. 2005; 33: W324-325], 15 [Rhodes D.R. et al. Probabilistic model of the human protein-protein interaction network. Nat. Biotechnol. 2005; 23: 951-959], 16 [Jensen L.J. et al. STRING 8: a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009; 37: D412-416]) and others predicting physical association (14, 17 [Brown K.R., Jurisica I. Online predicted human interaction database. Bioinformatics. 2005; 21: 2076-2082], 18 [Lefebvre C. et al. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol. Syst. Biol. 2010; 6: 377], 19 [McDowall M.D., Scott M.S., Barton G.J. PIPs: human protein-protein interaction prediction database. Nucleic Acids Res. 2009; 37: D651-656], 20 [Alexeyenko A., Sonnhammer E.L. Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Res. 2009; 19: 1107-1116]). A functional interaction typically just indicates membership in a common pathway, whereas physical association refers to direct or indirect binding of proteins in a stable or transient complex. Recent work has underlined the importance of distinguishing the prediction of functional from physical association (19, 20, 21 [Qi Y., Bar-Joseph Z., Klein-Seetharaman J. Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins. 2006; 63: 490-500]). Knowing physical associations is important for elucidating the structure of pathways and for understanding molecular mechanisms underlying high-level phenotypes (1, 4, 11).
However, only a few existing databases actually make computational predictions of physical associations of human proteins using heterogeneous types of evidence (18, 19, 20). Here we present an approach that integrates heterogeneous biological data in order to predict and distinguish physical from functional interactions. Applying this framework to human data we were able to predict 94,009 new physical associations with high confidence (probability > 0.7, see Results for more details). We termed this map "human predicted protein interactome" (hPRINT) and validated predictions experimentally based on Y2H and AP-MS analyses. Using these complementary technologies we identified 462 new human protein interactions and we validated the high predictive power of our scoring scheme. Having established the accuracy of hPRINT, we used this interaction map for studying the physical organization of cellular processes with a specific focus on the molecular causes of neurodegenerative diseases. Our assessment of interactions between gene products that are associated with neurodegenerative diseases reveals that hPRINT can be used for prioritizing candidate genes suggested by genome-wide association studies.
Using amyotrophic lateral sclerosis (ALS), Alzheimer's and Parkinson's diseases as examples, we demonstrate how hPRINT can assist in the reconstruction of molecular mechanisms linking genes to pathologic phenotypes.

For training and testing, we used data from the Human Protein Reference Database (HPRD) (22 [Keshava Prasad T.S. et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 2009; 37: D767-772]), the Comprehensive Resource of Mammalian protein complexes (CORUM) (23 [Ruepp A. et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 2008; 36: D646-650]), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (24 [Kanehisa M. et al. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010; 38: D355-360]). In order to create a data set of physically interacting genes (PHYSET, 72,450 interactions), we selected only in vivo interactions from HPRD, human interactions from CORUM, and binary and complex interactions defined in human KEGG pathways. In addition, we selected high confidence interactions reported in a previous analysis (25 [Bossi A., Lehner B. Tissue specificity and the human protein interaction network. Mol. Syst. Biol. 2009; 5: 260]) where each interaction is reported in at least two publications (termed CRGhigh). A data set of functionally related but not physically interacting genes (FUNSET, 412,587 interactions) was extracted from KEGG pathways. FUNSET is composed of gene pairs that are in the same pathway but are not physically interacting. Finally, we generated a data set of noninteracting gene pairs (NONSET, 331,596 interactions). NONSET consists of random pairs of genes from distinct KEGG pathways that are not known to interact physically. Hence, NONSET represents gene pairs that are neither functionally related nor physically binding.

We used 18 features to predict interactions. Five types of evidence are taken from the STRING database (version 8.2): genomic neighborhood, gene fusion, phylogenetic profile, coexpression, and text mining (16). Five additional features are generated using the GoGene tool, which annotates genes based on Gene Ontology (GO) terms and disease annotations using text mining information (including co-occurrence in publications) (26 [Plake C. et al. GoGene: gene annotation in the fast lane. Nucleic Acids Res. 2009; 37: W300-304]). The features extracted with GoGene are: cellular component, molecular function, biological process, disease, and co-occurrence. Next, we used the presence of known binding motifs in protein sequences as a predictor for physical binding. This feature (named "domain pairs") is based on the presence of binding domains predicted by profile Hidden Markov Models (27 [Henschel A. et al. Using structural motif descriptors for sequence-based binding site prediction. BMC Bioinformatics. 2007; 8: S5]). Finally, we considered the topology of the STRING interaction network to predict physical interactions. We recalculated the STRING combined score after eliminating the experimental and database features in order to exclude any experimental evidence. Using the resulting STRING interaction scores we extracted seven topological features for each edge of this network: clustering coefficient, minimum spanning tree, extended minimum spanning tree, neighborhood ratio, ratio between shortest path and edge weight, local betweenness, and global betweenness. Detailed descriptions for all features can be found in the supplementary material.

We performed a three-class classification, namely physical, functional, and nonrelated. The entire PHYSET was used as training data for physical interactions. To avoid a bias toward larger classes, we randomly sampled from FUNSET and NONSET to obtain training sets of approximately even size. A Random Forests classifier with 1000 trees was trained (28 [Breiman L. Random Forests. Machine Learning. 2004; 45: 5-32]). Random Forests generates three probabilities summing up to 1 for each edge: probability of being physical (RFphys), probability of being functional (RFfun), and probability of being nonrelated (NON). This analysis was done using the Random Forests package from R (http://www.r-project.org/).

The above Random Forests scores are de novo predictions of interactions because they are not based on any data originating from experimental testing of interactions. In order to integrate prior knowledge of measured interactions we combined the Random Forests scores with experimental lines of evidence using Bayesian integration (implemented in R) as described previously (29 [Elefsinioti A., Ackermann M., Beyer A. Accounting for redundancy when integrating gene interaction databases. PLoS One. 2009; 4: e7492]). This approach also accounts for correlation between individual lines of evidence.

The different prediction strategies were computationally validated using cross-validation and using independent sets of known interactions. Fivefold cross-validation was performed by randomly sampling training and test sets from the pools of reference interactions. However, cross-validation might overestimate the predictive power of machine learning methods, because it does not take into account systematic differences among independently measured data. Hence, our second strategy hides one data source during the training phase and uses it for testing. Here, we used CRGhigh for independent testing because it is not commonly used as a training set, which allows it to serve as an independent test set for comparing all the different networks. If a test interaction was reported in another source, it was removed from the training data and only used for testing.

In order to analyze the cross-talk between pathways we selected all genes annotated for at least one cellular process or environmental information processing pathway in KEGG. We generated a high confidence physical interaction network of these selected genes with interactions having a Random Forests physical interaction score above 0.7. Because many genes are annotated for more than one pathway, it is nontrivial to decide whether a physical interaction is within or between two pathways. Two different strategies were followed for classifying interactions as "between pathway." Assume $P_{g_1}$ and $P_{g_2}$ are the sets of pathways for which the genes $g_1$ and $g_2$ are annotated. In the first strategy, we call the interaction $g_1 - g_2$ 'between' if $P_{g_1} \cap P_{g_2} = \emptyset$, and we added $1/|P_{g_1} \times P_{g_2}|$ as cross-talk for each pair of pathways in the Cartesian product $P_{g_1} \times P_{g_2}$.
If $P_{g_1} \cap P_{g_2} = A \neq \emptyset$, then we treat this as a within interaction and we added a contribution of $1/|A|$ as within interaction for each pathway in $A$. This first approach rests on the assumption that two genes annotated for a common pathway are interacting inside that pathway. However, if genes are also annotated for different pathways, the interaction may (in addition) also link those distinct pathways. Hence, in the second strategy, even if two genes share common pathways, we assumed there is cross-talk between pathways in $P_{g_1}$ and $P_{g_2}$. Again we add a contribution of $1/|P_{g_1} \times P_{g_2}|$ as cross-talk for each pair in $P_{g_1} \times P_{g_2}$. Note that, in contrast to the first strategy, it is possible to have pairs $(x, x)$ in this Cartesian product since $P_{g_1} \cap P_{g_2}$ is not necessarily empty. Such a pair was counted as a within interaction in pathway $x$. At the end, for each strategy, we generated an $N \times N$ matrix showing the cross-talk between $N$ pathways.

We carried out the same analysis for cellular compartments. The only difference is that, instead of KEGG pathways, we used genes that have a cellular localization annotation in the generic version of GO slim (http://www.geneontology.org). Cytoscape was used for drawing the networks (30 [Cline M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2007; 2: 2366-2382]).

Genes potentially related to ALS, Parkinson, Huntington, or Alzheimer diseases were selected using three data sources: Online Mendelian Inheritance in Man (OMIM) (http://www.ncbi.nlm.nih.gov/omim/, downloaded 28/10/10), KEGG (24), and the Genetic Association Database (GAD) (31 [Becker K.G. et al. The genetic association database. Nat. Genet. 2004; 36: 431-432]). From OMIM we selected genes that are known to be related to these diseases; to achieve maximal stringency we selected only genes from OMIM class 3, i.e. genes whose mutations were positioned by mapping the wild-type gene and in which a mutation creates a phenotype associated with the disorder. GAD contains results from genome-wide association studies (GWAS) and linkage studies. We selected genes from GAD that show positive association with the diseases. From KEGG we selected all genes participating in the respective disease pathways. The union of all of these genes resulted in 433 nonredundant genes (Entrez Gene IDs).

We calculated functional enrichment (based on GO) of genes interacting with known disease-associated genes (OMIM) or candidate genes (GWAS) using Fisher's exact test. The purpose was to show that "linker genes" lying between GWAS and OMIM genes are enriched for specific molecular functions that are different from other genes neighboring OMIM genes. Hence, we did not compute the functional enrichment of linker genes versus the whole genome, but versus other neighbors of OMIM genes. Thus, enrichment of linker genes was computed using as universe not the whole genome but the whole set of OMIM or GWAS gene interactors, respectively (supplemental Tables S6, S7). However, using the whole genome as a universe yields similar findings, especially in the case of the OMIM interactors (supplemental Table S8).

CNS specificity for each of the interactions was calculated by applying the Kolmogorov-Smirnov (KS) test. mRNA expression levels in various human tissues were collected from BIOGPS.
For each of the 12,056 genes present in BIOGPS we compared expression in CNS tissues and cell types versus all other tissues using the KS test. Interactions were scored by assigning the lowest p value of the two interacting genes to the edge. The rationale is that an interaction is present in a specific tissue only if both partners are expressed; hence it is restricted by the less promiscuous gene.

Y2H experiments were performed as described previously (7). In brief, selected ORFs were transferred into bait (pBTM117c) and prey vectors (pACT4-DM). The L40ccU2 MATa yeast strain was transformed with the bait plasmids and preys were used to transform MATalpha strain L40ccα (32 [Goehler H. et al. A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington's disease. Mol. Cell. 2004; 15: 853-865]). Bait and prey yeast strains were ordered pairwise in microtiter plate format according to hPRINT predictions and mated on YPD for 36 h. Diploid yeast were grown on SD media supplemented with histidine and uracil for 3 days. Interacting proteins were identified by growth on selective plates (-Leu-Trp-Ura-His) after 6 days.
Random noninteracting pairs were tested by mating bait and prey plates that did not match pairwise. Every protein pair was assayed in at least two independent interaction mating experiments.

Mouse or human BACs harboring the genes of interest were obtained from the BACPAC Resources Center (http://bacpac.chori.org). The N-terminal NFLAP tagging cassette as well as the C-terminal LAP and DIGtag tagging cassettes were PCR amplified using primers that carry 50 nucleotides of homology to the N or C terminus, respectively, of each of the target genes. Recombineering and stable transfection of the modified BAC was performed as described (33Po
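
The two pathway cross-talk strategies described in the methods can be made concrete with a short sketch. The Python function below is only an illustration under stated assumptions: the function name `crosstalk`, the input formats (a list of gene pairs and a gene-to-pathway-set dictionary), and the choice to store the symmetric matrix as a dictionary keyed by sorted pathway pairs are all hypothetical and do not come from the original analysis, which was carried out in R.

```python
from collections import defaultdict
from itertools import product


def crosstalk(edges, annotation, strategy=1):
    """Accumulate within- and between-pathway weights for a set of edges.

    edges      : iterable of (gene1, gene2) pairs (hypothetical input format)
    annotation : dict mapping gene -> set of pathway names (hypothetical)
    strategy   : 1 or 2, following the two strategies described in the text
    Returns a dict keyed by (pathway_a, pathway_b) with pathway_a <= pathway_b.
    Keys with identical pathways hold 'within' weight, others hold cross-talk.
    """
    matrix = defaultdict(float)
    for g1, g2 in edges:
        p1 = annotation.get(g1, set())
        p2 = annotation.get(g2, set())
        if not p1 or not p2:
            continue  # genes without pathway annotation contribute nothing
        shared = p1 & p2
        if strategy == 1 and shared:
            # Strategy 1: a shared pathway makes the edge purely 'within';
            # one unit of weight is split evenly over the shared pathways.
            for pathway in shared:
                matrix[(pathway, pathway)] += 1.0 / len(shared)
            continue
        # 'Between' case of strategy 1, and every edge under strategy 2:
        # one unit of weight is split evenly over the Cartesian product.
        weight = 1.0 / (len(p1) * len(p2))
        for a, b in product(p1, p2):
            matrix[tuple(sorted((a, b)))] += weight
    return dict(matrix)
```

Each edge distributes a total weight of 1 under either strategy, so the sum of all matrix entries equals the number of annotated edges. Under strategy 2, pairs of the form (x, x) arise whenever both genes share pathway x and are stored on the diagonal, matching the "within interaction in pathway x" rule.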

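The linker-gene enrichment described in the methods uses Fisher's exact test with a restricted universe (all interactors of OMIM or GWAS genes rather than the whole genome). A minimal sketch of such a test is given below. It assumes a one-sided test for over-representation, which the text does not state explicitly, and the function and parameter names are hypothetical. It uses only the Python standard library (`math.comb`, Python 3.8 or later).

```python
from math import comb


def fisher_enrichment_p(linkers_with_term, linkers_total,
                        universe_with_term, universe_total):
    """One-sided Fisher's exact test for over-representation (assumed sidedness).

    Returns P(X >= linkers_with_term), where X is hypergeometric: the number
    of term-annotated genes among `linkers_total` genes drawn from a universe
    of `universe_total` genes, `universe_with_term` of which carry the term.
    The universe here is the set of all interactors, not the whole genome.
    """
    upper = min(linkers_total, universe_with_term)
    denominator = comb(universe_total, linkers_total)
    tail = sum(
        comb(universe_with_term, k)
        * comb(universe_total - universe_with_term, linkers_total - k)
        for k in range(linkers_with_term, upper + 1)
    )
    return tail / denominator
```

For example, if 3 of 4 linker genes carry a GO term that is carried by 4 of the 8 genes in the interactor universe, `fisher_enrichment_p(3, 4, 4, 8)` evaluates the tail (16 + 1) / 70. Swapping the universe for the whole genome only changes the last two arguments, which is how the comparison between the two universes mentioned in the text could be reproduced.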