Phylogenetic Profiling for Probing the Modular Architecture of the Human Genome
2015; Elsevier BV; Volume: 1; Issue: 2; Language: English
10.1016/j.cels.2015.08.006
ISSN 2639-5460
Topic(s): Microbial Metabolic Engineering and Bioproduction
Abstract
Information about functional connections between genes can be derived from patterns of coupled loss of their homologs across multiple species. This comparative approach, termed phylogenetic profiling, has been successfully used to infer genetic interactions in bacteria and eukaryotes. Rapid progress in sequencing eukaryotic species has enabled the recent phylogenetic profiling of the human genome, resulting in systematic functional predictions for uncharacterized human genes. Importantly, groups of co-evolving genes reveal widespread modularity in the underlying genetic network, facilitating experimental analyses in human cells as well as comparative studies of conserved functional modules across species. This strategy is particularly successful in identifying novel metabolic proteins and components of multi-protein complexes. The targeted sequencing of additional key eukaryotes and the incorporation of improved methods to generate and compare phylogenetic profiles will further boost the predictive power and utility of this evolutionary approach to the functional analysis of gene interaction networks.
Significant similarity between two DNA or amino acid sequences is used to infer shared ancestry, or homology, of the DNA elements or proteins being compared. A high degree of sequence similarity between homologs strongly indicates a conserved biological function, a cornerstone of comparative genomics that has been used to provisionally assign functions to thousands of human genes based on decades of detailed experiments in vertebrate and invertebrate model systems. However, the sequence differences between homologs evolving independently in separate lineages (or the complete loss of such homologs) encode information as well: a close functional coupling between unrelated genes (or non-coding genetic elements) often manifests itself in correlated patterns of sequence similarity across species (de Juan et al., 2013), a fact that can be exploited to discover novel functional links. Specifically, the inference of functional connections between protein-coding genes based on shared binary patterns of homolog presence and loss is termed phylogenetic profiling (Figure 1A) (Pellegrini et al., 1999).
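To make the definition concrete, a phylogenetic profile can be stored as a vector with one binary entry per reference species, and two profiles can then be compared with a simple similarity measure. The following is a minimal Python sketch under stated assumptions: the species list, gene names, and profiles are all hypothetical, and the two measures shown (the Pearson correlation of the binary vectors, which for 0/1 data equals the phi coefficient, and the Hamming distance) are generic choices for illustration, not the scoring scheme of any study cited here.

```python
from math import sqrt

# Hypothetical reference species; each profile has one entry per species.
SPECIES = ["M_musculus", "D_melanogaster", "C_elegans", "S_cerevisiae",
           "A_thaliana", "N_gruberi", "G_lamblia", "T_brucei"]

# Hypothetical binary phylogenetic profiles of four human genes:
# 1 = ortholog detected in that species, 0 = no ortholog detected.
PROFILES = {
    "geneA": [1, 1, 0, 0, 1, 1, 0, 1],
    "geneB": [1, 1, 0, 0, 1, 1, 0, 1],
    "geneC": [1, 1, 0, 1, 1, 0, 0, 1],
    "geneD": [1, 0, 1, 1, 0, 0, 1, 0],
}


def hamming(p, q):
    """Number of species in which two profiles disagree (presence vs. absence)."""
    return sum(a != b for a, b in zip(p, q))


def pearson(p, q):
    """Pearson correlation of two equal-length binary profiles (phi coefficient).

    Returns 0.0 when either profile is constant, because the correlation is
    undefined there. Every species column carries equal weight in this score.
    """
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0
    return cov / sqrt(var_p * var_q)


if __name__ == "__main__":
    reference = PROFILES["geneA"]
    for other in ("geneB", "geneC", "geneD"):
        profile = PROFILES[other]
        print(f"geneA vs {other}: r = {pearson(reference, profile):+.2f}, "
              f"mismatches = {hamming(reference, profile)}")
```

With these toy profiles, geneA versus geneB gives r = +1.00 with no mismatches, geneA versus geneC gives r = +0.47 with two mismatches, and geneA versus geneD gives r = -0.77 with seven mismatches. Both measures treat every species as an independent, equally weighted observation.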
Phylogenetic profiling exploits a specific evolutionary scenario, namely one in which a pair or larger group of genes is functionally coupled in such a way that the loss of one component leads directly or indirectly to the loss of the others (Figure 1A). While this scenario applies to only a subset of all possible genetic interactions, a close correlation between the binary phylogenetic profiles of two genes is frequently associated with their being part of the same physical protein complex, metabolic cascade, or regulatory module (Pellegrini, 2012). This provides a powerful approach to predicting functions for uncharacterized genes and to defining interdependent genetic modules. This perspective focuses primarily on the novel functional insights that can be gained by applying phylogenetic profiling to the human genome (for a general overview of comparative genomics, see Alföldi and Lindblad-Toh, 2013).
Phylogenetic profiling already showed much promise as a predictive tool during its first application to bacterial gene sets just before the end of the millennium (Pellegrini et al., 1999), but at that time the first fully sequenced genome of a multicellular eukaryote had only just been released (C. elegans Sequencing Consortium, 1998).
In the 15 years since, plummeting sequencing costs have driven an unprecedented increase in the number of sequenced eukaryotic genomes. Over this period, phylogenetic profiling has been applied successfully in eukaryotes, including a genome-wide analysis of S. cerevisiae (Marcotte et al., 1999), the discovery of novel Drosophila cilia genes (Avidor-Reiss et al., 2004), a screen for novel small RNA pathway components in C. elegans (Tabach et al., 2013a), and the identification of multiple components of a key mitochondrial uniporter (Baughman et al., 2011; De Stefani et al., 2011). In recognition of the method's utility, web servers for comprehensive phylogenetic profiling continue to be developed (Cheng and Perocchi, 2015), and co-evolution metrics have been incorporated into some major interactome databases (von Mering et al., 2005; Szklarczyk et al., 2015).
Three recent studies have systematically investigated the utility of phylogenetic profiling in revealing genetic interactions between human genes (Dey et al., 2015; Li et al., 2014; Tabach et al., 2013b). Tabach et al. (2013b) identified hundreds of co-evolving human gene sets (using correlated homology scores) and mapped them to disease annotations, a valuable dataset subsequently used to identify novel components of the meiotic mRNA methylation program (Schwartz et al., 2013). Li et al. (2014) used statistical inference to expand groups of correlated human phylogenetic profiles into larger modules, generating predictions for approximately 150 cellular pathways and complexes. A recent approach taken by our group extended phylogenetic profiling to "orthogroups" of homologous human genes and calculated a genome-wide matrix of all pairwise co-evolution scores, identifying a much larger set of modules (Dey et al., 2015). Experiments in our study, as well as subsequent studies, have validated a subset of functional predictions related to primary cilium function and novel interactors of the WASH complex (Phillips-Krawczak et al., 2015). The success of these studies in driving empirical discovery is of particular relevance to biomedical science, given the large proportion of the human protein-coding genome that remains poorly characterized (Dey et al., 2015).
This article focuses on how to build on these recent successes and effectively leverage the growing pool of available genome sequences. We argue that sequencing more free-living protists is a vital step toward the accurate reconstruction of eukaryotic gene histories. We discuss a role for phylogenetic profiling in the investigation of human cellular function through comparative biology. Finally, we examine the modular architecture retained for some, but not all, cellular processes across diverse ecological and cellular niches through millions of years of eukaryotic evolution.
Generating a phylogenetic profile for a human gene first involves identifying its orthologs in other species (homologs that descend vertically from a common ancestor through speciation and are expected to share the same function [Koonin, 2005]). Orthology inference is a mature field, with a large number of graph-based (clustering based on sequence similarity scores, e.g., from BLAST) and tree-based (reconciliation of gene trees inferred from sequence similarity with the species tree) algorithms (Huerta-Cepas et al., 2014; Li et al., 2003; Powell et al., 2014; Schreiber et al., 2014; Tatusov et al., 1997; Vilella et al., 2009). Even straightforward graph-based methods such as the best bidirectional hit (BBH; orthology is assigned if the top-scoring homolog in a second species returns the original query gene as its own top hit in a reciprocal similarity search) sometimes outperform more complex tree-based approaches in comparative analyses (Kristensen et al., 2011; Trachana et al., 2011). Moreover, incomplete genome annotation and low homology scores at large evolutionary distances generate algorithm-independent errors; the latter can be partially addressed by using sensitive search methods such as PSI-BLAST (Altschul et al., 1997) or DELTA-BLAST (Boratyn et al., 2012), which leverage additional information from iteratively built sequence profiles and from conserved domain databases, respectively.
The scalability and easy implementation of graph-based approaches make them attractive for phylogenetic profiling, with some studies directly using homology thresholds (Li et al., 2014). However, even a single gene duplication introduces a conceptual challenge: some species now carry only a single gene with homology to two separate human genes (Figure 1B). Each time a gene is duplicated, the daughter genes, now free to evolve independently, can diverge by acquiring new functions (neofunctionalization) or by partitioning the functions of the parent gene between them (subfunctionalization) (Conant and Wagner, 2003; Conant and Wolfe, 2008). Thus, neither daughter gene is by itself a true functional ortholog of the non-duplicated gene found in lineages that branched off prior to the duplication event. Problematically, homology thresholds generate near-identical phylogenetic profiles for both daughter genes despite their possible functional independence, and the BBH criterion can cause mismatches in species that branched off before the duplication event (Figure 1B) (Dalquen and Dessimoz, 2013). While this challenge can be circumvented by eliminating from the analyzed set all human genes with detectable human homologs (co-orthologs) (Li et al., 2014; Tabach et al., 2013b), this is only a partial solution, because an overwhelming fraction of human genes derive from historical duplication events (Cotton and Page, 2005; Dey et al., 2015). First, the vertebrate lineage carries clear signatures of two genome-wide duplications (Blomme et al., 2006). Second, many human gene families of fundamental importance to cell biology have a demonstrated history of broad expansion coupled with functional divergence (Gu et al., 2002; Lespinet et al., 2002): GPCRs (Bjarnadóttir et al., 2006), small GTPases (Boureux et al., 2007), and kinases (Shiu and Li, 2004), to name just a few. A more inclusive solution is to sequentially group co-orthologs in the same genome into orthogroups (Figure 2A).
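The duplication problem and the orthogroup workaround can be illustrated for a single reference species. In this minimal Python sketch, every gene name, score, and the 50.0 cutoff are hypothetical: human paralogs H1 and H2 share one non-duplicated homolog, Y1, in a species that branched off before the duplication. A homology threshold calls both paralogs present (identical profile entries), BBH calls only H1 present, and an orthogroup-level reciprocal match, assumed here to succeed when the best hit of any member maps back to any member of the group, yields one presence call for the orthogroup {H1, H2}. The orthogroup itself is taken as given; how co-orthologs are grouped, and the exact procedure of the cited studies, is not modeled.

```python
# Hypothetical similarity scores (e.g., BLAST bit scores) between human genes and
# the genes of ONE other species that diverged before a human gene duplication.
# H1 and H2 are paralogous daughter genes; Y1 is their single homolog there.
SCORES = {
    ("H1", "Y1"): 310.0,
    ("H2", "Y1"): 280.0,
    ("H3", "Y2"): 150.0,
}
HUMAN = ["H1", "H2", "H3"]
OTHER = ["Y1", "Y2"]


def score(a, b):
    """Symmetric score lookup; None when no hit was recorded between a and b."""
    return SCORES.get((a, b), SCORES.get((b, a)))


def best_hit(query, candidates):
    """Highest-scoring candidate for `query`, or None if there is no hit.
    Ties are broken in favor of the lexicographically larger name."""
    hits = [(score(query, c), c) for c in candidates if score(query, c) is not None]
    return max(hits)[1] if hits else None


def homolog_present(query, min_score=50.0):
    """Homology-threshold call: any hit in OTHER scoring at least `min_score`."""
    scores = [score(query, c) for c in OTHER]
    return any(s is not None and s >= min_score for s in scores)


def bbh_present(query):
    """Best bidirectional hit: the best hit of `query` in OTHER must return
    `query` as its own best hit among the HUMAN genes."""
    hit = best_hit(query, OTHER)
    return hit is not None and best_hit(hit, HUMAN) == query


def orthogroup_present(group):
    """Orthogroup-level reciprocal match: present if the best hit of any member
    maps back to any member of the same orthogroup."""
    for member in group:
        hit = best_hit(member, OTHER)
        if hit is not None and best_hit(hit, HUMAN) in group:
            return True
    return False


if __name__ == "__main__":
    for gene in ("H1", "H2"):
        print(gene, "threshold:", homolog_present(gene), "BBH:", bbh_present(gene))
    print("orthogroup {H1, H2}:", orthogroup_present({"H1", "H2"}))
```

Here the threshold rule reports True for both H1 and H2, BBH reports True for H1 but False for H2, and the orthogroup call is True. Repeating such calls for every reference species would yield one profile column per species.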
Each orthogroup represents the extent of sequence space (and implied functionality) that the daughter genes have explored after duplication. Other genomes can then be queried for a reciprocal match to any of the co-orthologs within the group (Figure 2A). Consequently, methods that generate a separate phylogenetic profile for each orthogroup (Dey et al., 2015; Wapinski et al., 2007) enable a comprehensive exploration of the functional prediction space without excluding gene families from analysis.
In principle, because independent losses in multiple lineages are good indicators of functional co-evolution (Figure 2B; Case 2 represents a higher likelihood of functional co-evolution than Case 1), the most rigorous way to compare phylogenetic profiles involves modeling gene gains and losses on each branch of the complete species tree. Parsimony and maximum likelihood methods have been used successfully for small numbers of bacterial and fungal genomes (Barker and Pagel, 2005; Barker et al., 2007). Most recently, Li et al. (2014) developed an algorithm that builds statistical models of gene gain and loss from pre-selected seed groups already annotated as part of the same pathway and then searches the human genome for additional genes conforming to the model. Though statistically rigorous, their approach relies on pre-existing pathway annotations and is insensitive to co-evolution at the scale of individual gene pairs, making it unsuitable in its current form for an unbiased genome-wide analysis in humans.
The alternative is to use a heuristic score, which can be rapidly optimized against functional interaction resources and scales with both genome complexity and the number of genomes. Unfortunately, correlation scores that give each species equal weight produce artifacts, because a single gene loss event can result in drastically different ortholog distributions depending on where it occurs within the tree (in Figure 2B, compare Case 1, where a single loss event produces five missing orthologs, with Case 2, where single loss events produce one or two missing orthologs) (Kensche et al., 2008). This effect can be partially neutralized by sampling an even distribution of species (Tabach et al., 2013a), though with the caveat of assuming a uniform probability of gene gain and loss across lineages that encounter widely varying ecological niches and selective pressures. One effective strategy that combines the strengths of both approaches involves using shared "runs" (Cokus et al., 2007) or transitions (Dey et al., 2015) in phylogenetic profiles to indicate independent loss events (Figure 2B). These scoring schemes incorporate information from the species tree without requiring full models of gain and loss, making them easy to optimize and to scale up to thousands of genes across hundreds of species. Drawing inspiration from tree-based methods, further heuristic constraints derived from evolutionary logic and parsimony (for example, penalties for unlikely losses and down-weighting of the influence of parasite genomes) could reduce false-positive rates and increase the sensitivity of predictions.
As more and more species are sequenced, it is increasingly clear that almost a quarter of human genes can be traced to the earliest eukaryotes (Koonin, 2010) and have since been lost in many plant, fungal, and parasitic protist lineages.
This fraction was initially underestimated, largely because many supposedly early-branching species in a "crown-group" model of the eukaryotic tree had been assigned erroneous positions as a result of fast rates of genome evolution and parasitic lifestyles (Stiller and Hall, 1999). Far from being "primitive" pre-mitochondrial organisms, parasites such as Giardia lamblia in fact represent the results of reductive evolution from a complex ancestor that possessed fully functional mitochondria (Embley and Martin, 2006). In contrast, the genome of the recently sequenced free-living Naegleria gruberi (Fritz-Laylin et al., 2010; Fritz-Laylin et al., 2011) is much closer to that ancestral state, encoding complete actin and microtubule cytoskeletons, complex transcriptional and signaling machinery (including GPCR and histidine kinase modules, and twice as many adenylate/guanylate cyclases as humans), as well as thousands more spliceosomal introns than its parasitic relative Trypanosoma brucei (Siegel et al., 2010). The unanticipated degree of conservation of ancient eukaryotic machines revealed by these analyses opens up new possibilities for systematic comparative biology (Box 1).
Importantly, the many distinct lineages of unicellular protists (Burki, 2014) represent a huge reservoir of genomic diversity that can play a major role in informing phylogenetic profiles. Figure 3 illustrates this argument by highlighting the overall contribution of ortholog losses or absences in individual species to the representative phylogenetic profiles of evolutionary modules of human genes (Figures 3A and 3B) (Dey et al., 2015).
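References
Alföldi J., Lindblad-Toh K. Comparative genomics as a tool to understand evolution and disease. Genome Res. 2013; 23: 1063-1068.
Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25: 3389-3402.
Avidor-Reiss T., Maer A.M., Koundakjian E., Polyanovsky A., Keil T., Subramaniam S., Zuker C.S. Decoding cilia function: defining specialized genes required for compartmentalized cilia biogenesis. Cell. 2004; 117: 527-539.
Barker D., Pagel M. Predicting functional gene links from phylogenetic-statistical analyses of whole genomes. PLoS Comput. Biol. 2005; 1: e3.
Barker D., Meade A., Pagel M. Constrained models of evolution lead to improved prediction of functional linkage from correlated gain and loss of genes. Bioinformatics. 2007; 23: 14-20.
Baughman J.M., Perocchi F., Girgis H.S., Plovanich M., Belcher-Timme C.A., Sancak Y., Bao X.R., Strittmatter L., Goldberger O., Bogorad R.L., et al. Integrative genomics identifies MCU as an essential component of the mitochondrial calcium uniporter. Nature. 2011; 476: 341-345.
Bjarnadóttir T.K., Gloriam D.E., Hellstrand S.H., Kristiansson H., Fredriksson R., Schiöth H.B. Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse. Genomics. 2006; 88: 263-273.
Blomme T., Vandepoele K., De Bodt S., Simillion C., Maere S., Van de Peer Y. The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol. 2006; 7: R43.
Boratyn G.M., Schäffer A.A., Agarwala R., Altschul S.F., Lipman D.J., Madden T.L. Domain enhanced lookup time accelerated BLAST. Biol. Direct. 2012; 7: 12.
Boureux A., Vignal E., Faure S., Fort P. Evolution of the Rho family of ras-like GTPases in eukaryotes. Mol. Biol. Evol. 2007; 24: 203-216.
Burki F. The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 2014; 6: a016147.
C. elegans Sequencing Consortium. Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology. Science. 1998; 282: 2012-2018.
Cheng Y., Perocchi F. ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling. Nucleic Acids Res. 2015; 43: W160-W168.
Cokus S., Mizutani S., Pellegrini M. An improved method for identifying functionally linked proteins using phylogenetic profiles. BMC Bioinformatics. 2007; 8: S7.
Conant G.C., Wagner A. Asymmetric sequence divergence of duplicate genes. Genome Res. 2003; 13: 2052-2058.
Conant G.C., Wolfe K.H. Turning a hobby into a job: how duplicated genes find new functions. Nat. Rev. Genet. 2008; 9: 938-950.
Cotton J.A., Page R.D.M. Rates and patterns of gene duplication and loss in the human genome. Proc. Biol. Sci. 2005; 272: 277-283.
Dalquen D.A., Dessimoz C. Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biol. Evol. 2013; 5: 1800-1806.
de Juan D., Pazos F., Valencia A. Emerging methods in protein co-evolution. Nat. Rev. Genet. 2013; 14: 249-261.
De Stefani D., Raffaello A., Teardo E., Szabò I., Rizzuto R. A forty-kilodalton protein of the inner membrane is the mitochondrial calcium uniporter. Nature. 2011; 476: 336-340.
Dey G., Jaimovich A., Collins S.R., Seki A., Meyer T. Systematic Discovery of Human Gene Function and Principles of Modular Organization through Phylogenetic Profiling. Cell Rep. 2015; 10: 993-1006.
Embley T.M., Martin W. Eukaryotic evolution, changes and challenges. Nature. 2006; 440: 623-630.
Fritz-Laylin L.K., Prochnik S.E., Ginger M.L., Dacks J.B., Carpenter M.L., Field M.C., Kuo A., Paredez A., Chapman J., Pham J., et al. The genome of Naegleria gruberi illuminates early eukaryotic versatility. Cell. 2010; 140: 631-642.
Fritz-Laylin L.K., Ginger M.L., Walsh C., Dawson S.C., Fulton C. The Naegleria genome: a free-living microbial eukaryote lends unique insights into core eukaryotic cell biology. Res. Microbiol. 2011; 162: 607-618.
Gu X., Wang Y., Gu J. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 2002; 31: 205-209.
Huerta-Cepas J., Capella-Gutiérrez S., Pryszcz L.P., Marcet-Houben M., Gabaldón T. PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome. Nucleic Acids Res. 2014; 42: D897-D902.
Kensche P.R., van Noort V., Dutilh B.E., Huynen M.A. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. J. R. Soc. Interface. 2008; 5: 151-170.
Koonin E.V. Orthologs, paralogs, and evolutionary genomics. Annu. Rev. Genet. 2005; 39: 309-338.
Koonin E.V. The origin and early evolution of eukaryotes in the light of phylogenomics. Genome Biol. 2010; 11: 209.
Kristensen D.M., Wolf Y.I., Mushegian A.R., Koonin E.V. Computational methods for Gene Orthology inference. Brief. Bioinform. 2011; 12: 379-391.
Lespinet O., Wolf Y.I., Koonin E.V., Aravind L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 2002; 12: 1048-1059.
Li L., Stoeckert C.J. Jr., Roos D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003; 13: 2178-2189.
Li Y., Calvo S.E., Gutman R., Liu J.S., Mootha V.K. Expansion of biological pathways based on evolutionary inference. Cell. 2014; 158: 213-225.
Marcotte E.M., Pellegrini M., Thompson M.J., Yeates T.O., Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature. 1999; 402: 83-86.
Pellegrini M. Using phylogenetic profiles to predict functional relationships. Methods Mol. Biol. 2012; 804: 167-177.
Pellegrini M., Marcotte E.M., Thompson M.J., Eisenberg D., Yeates T.O. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA. 1999; 96: 4285-4288.
Phillips-Krawczak C.A., Singla A., Starokadomskyy P., Deng Z., Osborne D.G., Li H., Dick C.J., Gomez T.S., Koenecke M., Zhang J.-S., et al. COMMD1 is linked to the WASH complex and regulates endosomal trafficking of the copper transporter ATP7A. Mol. Biol. Cell. 2015; 26: 91-103.
Powell S., Forslund K., Szklarczyk D., Trachana K., Roth A., Huerta-Cepas J., Gabaldón T., Rattei T., Creevey C., Kuhn M., et al. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 2014; 42: D231-D239.
Schreiber F., Patricio M., Muffato M., Pignatelli M., Bateman A. TreeFam v9: a new website, more species and orthology-on-the-fly. Nucleic Acids Res. 2014; 42: D922-D925.
Schwartz S., Agarwala S.D., Mumbach M.R., Jovanovic M., Mertins P., Shishkin A., Tabach Y., Mikkelsen T.S., Satija R., Ruvkun G., et al. High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell. 2013; 155: 1409-1421.
Shiu S.-H., Li W.-H. Origins, lineage-specific expansions, and multiple losses of tyrosine kinases in eukaryotes. Mol. Biol. Evol. 2004; 21: 828-840.
Siegel T.N., Hekstra D.R., Wang X., Dewell S., Cross G.A.M. Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicing and polyadenylation sites. Nucleic Acids Res. 2010; 38: 4946-4957.
Stiller J.W., Hall B.D. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol. Biol. Evol. 1999; 16: 1270-1279.
Szklarczyk D., Franceschini A., Wyder S., Forslund K., Heller D., Huerta-Cepas J., Simonovic M., Roth A., Santos A., Tsafou K.P., et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015; 43: D447-D452.
Tabach Y., Billi A.C., Hayes G.D., Newman M.A., Zuk O., Gabel H., Kamath R., Yacoby K., Chapman B., Garcia S.M., et al. Identification of small RNA pathway genes using patterns of phylogenetic conservation and divergence. Nature. 2013a; 493: 694-698.
Tabach Y., Golan T., Hernández-Hernández A., Messer A.R., Fukuda T., Kouznetsova A., Liu J.-G., Lilienthal I., Levy C., Ruvkun G. Human disease locus discovery and mapping to molecular pathways through phylogenetic profiling. Mol. Syst. Biol. 2013b; 9: 692.
Tatusov R.L., Koonin E.V., Lipman D.J. A genomic perspective on protein families. Science. 1997; 278: 631-637.
Trachana K., Larsson T.A., Powell S., Chen W.-H., Doerks T., Muller J., Bork P. Orthology prediction methods: a quality assessment using curated protein families. BioEssays. 2011; 33: 769-780.
Vilella A.J., Severin J., Ureta-Vidal A., Heng L., Durbin R., Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009; 19: 327-335.
von Mering C., Jensen L.J., Snel B., Hooper S.D., Krupp M., Foglierini M., Jouffre N., Huynen M.A., Bork P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005; 33: D433-D437.
Wapinski I., Pfeffer A., Friedman N., Regev A. Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007; 449: 54-61.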