Review, open access, peer reviewed

The New Tree of Eukaryotes

2019; Elsevier BV; Volume: 35; Issue: 1; Language: English

10.1016/j.tree.2019.08.008

ISSN

1872-8383

Authors

Fabien Burki, Andrew J. Roger, Matthew W. Brown, Alastair G. B. Simpson

Topic(s)

Microbial Community Ecology and Physiology

Abstract

• The eukaryote Tree of Life (eToL) represents the phylogeny of all eukaryotic lineages, with the vast bulk of this diversity comprising microbial 'protists'. Since the early 2000s, the eToL has been summarized in a few (five to eight) 'supergroups'. Recently, this tree has been deeply remodeled, due mainly to the maturation of phylogenomics and the addition of numerous new 'kingdom-level' lineages of heterotrophic protists.
• The current eToL is derived almost exclusively from molecular phylogenies, in contrast to earlier models that were syntheses of molecular and other biological data.
• The supergroup model for the eToL has become increasingly abstract due to the absence of known shared derived characteristics for the new supergroups.
• Culture-based studies, not higher-throughput methods, have been responsible for most of the new major lineages recently added to the eToL.

For 15 years, the eukaryote Tree of Life (eToL) has been divided into five to eight major groupings, known as 'supergroups'. However, the tree has been profoundly rearranged during this time. The new eToL results from the widespread application of phylogenomics and numerous discoveries of major lineages of eukaryotes, mostly free-living heterotrophic protists. The evidence that supports the tree has transitioned from a synthesis of molecular phylogenetics and biological characters to purely molecular phylogenetics. Most current supergroups lack defining morphological or cell-biological characteristics, making the supergroup label even more arbitrary than before. Going forward, the combination of traditional culturing with maturing culture-free approaches and phylogenomics should accelerate the process of completing and resolving the eToL at its deepest levels.

Resolving the evolutionary tree for all eukaryotes has been a long-standing goal in biology. Inferring an eToL that is both accurate and comprehensive is a worthwhile objective in itself, but the eToL is also the framework on which we understand the origins and history of eukaryote biology and the evolutionary processes underpinning it. It is therefore a fundamental tool for studying many aspects of eukaryote evolution, such as cell biology, genome organization, sex, and multicellularity. In the molecular era, the eToL has also become a vital resource to interpret environmental sequence data and thus reveal the diversity and composition of ecological communities. Although most of the described species of eukaryotes belong to the multicellular groups of animals (Metazoa), land plants, and fungi, it has long been clear that these three 'kingdoms' represent only a small proportion of high-level eukaryote diversity. The vast bulk of this diversity – including dozens of extant 'kingdom-level' taxa – is found within the 'protists', the eukaryotes that are not animals, plants, or fungi [1: Adl S.M. et al. Revisions to the classification, nomenclature, and diversity of eukaryotes. J. Eukaryot. Microbiol. 2019; 66: 4-119, 2: O'Malley M.A. et al. The other eukaryotes in light of evolutionary protistology. Biol. Philos. 2012; 28: 299-330, 3: Betts H.C. et al. Integrated genomic and fossil evidence illuminates life's early evolution and eukaryote origin. Nat. Ecol. Evol. 2018; 2: 1556-1562, 4: del Campo J. et al. The others: our biased perspective of eukaryotic genomes. Trends Ecol. Evol. 2014; 29: 252-259, 5: Burki F. The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 2014; 6: a016147, 6: Pawlowski J. et al. CBOL Protist Working Group: barcoding eukaryotic richness beyond the animal, plant, and fungal kingdoms. PLoS Biol. 2012; 10: e1001419]. To a first approximation, inferring the eToL amounts to resolving the relationships among the major protist lineages. However, this task is complicated by the fact that protists are much less studied overall than animals, plants, or fungi [7: Sibbald S.J., Archibald J.M. More protist genomes needed. Nat. Ecol. Evol. 2017; 1: 145]. Molecular sequence data have accumulated slowly for many known protist taxa, and numerous important lineages were completely unknown (or were not cultivated, hence challenging to study) when the molecular era began. Thus, resolving the eToL has been a process in which large-scale discovery of major lineages has occurred simultaneously with deep-level phylogenetic inference. This makes the task at hand analogous to a jigsaw puzzle, but one where a large and unknown number of pieces are missing from the box and are instead hidden under various pieces of furniture.

By the early 2000s, a model of the tree emerged that divided almost all of known eukaryote diversity among five to eight major taxa usually referred to as 'supergroups' [8: Simpson A.G.B., Roger A.J. Eukaryotic evolution: getting to the root of the problem. Curr. Biol. 2002; 12: R691-R693, 9: Simpson A.G.B., Roger A.J. The real "kingdoms" of eukaryotes. Curr. Biol. 2004; 14: R693-R696, 10: Baldauf S.L. The deep roots of eukaryotes. Science. 2003; 300: 1703-1706, 11: Adl S.M. et al. The new higher-level classification of eukaryotes with emphasis on the taxonomy of protists. J. Eukaryot. Microbiol. 2005; 52: 399-451, 12: Keeling P.J. et al. The tree of eukaryotes. Trends Ecol. Evol. 2005; 20: 670-676]. The category of supergroup was a purely informal one, denoting extremely broad assemblages that contain, for example, the traditional 'kingdoms' like Metazoa and Fungi as subclades. Thus, the original supergroups generally represented the most inclusive collections of organisms within eukaryotes for which there was reasonable evidence that they formed a monophyletic group. A typical list of these groups included (with some differences in capitalization and endings): Archaeplastida (also known as Plantae), Chromalveolata, Rhizaria (or Cercozoa), Opisthokonta, Amoebozoa, and Excavata (see Box 1 for short descriptions). The main variations between accounts from that time were that some united Opisthokonta and Amoebozoa as 'unikonts' [12] (much later renamed 'Amorphea' [13: Adl S.M. et al. The revised classification of eukaryotes. J. Eukaryot. Microbiol. 2012; 59: 429-493]) or did not show Excavata and/or Chromalveolata confidently resolved as clades [10, 11]. For half of the groups (i.e., Opisthokonta, Amoebozoa, and Rhizaria), the principal evidence supporting their unity came from phylogenies of one or a few genes [14: Nikolaev S.I. et al. The twilight of Heliozoa and rise of Rhizaria, an emerging supergroup of amoeboid eukaryotes. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 8066-8071, 15: Baldauf S.L., Palmer J.D. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. U. S. A. 1993; 90: 11558-11562, 16: Smirnov A. et al. Molecular phylogeny and classification of the lobose amoebae. Protist. 2005; 156: 129-142]. For the others, it was a combination of weaker molecular phylogenetic evidence and shared derived cell-biological features. Archaeplastida and Chromalveolata were each identified by the presence of similar plastids [17: Cavalier-Smith T. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 1999; 46: 347-366, 18: Cavalier-Smith T. Eukaryote kingdoms: seven or nine? Biosystems. 1981; 14: 461-481], with sequences from plastid genomes supporting an ancestral endosymbiotic origin of plastids in each group [19: Yoon H.S. et al. The single, ancient origin of chromist plastids. Proc. Natl. Acad. Sci. U. S. A. 2002; 99: 15507-15512, 20: Rodriguez-Ezpeleta N. et al. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr. Biol. 2005; 15: 1325-1330]. Excavata, meanwhile, was distinguished by the inference that its taxa shared a derived, complex flagellar apparatus cytoskeleton [21: Simpson A.G.B. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int. J. Syst. Evol. Microbiol. 2003; 53: 1759-1777]. Consequently, the original supergroup-based eToLs were syntheses of different information rather than straightforward summaries of molecular phylogenies.

Box 1. The Original Supergroups – and Where Are They Now?

Five to six supergroups were originally proposed, depending on whether Opisthokonta and Amoebozoa were unified in the larger group unikonts [9, 12]. The name unikonts (based on a now-discarded hypothesis of a uniflagellated ancestor) was later replaced by Amorphea [13]. The six-supergroup version comprised the following.

• Opisthokonta includes animals, fungi, and several protist lineages that are most closely related to either animals or fungi. Opisthokonta remains a robust clade in modern phylogenies; however, it is nested within at least two larger taxa, Amorphea and Obazoa, that are frequently treated as supergroups instead.
• Amoebozoa is also still a robust group, but is now often regarded as a member of the supergroup Amorphea. Amoebozoa includes free-living amoeboid forms with lobose pseudopodia (e.g., Amoeba) but also more filose amoebae, some flagellates, and various slime molds.
• Excavata was originally proposed based on a distinctive morphology, namely a particular feeding groove form and associated cytoskeleton system, found in many enigmatic flagellated protists. Phylogenetics and phylogenomics defined three monophyletic subgroups – Discoba, Metamonada, and malawimonads – but have not consistently placed them together as a single clade. The name is now usually restricted to a Discoba–Metamonada clade (quite possibly artificial; see main text) or regarded as referring to a paraphyletic group.
• Archaeplastida are distinguished by the presence of primary plastids – the photosynthetic organelles deriving directly from cyanobacteria by endosymbiosis. The three main groups with primary plastids are the green algae and land plants, red algae (and likely their recently discovered relative Rhodelphis), and glaucophyte algae. Today, Archaeplastida is generally still considered a supergroup, although most phylogenomic analyses do not strongly support its monophyly (i.e., all three host lineages forming a single clade to the exclusion of other supergroups).
• Chromalveolata contained groups with red alga-derived secondary plastids (i.e., Alveolata, Stramenopila, Haptophyta, and Cryptophyta). This group was based on the assumption that these plastids were acquired once in a common ancestor, an assumption supported by plastid evidence but never strongly supported from the host perspective. Chromalveolata has been shown to be polyphyletic, with Alveolata and Stramenopila belonging to Sar (in TSAR), Haptophyta in Haptista, and Cryptophyta in Cryptista.
• Rhizaria was the latest addition at the time the supergroup model was proposed. It includes a wide diversity of amoebae (e.g., foraminiferans, radiolarians, and filose testate amoebae), flagellates, various parasites, and the chlorarachniophyte algae. In contrast to all other original supergroups, which were at least partly distinguished by morphological characters, Rhizaria was inferred more or less exclusively using molecular phylogenetics. It is now part of Sar (in TSAR) along with Alveolata and Stramenopila.

The supergroup model for the eToL became widely popular in both the primary literature and textbooks, for several reasons.
First, the model made for convenient and efficient summaries of eukaryotes, since almost all species fell into these few relatively diverse major groups. Second, all of the original supergroups, except Rhizaria, had at least one distinctive biological characteristic that seemed to ancestrally define them (see above and Box 1). Third, the groupings seemed to coincide with the limits of phylogenetic resolution. In fact, the overarching supergroup model has remained the standard description of the eToL for 15 years, despite major changes in our knowledge of eukaryotic phylogeny and diversity over that time.

The profound changes to the eToL have come from the development of phylogenomics and, concomitantly, the addition of many evolutionarily important protist lineages into molecular datasets. Below, we briefly introduce these two aspects.

The term 'phylogenomics' covers various approaches combining genomic-scale data with phylogenetic methods. In the context of the eToL, it usually refers to the estimation of organismal phylogeny from datasets containing dozens to hundreds of gene alignments, most often nucleus-encoded genes analyzed as inferred amino acid sequences [22: Delsuc F. et al. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 2005; 6: 361-375]. The data are sourced from a mixture of genome and, frequently, transcriptome sequencing projects. The introduction of phylogenomics offered the promise of overcoming the limited information afforded by single genes, which were mostly inadequate to resolve deep divergences within the eToL [23: Philippe H. et al. Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol. Biol. Evol. 2004; 21: 1740-1752]. However, it was pointed out early on that most of the analysis artefacts known to afflict single-gene phylogenies can also apply to phylogenomics [24: Jeffroy O. et al. Phylogenomics: the beginning of incongruence? Trends Genet. 2006; 22: 225-231]. Phenomena that cause unrelated taxa to cluster together in phylogenies, such as compositional bias and high rates of sequence divergence, often affect the whole genome rather than single genes. Therefore, merely adding genes can amplify artefacts rather than overriding them [25: Philippe H. et al. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 2011; 9: e1000602]. Accuracy might be improved by using more realistic evolutionary models, and especially by careful choice of taxa, where this is possible (see below). Examining multiple genes also raises the specter of artificially combining different gene histories, making careful quality controls essential to eliminate incorrect paralog assignments, contaminating sequences, etc. (see Box 2 for a typical 'phylogenomic pipeline').

Box 2. Example of a Phylogenomic Analysis Pipeline

Construction of datasets for phylogenomics is complicated, requiring painstaking care to exclude spurious data (e.g., contaminants, paralogs) and to select taxa appropriately. Deep-level phylogenomic analyses typically use inferred amino acid sequences of proteins, and sets of hundreds of widely present and/or highly expressed proteins are curated by various research groups. When new taxa are added, homologous sequences are retrieved from their transcriptomic or predicted gene sets, usually using pairwise alignment similarity tools (e.g., BlastP, Diamond-BlastP) or profile-based approaches (e.g., hidden Markov model methods like HMM-search). Typically, a series of checks is made to exclude paralogous sequences, often by requiring a reciprocal best BlastP hit to a set of manually curated orthologs.
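
The reciprocal-best-hit filter just mentioned can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular study: it assumes that both similarity searches (new-taxon proteins against the curated orthologs, and the reverse) have already been run, for example with BlastP, and reduced to (query, subject, bit score) tuples. The function names `best_hits` and `reciprocal_best_hits` and all sequence identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, e.g. parsed
    from tabular BlastP output (assumed input format). On tied scores, the
    first subject seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(new_vs_ref, ref_vs_new):
    """Keep a new-taxon protein only if it and its best curated ortholog
    are each other's best hit in the forward and reverse searches."""
    forward = best_hits(new_vs_ref)  # new-taxon protein -> curated ortholog
    reverse = best_hits(ref_vs_new)  # curated ortholog -> new-taxon protein
    return {new: ref for new, ref in forward.items() if reverse.get(ref) == new}


# Hypothetical toy scores: n1 and n2 both hit OG1_ref best, but only n1 is
# OG1_ref's best hit in the reverse search, so n2 is rejected as a candidate paralog.
new_vs_ref = [("n1", "OG1_ref", 850.0), ("n1", "OG2_ref", 120.0),
              ("n2", "OG1_ref", 400.0), ("n3", "OG2_ref", 610.0)]
ref_vs_new = [("OG1_ref", "n1", 845.0), ("OG1_ref", "n2", 395.0),
              ("OG2_ref", "n3", 600.0), ("OG2_ref", "n1", 118.0)]

print(reciprocal_best_hits(new_vs_ref, ref_vs_new))
# {'n1': 'OG1_ref', 'n3': 'OG2_ref'}
```

Sequences passing such a filter are only provisional orthologs; the gene-tree inspections described in the rest of this box are what catch paralogs, lateral transfers, and contaminants that a best-hit criterion alone would miss.
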

The proteins from new taxa that pass these checks are provisionally considered orthologous and are aligned with those from the hundreds of species in the curated dataset. Maximum likelihood (ML) trees are then estimated for each gene alignment, with bootstrapping to assess branch support. These trees are examined to identify and exclude sequences with apparent or actual evolutionary histories that differ from the organismal phylogeny, such as lateral gene transfers, incorrect paralog selections, and various contaminants. Contamination may occur during sequencing (referred to as on-sequencer/flow cell contamination), during library preparation, or in cell culture. These gene tree examinations currently include laborious by-eye inspections of the phylogenies, since some aspects still require human interpretation and decisions.

A suitable subset of taxa is then selected for the actual analysis. This selection aims to evenly cover the relevant phylogenetic breadth while excluding problematic species (e.g., those with limited data, extreme evolutionary rates in many genes, etc.). The explosion in the number of species for which omic data are available has greatly enhanced choice in taxon selection, as well as the detection (and elimination) of nonvertical signals in the data (e.g., [54: Burki F. et al. Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc. Biol. Sci. 2016; 283: 20152802, 56: Brown M.W. et al. Phylogenomics places orphan protistan lineages in a novel eukaryotic super-group. Genome Biol. Evol. 2018; 10: 427-433, 62: Strassert J.F.H. et al. New phylogenomic analysis of the enigmatic phylum Telonemia further resolves the eukaryote tree of life. Mol. Biol. Evol. 2019; 36: 757-765]).

Dataset assembly is followed by the actual phylogenomic analyses, in which hundreds of genes are concatenated into a phylogenomic 'supermatrix'. Usually, both ML and Bayesian analyses are conducted. Various evolutionary models are employed, with choice often constrained by computational logistics. Site-heterogeneous models, in which the profile of substitution propensities can differ among sites in the alignment, appear to be particularly important for improved phylogenetic accuracy. These models were first implemented in the Bayesian inference platform PhyloBayes [102: Lartillot N., Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 2004; 21: 1095-1109], but the analyses are computationally intensive and problems with mixing and convergence are common. Recently, practical ML implementations of site-heterogeneous models have become available in IQ-Tree [103: Nguyen L.-T. et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015; 32: 268-274, 104: Wang H.-C. et al. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 2018; 67: 216-235]. Frequently, subsidiary analyses are conducted to test whether initial results are robust to perturbations of the data, especially the exclusion of data most likely to foster incorrect phylogenetic inference (e.g., the fastest-evolving species, sites, or genes).

Although pioneering phylogenomic studies were instrumental in showing what could be done, they contributed only marginally to the original supergroup model, mostly because the sampling of protist taxa was extremely limited (e.g., missing entire supergroups, especially Rhizaria) [20, 26: Bapteste E. et al. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. U. S. A. 2002; 99: 1414-1419, 27: Lang B.F. et al. The cl

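The concatenation step described in Box 2, together with one simple form of the robustness test that excludes the fastest-evolving sites, can be illustrated with a short Python sketch. It is a toy under explicit assumptions: gene alignments are given as dictionaries of equal-length aligned sequences, taxa missing from a gene are padded with gap characters, and the per-site 'rate' is approximated crudely by the number of distinct residues in a column. Published analyses typically estimate site rates from a tree under an evolutionary model instead, so the scoring function here is a hypothetical stand-in, as are all function names and toy sequences.

```python
GAP = "-"  # gap / missing-data character (assumed convention)


def concatenate(alignments, taxa):
    """Build a supermatrix {taxon: concatenated sequence} from per-gene alignments.

    Each alignment is a dict {taxon: aligned sequence}; all sequences within
    one alignment are assumed to share the same length. A taxon absent from a
    gene is padded with gaps so that every row of the supermatrix lines up.
    """
    rows = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, GAP * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def site_variability(matrix):
    """Crude per-site rate proxy: count of distinct residues per column, ignoring gaps."""
    seqs = list(matrix.values())
    return [len({seq[i] for seq in seqs} - {GAP}) for i in range(len(seqs[0]))]


def drop_fastest_sites(matrix, fraction):
    """Remove the given fraction of columns with the highest variability score.

    Columns are ranked with a stable sort, so among tied scores the columns
    later in the alignment are the ones removed.
    """
    scores = site_variability(matrix)
    n_drop = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    keep = sorted(ranked[: len(scores) - n_drop])
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in matrix.items()}


# Hypothetical toy data: taxon B has no sequence for gene2.
gene1 = {"A": "MKV", "B": "MRV", "C": "MKI"}
gene2 = {"A": "LL", "C": "LF"}
supermatrix = concatenate([gene1, gene2], ["A", "B", "C"])
print(supermatrix)                           # {'A': 'MKVLL', 'B': 'MRV--', 'C': 'MKILF'}
print(site_variability(supermatrix))         # [1, 2, 2, 1, 2]
print(drop_fastest_sites(supermatrix, 0.4))  # {'A': 'MKL', 'B': 'MR-', 'C': 'MKL'}
```

A real supermatrix would normally also record gene boundaries (a partition scheme) so that models can be fitted per gene or whole genes can be excluded; that bookkeeping is omitted here for brevity.
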
References