Coming of age: orphan genes in plants
2014; Elsevier BV; Volume: 19; Issue: 11; Language: English
10.1016/j.tplants.2014.07.003
ISSN: 1878-4372
Authors: Zebulun Arendsee, Ling Li, Eve Syrkin Wurtele
Topic(s): Plant Parasitism and Resistance
Abstract
• Genes with no trans-species similarity (orphans) appear in all sequenced genomes.
• Some orphans become established in subsequent lineages.
• As orphans mature, they tend to become more complex, connected, and functional.
• Many orphans function in biotic/abiotic stresses and lineage-specific traits.
• Some orphans link metabolic responses to environmental changes.
• Some orphans are functional when introduced into evolutionarily distant species.

Sizable minorities of protein-coding genes from every sequenced eukaryotic and prokaryotic genome are unique to the species. These so-called 'orphan genes' may evolve de novo from non-coding sequence or be derived from older coding material. They are often associated with environmental stress responses and species-specific traits or regulatory patterns. However, difficulties in studying genes where comparative analysis is impossible, and a bias towards broadly conserved genes, have resulted in underappreciation of their importance. We review here the identification, possible origins, evolutionary trends, and functions of orphans with an emphasis on their role in plant biology.
We exemplify several evolutionary trends with an analysis of Arabidopsis thaliana and present QQS as a model orphan gene.

Until the past few years the consensus was that new genes arise via combinations of processes such as duplication, fusion, fission, and transposition of existing protein-coding genes. Fischer and Eisenberg noticed that all sequenced bacteria contained genes without detectable homologs in any sequenced relative [1 Fischer D., Eisenberg D. Finding families for genomic ORFans. Bioinformatics. 1999; 15: 759-762]. They postulated this uniqueness was a real phenomenon, rather than an artifact of poor annotation or sparse sequencing among nearby species, as some claimed [2 Casari G. et al. Bioinformatics and the discovery of gene function. Trends Genet. 1996; 12: 244-245]. Since the advent of next-generation sequencing, the analysis of a multitude of genomes has shown that such orphan genes are widespread across all domains of life [3 Khalturin K. et al. More than just orphans: are taxonomically-restricted genes important in evolution? Trends Genet. 2009; 25: 404-413, 4 Gollery M. et al. What makes species unique? The contribution of proteins with obscure features. Genome Biol. 2006; 7: R57, 5 Wilson G.A. et al. Orphans as taxonomically restricted and ecologically important genes. Microbiology. 2005; 151: 2499-2501] and viruses [6 Yin Y., Fischer D. Identification and investigation of ORFans in the viral world. BMC Genomics. 2008; 9: 24]. Continual genesis of novel genes in an organism where protein count is fairly constant implies equilibrium between gene origin and extinction. A reasonable hypothesis is that most of the turnover occurs in the youngest genes [7 Tautz D., Domazet-Lošo T. The evolutionary origin of orphan genes. Nat. Rev. Genet. 2011; 12: 692-702]. This has been demonstrated in Drosophila [8 Palmieri N. et al. The life cycle of Drosophila orphan genes. eLife.
2014; 3: e01311] and is reflected in the degree of conservation of existing genes in each genome. Under this model there is a vast, dynamic reservoir of novel genes. We will discuss: (i) the origin of orphans and their regulatory elements, (ii) their maturation into established genes, and (iii) the functions into which they are recruited.

Orphans may be defined as genes with coding sequences utterly unique to the species; in other words, genes that produce previously non-existing (novel) proteins. They are a subset of taxonomically restricted (also called lineage-specific) genes that are specific to a particular taxon (e.g., malvid-specific or Brassicaceae-specific genes). Genes are generally classified as being orphans if they lack coding-sequence similarity outside their species (usually quantified by BLAST). This classification method accepts as orphans both genes that are newly born from non-genic sequence and descendants of ancient genes whose coding sequences have changed beyond recognition; it rejects horizontally transferred genes and duplicated genes that may have assumed a new function but whose proteins are still recognizable (i.e., 'new' genes that are not orphans).

Analysis of the genomic contexts and sequences of orphan genes can often reveal their origins, as reviewed in [7 Tautz D., Domazet-Lošo T. The evolutionary origin of orphan genes. Nat. Rev. Genet. 2011; 12: 692-702]. Some can be traced to highly divergent products of gene duplications, overlapping or anti-sense reading frames (overprinting), domesticated transposons, resurrected pseudogenes, or early frameshift mutations [8 Palmieri N. et al. The life cycle of Drosophila orphan genes. eLife. 2014; 3: e01311, 9 Donoghue M.T. et al. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 2011; 11: 47, 10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol.
2013; 5: 439-455, 11 Neme R., Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013; 14: 117, 12 Brosch M. et al. Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and 'resurrected' pseudogenes in the mouse genome. Genome Res. 2011; 21: 756-767]. Others may arise de novo from non-coding sequence. Early doubts that protein-coding genes could spontaneously arise [13 Jacob F. Evolution and tinkering. Science. 1977; 196: 1161-1166] have been put to rest by a flood of papers tracing orphans to their non-genic roots [14 Levine M.T. et al. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc. Natl. Acad. Sci. U.S.A. 2006; 103: 9935-9939, 15 Begun D.J. et al. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics. 2006; 176: 1131-1137, 16 Cai J. et al. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008; 179: 487-496, 17 Li D. et al. A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand. Cell Res. 2010; 20: 408-420, 18 Li C-Y. et al. A human-specific de novo protein-coding gene associated with human brain functions. PLoS Comput. Biol. 2010; 6: e1000734, 19 Knowles D.G., McLysaght A. Recent de novo origin of human protein-coding genes. Genome Res. 2009; 19: 1752-1759, 20 Heinen T.J.A.J. et al. Emergence of a new gene from an intergenic region. Curr. Biol. 2009; 19: 1527-1531, 21 Murphy D.N., McLysaght A. De novo origin of protein-coding genes in murine rodents. PLoS ONE. 2012; 7: e48650, 22 Yang Z., Huang J. De novo origin of new genes with introns in Plasmodium vivax. FEBS Lett.
2011; 585: 641-644, 23 Zhao L. et al. Origin and spread of de novo genes in Drosophila melanogaster populations. Science. 2014; 343: 769-772]. A recent study suggests a continuum between very weakly transcribed and translated open reading frames (ORFs) and highly functional, mature genes [24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature. 2012; 487: 370-374]. Table 2 from [10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 2013; 5: 439-455] shows a cross-species, quantitative comparison of the origins of orphan genes. In A. thaliana, over half of the orphans appear to have arisen de novo, based on similarity to non-genic regions of Arabidopsis lyrata [9 Donoghue M.T. et al. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 2011; 11: 47].

Estimates of the percentage of genes that are orphans in various species range widely, from <1% to 71% [5 Wilson G.A. et al. Orphans as taxonomically restricted and ecologically important genes. Microbiology. 2005; 151: 2499-2501, 9 Donoghue M.T. et al. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 2011; 11: 47, 10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 2013; 5: 439-455, 24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature. 2012; 487: 370-374, 25 Ekman D., Elofsson A. Identifying and quantifying orphan protein sequences in fungi. J. Mol. Biol. 2010; 396: 396-405, 26 Ye C-Y. et al. Evolutionary analyses of non-family genes in plants. Plant J. 2013; 73: 788-797, 27 Guo W-J. et al. Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome. Comp. Funct. Genomics. 2007; 2007: 1-7, 28 Yang L.
et al. Genome-wide identification, characterization, and expression analysis of lineage-specific genes within zebrafish. BMC Genomics. 2013; 14: 65, 29 Hahn M.W. et al. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007; 3: e197, 30 Colbourne J.K. et al. The ecoresponsive genome of Daphnia pulex. Science. 2011; 331: 555-561, 31 Ohm R.A. et al. Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathog. 2012; 8: e1003037, 32 Gibson A.K. et al. Why so many unknown genes? Partitioning orphans from a representative transcriptome of the lone star tick Amblyomma americanum. BMC Genomics. 2013; 14: 135, 33 Kuo C-H., Kissinger J.C. Consistent and contrasting properties of lineage-specific genes in the apicomplexan parasites Plasmodium and Theileria. BMC Evol. Biol. 2008; 8: 108], with 5–15% being fairly typical [10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 2013; 5: 439-455, 31 Ohm R.A. et al. Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathog. 2012; 8: e1003037, 33 Kuo C-H., Kissinger J.C. Consistent and contrasting properties of lineage-specific genes in the apicomplexan parasites Plasmodium and Theileria. BMC Evol. Biol. 2008; 8: 108]. A portion of this disparity is attributable to the varying evolutionary distance between each focal species and its nearest sequenced relatives [10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 2013; 5: 439-455]. Other sources of variation are the quality of the genome datasets and the methods used in orphan identification (e.g., three independent studies of A. thaliana report 958, 1324, and 1430 orphans, respectively [9 Donoghue M.T.
et al. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 2011; 11: 47, 34 Lin H. et al. Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana. BMC Evol. Biol. 2010; 10: 41, 35 Guo Y-L. Gene family evolution in green plants with emphasis on the origination and evolution of Arabidopsis thaliana genes. Plant J. 2013; 73: 941-951]). However, much of the variation reported is likely due to real differences in evolutionary pressures and molecular genetic phenomena that are as yet unknown.

Genes can be stratified by age via a technique known as phylostratigraphy that traces modern genes back to their orphan founders [36 Domazet-Lošo T. et al. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 2007; 23: 533-539]. Figure 1 shows a phylostratigraph of the protein-coding genes of A. thaliana. The general approach is to select hierarchical taxonomic groups ascending from the focal species, and for each gene find the oldest taxon in which it has a homolog [36 Domazet-Lošo T. et al. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 2007; 23: 533-539]. By describing the characteristics of increasingly ancient phylostrata, the path from genomic noise to mature protein is revealed.

Conventional phylostratigraphic analyses make two major assumptions. The first is that simple search algorithms (such as protein-BLAST) are adequate for the identification of distant homologs. This assumption is supported by evolutionary simulations [37 Albà M.M., Castresana J. On homology searches by protein BLAST and the characterization of the age of genes. BMC Evol. Biol. 2007; 7: 53].
However, a recent study of viral orphan genes that used more sensitive algorithms (PSI-BLAST, HHBlits, and HHPred) predicted homologs for about a quarter of genes that had been identified as genus-specific by protein-BLAST [38 Kuchibhatla D.B. et al. Powerful sequence similarity search methods and in-depth manual analyses can identify remote homologs in many apparently 'orphan' viral proteins. J. Virol. 2013; 88: 10-20]. A second assumption is that the oldest components of genes are no older than the gene founders. This assumption is violated if an old domain or exon is incorporated into a young protein. This issue has been noted previously, but was considered not to be a serious impediment, at least not in metazoans [7 Tautz D., Domazet-Lošo T. The evolutionary origin of orphan genes. Nat. Rev. Genet. 2011; 12: 692-702]. A recent review acknowledges the successes of phylostratigraphy [36 Domazet-Lošo T. et al. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 2007; 23: 533-539, 39 Domazet-Lošo T., Tautz D. Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa. BMC Biol. 2010; 8: 66, 40 Šestak M.S. et al. Phylostratigraphic profiles reveal a deep evolutionary history of the vertebrate head sensory systems. Front. Zool. 2013; 10: 18] but argues that phylogenetic reconciliation methods offer a more nuanced understanding of the events underlying gene histories [41 Capra J.A. et al. How old is my gene? Trends Genet. 2013; 29: 659-668].

What are the prospects of a young orphan gene? Phylostratigraphic analyses indicate that some orphans survive to fixation. These manifest as gene families that are taxonomically restricted to the clade descending from the species in which they arose. Genes from older phylostrata tend towards greater length, complexity, and connectivity.
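The phylostratigraphic assignment described earlier (for each gene, find the oldest taxon in which it has a homolog) and a per-stratum trait summary can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the coarse six-level lineage, the `SPECIES_TAXA` lookup table, the gene names, the protein lengths, and the lists of species with detectable homologs are invented, and the similarity search itself (e.g., protein-BLAST) is assumed to have been run beforehand. It is not the method used to produce Figure 1 or Figure 2.

```python
from collections import defaultdict
from statistics import median

# Taxa ascending from the focal species (index 0) to the root.
# A coarse, illustrative lineage; real analyses use many more levels.
FOCAL_LINEAGE = [
    "Arabidopsis thaliana",
    "Arabidopsis",
    "Brassicaceae",
    "Viridiplantae",
    "Eukaryota",
    "cellular organisms",
]

# Hypothetical lookup: the taxa (from FOCAL_LINEAGE) that contain each
# species in which a homolog might be detected.
SPECIES_TAXA = {
    "Arabidopsis lyrata": {"Arabidopsis", "Brassicaceae", "Viridiplantae",
                           "Eukaryota", "cellular organisms"},
    "Brassica rapa": {"Brassicaceae", "Viridiplantae", "Eukaryota",
                      "cellular organisms"},
    "Oryza sativa": {"Viridiplantae", "Eukaryota", "cellular organisms"},
    "Saccharomyces cerevisiae": {"Eukaryota", "cellular organisms"},
    "Escherichia coli": {"cellular organisms"},
}


def assign_phylostratum(hit_species):
    """Return the oldest focal-lineage taxon in which the gene has a homolog.

    `hit_species` is the set of species with a detectable homolog.
    A gene with no hits outside the focal species is an orphan and is
    assigned to the youngest stratum (the focal species itself).
    """
    oldest = 0
    for species in hit_species:
        taxa = SPECIES_TAXA[species]
        # Youngest focal-lineage taxon shared with the hit species.
        shared = next(i for i, taxon in enumerate(FOCAL_LINEAGE) if taxon in taxa)
        oldest = max(oldest, shared)
    return FOCAL_LINEAGE[oldest]


# Made-up genes: name -> (protein length in residues, species with a homolog).
TOY_GENES = {
    "gene_a": (60, set()),
    "gene_b": (95, {"Arabidopsis lyrata"}),
    "gene_c": (210, {"Arabidopsis lyrata", "Brassica rapa"}),
    "gene_d": (340, {"Arabidopsis lyrata", "Oryza sativa"}),
    "gene_e": (480, {"Brassica rapa", "Escherichia coli"}),
}


def median_length_by_stratum(genes):
    """Group genes by phylostratum and report the median protein length."""
    lengths = defaultdict(list)
    for length, hits in genes.values():
        lengths[assign_phylostratum(hits)].append(length)
    # Report strata from youngest to oldest, skipping empty ones.
    return {s: median(lengths[s]) for s in FOCAL_LINEAGE if s in lengths}


if __name__ == "__main__":
    for stratum, med in median_length_by_stratum(TOY_GENES).items():
        print(f"{stratum}: median protein length {med}")
```

The sketch inherits both assumptions discussed above: it trusts whatever homolog list the search tool returns, and it dates a whole gene by its single most distant hit, so an old domain embedded in a young protein would push the gene into an older stratum.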
Figure 2 includes an overview of several cross-phylostrata traits in A. thaliana genes, and compares them to non-genic ORFs. A steady, several-fold increase in protein length from species-specific genes to universally conserved genes has been noted in several metazoans [11 Neme R., Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013; 14: 117, 42 Toll-Riera M. et al. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 2009; 26: 603-612], yeast [24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature. 2012; 487: 370-374], and A. thaliana [35 Guo Y-L. Gene family evolution in green plants with emphasis on the origination and evolution of Arabidopsis thaliana genes. Plant J. 2013; 73: 941-951] (Figure 2A). This is largely due to an increase in the number of exons because the average exon length is somewhat constant, as seen in metazoa [11 Neme R., Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013; 14: 117] and in the older A. thaliana phylostrata (Figure 2B). Although in some species (such as rice, zebrafish, and humans) genes from recent phylostrata have particularly long exons [11 Neme R., Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013; 14: 117, 43 Campbell M.A. et al. Identification and characterization of lineage-specific genes within the Poaceae. Plant Physiol. 2007; 145: 1311-1322], in A. thaliana there is a significant increase in exon size (about twofold) across the first several phylostrata.

By several criteria, younger genes are more random and specialize over time. For example, amino acid composition bias increases with age in yeast [24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature.
2012; 487: 370-374] and many bacteria [44 Yomtovian I. et al. Composition bias and the origin of ORFan genes. Bioinformatics. 2010; 26: 996-999]. Similarly, codon bias, which is often used as a proxy for translational optimization [45 Sharp P.M., Li W-H. The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987; 15: 1281-1295], increases with gene age in primates [42 Toll-Riera M. et al. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 2009; 26: 603-612], yeast [24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature. 2012; 487: 370-374], and Drosophila [8 Palmieri N. et al. The life cycle of Drosophila orphan genes. eLife. 2014; 3: e01311]. Percent GC content also increases gradually across the phylostrata for a number of species [9 Donoghue M.T. et al. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 2011; 11: 47, 10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 2013; 5: 439-455], including A. thaliana [34 Lin H. et al. Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana. BMC Evol. Biol. 2010; 10: 41]. Even the youngest genes in A. thaliana, however, are sharply separated in GC content from the pool of non-genic ORFs (43% median GC content for orphans versus 32% for non-genic ORFs) (Figure 2C). Protein isoelectric points across A. thaliana phylostrata tend to decrease from a median of around pI 9.1 for non-genic ORFs and orphans to pI 7.2 for genes in the oldest stratum (Figure 2D). In yeast, protein aggregation propensity decreases with age [46 Abrusan G. Integration of new genes into cellular networks, and their structural maturation. Genetics.
2013; 195: 1407-1417] (protein aggregates are often toxic, as exemplified by their role in Alzheimer's disease [47 Irvine G.B. et al. Protein aggregation in the brain: the molecular basis for Alzheimer's and Parkinson's diseases. Mol. Med. 2008; 14: 451]).

Younger genes also appear to be more evolutionarily radical. Microsatellites and low-complexity regions are more common in younger genes of Drosophila melanogaster [8 Palmieri N. et al. The life cycle of Drosophila orphan genes. eLife. 2014; 3: e01311], rice (Oryza sativa) [27 Guo W-J. et al. Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome. Comp. Funct. Genomics. 2007; 2007: 1-7], and mammals [48 Toll-Riera M. et al. Role of low-complexity sequences in the formation of novel protein coding sequences. Mol. Biol. Evol. 2011; 29: 883-886]. These regions can be powerful drivers of evolution [49 Radó-Trilla N., Albà M. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. BMC Evol. Biol. 2012; 12: 155]. For example, expansion of a dinucleotide repeat in an Antarctic fish provided the material for the evolution of a novel antifreeze protein [50 Chen L. et al. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. U.S.A. 1997; 94: 3811-3816]. The robustness of protein secondary structure to mutation also increases with age in yeast [46 Abrusan G. Integration of new genes into cellular networks, and their structural maturation. Genetics. 2013; 195: 1407-1417]. High robustness decreases the likelihood of radical structural change, but enhances the subtle evolution of mature proteins by preserving their primary function while they safely explore novel ones [51 Ferrada E., Wagner A. Protein robustness promotes evolutionary innovations on large evolutionary time-scales. Proc.
R. Soc. B: Biol. Sci. 2008; 275: 1595-1602]. Conversely, low robustness favors more radical changes but heightens the risk of evolutionary failure.

The prospects of a young gene are dependent on how securely it can integrate itself into vital processes or networks. Studies in yeast suggest transcriptional regulation appears very quickly, but protein–protein and genetic (or epistatic) interactions develop more gradually [46 Abrusan G. Integration of new genes into cellular networks, and their structural maturation. Genetics. 2013; 195: 1407-1417]. Orphan genes tend to be less expressed [8 Palmieri N. et al. The life cycle of Drosophila orphan genes. eLife. 2014; 3: e01311, 24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature. 2012; 487: 370-374], and in a narrower range of tissues [9 Donoghue M.T. et al. Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol. Biol. 2011; 11: 47, 42 Toll-Riera M. et al. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 2009; 26: 603-612]. A comparison of the protein–protein interactions of de novo orphan genes to those of recently duplicated genes found that young de novo genes were very poorly connected relative to young duplicated genes [52 Capra J.A. et al. Novel genes exhibit distinct patterns of function acquisition and network integration. Genome Biol. 2010; 11: R127]. However, genes that are predicted to have originated de novo more than 100 million years ago are as well connected as their duplicated peers [52 Capra J.A. et al. Novel genes exhibit distinct patterns of function acquisition and network integration. Genome Biol. 2010; 11: R127].

There are several non-exclusive mechanisms by which orphans may gain regulatory elements.
Transposon insertion upstream of the transcription start-site can both increase expression overall and couple expression to specific stresses [53 Naito K. et al. Unexpected consequences of a sudden and massive transposon amplification on rice gene expression. Nature. 2009; 461: 1130-1134] or tissues (reviewed in [54 Lisch D. How important are transposons for plant evolution? Nat. Rev. Genet. 2012; 14: 49-61]); this mechanism could create a particularly dramatic effect on expression of de novo orphans, which are usually less regulated to begin with. Some orphans may share regulatory elements with older genes via gene overlap [21 Murphy D.N., McLysaght A. De novo origin of protein-coding genes in murine rodents. PLoS ONE. 2012; 7: e48650], association with a bidirectional promoter [11 Neme R., Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013; 14: 117], or by being located within an intron [21 Murphy D.N., McLysaght A. De novo origin of protein-coding genes in murine rodents. PLoS ONE. 2012; 7: e48650]. Orphans that are derived from coding gene duplication followed by rapid mutation [55 Teichmann S.A., Babu M.M. Gene regulatory network growth by duplication. Nat. Genet. 2004; 36: 492-496], or from a pseudogene (e.g., by an early frameshift mutation), may inherit some regulatory elements from their prior context. Alternatively, cryptic regulatory sites may predate the origin of the gene or arise de novo later [56 Tsai Z.T-Y. et al. Evolution of cis-regulatory elements in yeast de novo and duplicated new genes. BMC Genomics. 2012; 13: 717]. Finally, regulation may arise via epigenetics, as is illustrated for the QQS (QUA-QUINE STARCH, AT3G30720) orphan of A. thaliana [57 Silveira A.B. et al. Extensive natural epigenetic variation at a de novo originated gene. PLoS Genet. 2013; 9: e1003437].
It might be expected that recently evolved genes would not be crucial for survival; after all, the organism seemed to do quite well without them. However, although the function of the vast majority of individual orphans is unknown, and while they generally lack identifiable folds (Figure 3), functional motifs [27 Guo W-J. et al. Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome. Comp. Funct. Genomics. 2007; 2007: 1-7] and recognizable domains [11 Neme R., Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics. 2013; 14: 117], there is ample evidence of widespread functionality. A study of six de novo genes in Drosophila found four to be essential (i.e., homozygote nulls are embryo-lethal) [58 Reinhardt J.A. et al. De novo ORFs in Drosophila are Important to organismal fitness and evolved rapidly from previously non-coding sequences. PLoS Genet. 2013; 9: e1003860]. Another Drosophila study that included 16 randomly selected de novo orphans found that three of these orphans were essential [59 Chen S. et al. New genes in Drosophila quickly become essential. Science. 2010; 330: 1682-1685]. However, in mice and yeast, genes from older phylostrata are vastly more likely to be essential than their younger counterparts [60 Chen W-H. et al. Younger genes are less likely to be essential than older genes, and duplicates are less likely to be essential than singletons of the same age. Mol. Biol. Evol. 2012; 29: 1703-1706]. Although particular plant orphans have been shown to be important for survival under specific conditions [61 Luhua S. et al. Linking genes of unknown function with abiotic stress responses by high-throughput phenotype screening. Physiol. Plant. 2013; 148: 322-333], none have yet been reported to be embryo-lethal if eliminated.
Further evidence for orphan functionality can be gleaned from the widespread reports that purifying selection is high in old genes and positive selection is high in younger ones [10 Wissler L. et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol. Evol. 2013; 5: 439-455, 24 Carvunis A-R. et al. Proto-genes and de novo gene birth. Nature. 2012; 487: 370-374, 35 Guo Y-L. Gene family evolution in green plants with emphasis on the origination and evolution of Arabidopsis thaliana genes. Plant J. 2013; 73: 941-951, 62 Voolstra C.R. et al. Rapid evolution of coral proteins responsible for interaction with the environment. PLoS ONE. 2011; 6: e20392]. Drosophila orphans were found to be under purifying selection (though much weaker than in older genes) [8 Palmieri N. et al. The life cycle of Drosophila orphan genes. eLife. 2014; 3: e01311]. Both positive and purifying selection generally imply functionality (as opposed to neutral drift).

The QQS gene of A. thaliana, to our knowledge the first plant orphan with a biochemically characterized function, acts as a regulator of carbon and nitrogen allocation, impacting carbon and nitrogen partitioning to starch, lipid, and protein in leaves and seeds [63 Li L. et al. Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves. Plant J. 2009; 58: 485-498, 64 Li, L. and Wurtele, E.S., Iowa State University. Materials and method for modifying a biochemical component in a plant, US 20120222167 A1, 65 Li L., Wurtele E.S. The QQS orphan gene of Arabidopsis modulates carbon allocation in soybean. Plant Biotechnol. J. 2014; https://doi.org/10.1111/pbi.12238]. High expression of QQS in A.
thaliana QQS-overexpression lines results in greater content of protein and lower content of starch; conversely, QQS RNA interference (RNAi) lines with repression of QQS expression have decreased protein content and higher accumulation of starch. If the function of an orphan is dependent on interactions with conserved cellular components, as suggested in [63 Li L. et al. Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves. Plant J. 2009; 58: 485-498], then it may be capable of functioning in distant relatives. This hypothesis was tested by introducing QQS into soybean – a species with no sequence homolog t
Reference(s)