Impact of C‐terminal amino acid composition on protein expression in bacteria
2020; Springer Nature; Volume: 16; Issue: 5; Language: English
DOI: 10.15252/msb.20199208
ISSN: 1744-4292
Authors: Marc Weber, Raul Burgos, Eva Yus, Jae‐Seong Yang, María Lluch‐Senar, Luís Serrano
Topic(s): Genomics and Phylogenetic Studies
Summary
Article, 25 May 2020. Open Access. Transparent process.
Author Information
Marc Weber (orcid.org/0000-0001-7920-5655)1, Raul Burgos1, Eva Yus1, Jae-Seong Yang1, Maria Lluch-Senar (orcid.org/0000-0001-7568-4353)1 and Luis Serrano (orcid.org/0000-0002-5276-1392)*,1,2,3
1 Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
2 Universitat Pompeu Fabra (UPF), Barcelona, Spain
3 ICREA, Barcelona, Spain
*Corresponding author. Tel: +34 933160101; E-mail: [email protected]
Molecular Systems Biology (2020) 16: e9208. https://doi.org/10.15252/msb.20199208
Abstract
The C-terminal sequence of a protein is involved in processes such as efficiency of translation termination and protein degradation. However, the general relationship between features of this C-terminal sequence and levels of protein expression remains unknown. Here, we identified C-terminal amino acid biases that are ubiquitous across the bacterial taxonomy (1,582 genomes). We showed that the frequency is higher for positively charged amino acids (lysine, arginine), while it is lower for hydrophobic amino acids and threonine.
We then studied the impact of C-terminal composition on protein levels in a library of Mycoplasma pneumoniae mutants, covering all possible combinations of the last two codons. We found that charged and polar residues, in particular lysine, led to higher expression, while hydrophobic and aromatic residues led to lower expression, with a difference in protein levels of up to fourfold. We further showed that modulation of protein degradation rate could be one of the main mechanisms driving these differences. Our results demonstrate that the identity of the last amino acids has a strong influence on protein expression levels.
Synopsis
Large-scale genomics analyses combined with high-throughput experimental assays reveal that the C-terminal amino acid composition has a strong influence on protein expression levels in bacteria.
C-terminal amino acid biases are ubiquitous across bacterial taxonomy: positively charged residues (lysine, arginine) are enriched at the last position, while hydrophobic amino acids and threonine are depleted.
High-throughput expression assays using a reporter gene library showed that protein expression varies up to 4-fold, with C-terminal positively and negatively charged residues increasing expression, and hydrophobic residues decreasing expression.
Modulation of protein degradation rate due to the identity of the C-terminal residue could explain ~85% of the variation in protein expression.
These results are relevant for the optimization of heterologous protein sequences, where the choice of C-terminal residues could lead to increased expression levels.
Introduction
Protein sequence is shaped by many evolutionary constraints, acting at different levels of the gene expression process. Identifying which sequence features determine the efficiency and accuracy of protein expression has been the subject of intense research.
Sequence variations previously believed to be neutral, such as the choice of synonymous codons, have been shown to be under selection (Hanson & Coller, 2018), revealing new mechanisms interacting with the translation process. Most studies have focused on the region close to the N-terminal, where some of the most important mechanisms of translation initiation occur (Goodman et al, 2013; Reeve et al, 2014; Espah Borujeni & Salis, 2016). Much less is known, however, about the potential evolutionary pressures acting at the C-terminal, apart from basic protein function and structure. Early studies showed a differential preference for specific codons upstream of the stop codon in Escherichia coli (Brown et al, 1990; Arkov et al, 1993; Björnsson et al, 1996; Berezovsky et al, 1999) and Bacillus subtilis (Rocha et al, 1999; Palenchar, 2008) proteins. The properties of the last two amino acids were shown to modulate the efficiency of translation termination at the UGA stop codon context in E. coli (Björnsson et al, 1996), in particular for highly expressed genes. Also, several C-terminal sequence motifs were found to induce stalling of translation termination (Hayes et al, 2002; Woolstenhulme et al, 2013). Besides, degradation signals have been identified at the C-terminal of proteins in a variety of bacterial species (Sauer & Baker, 2011), such as the ssrA tag which targets proteins to the ClpXP protease in E. coli (Gottesman et al, 1998). Thus, changes in the efficiency of translation termination and recognition of the C-terminal region by the protein degradation machinery are two potential mechanisms that could drive preferences in the C-terminal composition of proteins. Translation is one of the most energy-intensive processes in the cell, consuming about 40% of the cellular energetic resources in fast-growing bacteria (Russell & Cook, 1995). 
Thus, sequence features at the C-terminal that lead to variations in protein abundance, by modulating either translation or degradation rates, are likely to be under selection. However, the impact of C-terminal composition on protein abundance remains largely unknown. Many studies have identified sequence features that influence translation efficiency, but most of them have focused on the 5′ end region or the bulk of the coding sequence (Kudla et al, 2009; Goodman et al, 2013; Cambray et al, 2018). In these studies, the use of synthetic libraries of a reporter gene with randomized sequence has proved to be a robust approach to evaluate the functional impact of sequence variation. A similar approach applied to variations in the C-terminal region would provide useful information to identify sequence features associated with higher or lower protein abundance. Here, we investigated the impact of C-terminal sequence composition on protein expression. Firstly, we leveraged the considerable increase in the past decade in the number of available bacterial genome sequences (Loman & Pallen, 2015) to study C-terminal composition biases of 1,582 genomes across the bacterial taxonomy. Such a large-scale comparative analysis allowed us to reveal the universality of the sequence compositional patterns, as well as preferences which seem to be species-specific, and unveil their association with protein function and protein abundance. Secondly, we experimentally assessed, in a high-throughput manner, the influence of C-terminal composition on protein expression levels in the model organism Mycoplasma pneumoniae using the ELM-seq technique (Yus et al, 2017). We built a random library of the dam reporter gene with varying C-terminal sequence, covering all possible combinations of the last two codons and the six nucleotides following the stop codon.
By measuring the expression levels of all variants, we showed that the identity of the last two amino acids has a strong impact on protein abundance. We validated these results by varying the last residue of a different protein in the same species. Furthermore, we provide evidence associating the identity of the last C-terminal amino acid with protein degradation rate. Overall, our results show that in bacteria, the C-terminal residue of protein sequences modulates protein expression levels and is under selective pressure.
Results
Analyzing C-terminal compositional biases in bacteria
C-terminal amino acid and codon composition in bacteria is biased
We investigated biases in codon and amino acid composition of the C-terminal region of bacterial protein sequences. We retrieved all protein sequences from the RefSeq database (Haft et al, 2018), using the reference and representative genome collections, in order to achieve a broad coverage of bacterial species across taxonomy. To avoid over-representation of duplicated proteins within the same bacterial species, we removed proteins which presented both a high overall sequence identity and a high identity of their C-terminal region (see Materials and Methods). We obtained a database of approximately 4.8 M protein sequences covering 1,582 genomes and 1,516 species, which we used as a starting point for all the following analyses. When studying all species in the bacterial kingdom, we found that the amino acid composition at the last position upstream of the stop codon differed significantly from the bulk amino acid composition (Fig 1A). In particular, positively charged amino acids were enriched at position −1, with the frequency of lysine and arginine being 2.32 times (two-sided Fisher's exact test P ≈ 0 within numerical error) and 1.76 times (P = 7.7e-30) higher, respectively. In contrast, the occurrence of threonine was 2.25 times lower (P = 2.2e-308), and that of methionine 2.02 times lower (P = 7.1e-51).
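As a minimal sketch of how a bias at position −1 could be quantified, the snippet below computes an odds ratio and a two-sided Fisher's exact test for a 2×2 table (amino acid of interest versus all others, at position −1 versus in the bulk). The counts are invented for illustration, and the function names and table layout are assumptions rather than the authors' pipeline. The exact enumeration is only practical for small counts; a statistics library would be the natural choice at genome scale.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: residue of interest at position -1, b: other residues at position -1,
    c: residue of interest in the bulk,    d: other residues in the bulk.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test by full enumeration.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)


# Hypothetical counts: lysine in 12 of 100 C-terminal positions
# versus 50 of 1,000 bulk positions.
print(odds_ratio(12, 88, 50, 950))            # about 2.59
print(fisher_exact_two_sided(12, 88, 50, 950))
```

An odds ratio above 1 indicates enrichment at position −1 relative to the bulk, and a value below 1 indicates depletion.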
Due to the large number of sequences considered in this analysis, all biases were significant with extremely small P-values, even after correcting for multiple testing. Interestingly, we observed a gradient in the intensity of the biases for all amino acids that showed a statistical difference at the C-terminal, except for threonine that was specifically depleted at the −1 position. This gradient was more evident in the case of arginine and lysine, whose biases decreased from a maximum value at the C-terminal position toward the bulk, with an odds ratio still significant for lysine of 1.16 (P = 2.3e-23) at position −20. All hydrophobic amino acids, except phenylalanine, were found to be disfavored at the last position, with fold changes in frequency ranging from 0.49 to 0.87 times (P = 7.5e-6). The amino acid frequencies detected at positions −1 and −2 differed from those found in disordered regions in proteins, indicating that the preferences observed are not due to the C-terminal being in general unstructured in proteins (Kleppe & Bornberg-Bauer, 2019) (χ2 test P ≈ 0, Appendix Fig S1).
Figure 1. Biases in C-terminal protein sequence composition in the bacterial kingdom
Amino acid and codon composition at the C-terminal of bacterial protein sequences shows higher (red) or lower (blue) frequency when compared to their frequency in the bulk of sequences (same color code for panels A, B, D, and E). Significance of the biases was tested using Fisher's exact test and multiple testing correction with 5% false discovery rate within each plot category.
A. Position-specific amino acid composition biases for the last 20 amino acids at C-terminal.
B. Composition bias of the last two amino acids compared to the frequency of the di-amino acid in the bulk.
C. Epistasis between the last two amino acids. Frequency of the pair was compared to the expected frequency if the two positions were independent. In this case, significance was tested with the binomial test and the same multiple testing correction. Color code has a smaller range than in other panels.
D. Amino acid composition bias at the position −1 for individual phyla. Phyla were ordered following an approximate phylogenetic tree.
E. Codon composition bias at the last position.
In order to explore possible cooperative effects, we investigated the frequency of amino acid pairs at the last two positions from two points of view. First, we compared the frequency of the C-terminal amino acid pair to the frequency of the same dipeptide in the bulk (Fig 1B), thus correcting for dipeptide biases observed in proteins (Gutman & Hatfield, 1989). Pairs of positively charged amino acids were found enriched, and pairs of hydrophobic amino acids depleted, recapitulating the biases observed for individual amino acids at the last two positions. Second, we compared the frequency of an amino acid pair at the C-terminal to its expected frequency under the assumption that positions −1 and −2 are independent (null model) (Fig 1C). The deviation from the expected frequency, or epistasis, revealed that many of the pairs of repeated residues were more frequent than expected, in particular, CC-Stop (odds ratio 5.16), MM-Stop (odds ratio 2.93), WW-Stop (odds ratio 2.24), HH-Stop (odds ratio 1.84), and KK-Stop (odds ratio 1.80) (binomial test, P ≈ 0 for all). These five amino acid pairs also exhibited a positive epistasis in the bulk (Appendix Fig S2) (odds ratio 1.29–1.53), suggesting that part of the observed positive interaction was not specific to the C-terminal. In addition, because cysteine, methionine, tryptophan, and histidine were also the least frequent amino acids in general, a small number of proteins with a conserved functional motif that includes those dipeptides could easily lead to the over-representation of the pair.
In particular, the CC pair presented the strongest positive epistasis effect. In the family of metal sensor proteins, binding to metal ions is often mediated by multiple cysteine thiolates (Rosen, 1999; Osman & Cavet, 2010). While the metalloregulatory protein families are diverse in structure and functions, some of them possess a conserved cysteine-rich motif close to the C-terminal (Ma et al, 2009). Indeed, at least 396 out of the 2,034 proteins in our database that possess a CC-Stop motif belonged to orthogroups related to metalloprotein families. Conversely, the amino acid pairs that exhibited the most negative epistasis were some of the XP-Stop combinations: DP-Stop (odds ratio 0.33), GP-Stop (odds ratio 0.43), PP-Stop (odds ratio 0.44), and FP-Stop (odds ratio 0.54) (binomial test, P ≈ 0 for all). Two of these dipeptides (DP and PP) were previously shown to induce the strongest level of translation stalling and tagging by the ssrA ribosome rescue system in E. coli (Hayes et al, 2002), providing a possible explanation for this negative selection. In order to explore whether composition biases are conserved across the bacterial phylogeny, we grouped bacterial genomes into taxonomic clades at the level of phyla, and analyzed composition biases of sequences from each reduced set of bacterial species (Fig 1D). Overall, the main biases at the C-terminal position were present in virtually all phyla. Interestingly, proline was found to be enriched in 12 phyla and depleted in 16 other phyla (e.g., 0.23 odds ratio in Tenericutes, 0.17 in Fusobacteria). The same analysis at a finer level of the taxonomy (Appendix Figs S3 and S4) showed that the biases for proline varied greatly across taxonomic clades. Interestingly, the biases across phyla for threonine anticorrelated with the biases for lysine (Pearson r = −0.84).
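The independence null model used for the pair analysis (Fig 1C) can be sketched as follows: the expected frequency of a pair is the product of the marginal frequencies at positions −2 and −1, and the deviation is reported as the ratio of observed to expected frequency. This is a simplified illustration on invented data; the function name is hypothetical, and the plain frequency ratio used here is an assumption that may differ from the exact odds ratio definition and binomial test used in the study.

```python
from collections import Counter


def pair_enrichment(cterm_pairs):
    """Observed / expected frequency for each C-terminal dipeptide.

    cterm_pairs: list of two-letter strings, residue at -2 then residue at -1.
    Expected frequency assumes the two positions are independent.
    """
    n = len(cterm_pairs)
    pair_counts = Counter(cterm_pairs)
    pos2 = Counter(p[0] for p in cterm_pairs)  # marginal counts at position -2
    pos1 = Counter(p[1] for p in cterm_pairs)  # marginal counts at position -1
    ratios = {}
    for pair, k in pair_counts.items():
        expected = (pos2[pair[0]] / n) * (pos1[pair[1]] / n)
        ratios[pair] = (k / n) / expected
    return ratios


# Invented toy data set in which repeated residues are over-represented.
pairs = ["KK"] * 4 + ["AA"] * 4 + ["KA", "AK"]
for pair, ratio in sorted(pair_enrichment(pairs).items()):
    print(pair, round(ratio, 2))
```

In this toy set each residue has a marginal frequency of 0.5 at both positions, so every pair is expected at 0.25; the repeated pairs come out above 1 and the mixed pairs below 1.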
Pattern of C-terminal codon biases and its relationship to the stop codon context
We then asked whether those biases were restricted to specific codons or were rather independent of the identity of the synonymous codon (Fig 1E, and Appendix Figs S5 and S6). We found that both codons encoding lysine (AAA and AAG) were enriched at the −1 position at fairly similar levels (odds ratios 3.13 and 2.43). Similarly, all four codons encoding threonine were depleted, despite some variations in their odds ratios. Interestingly, only three (CGA, AGA, and AGG) out of the six arginine codons were strongly enriched, with the highest odds ratio for the CGA codon (4.44). In the group of hydrophobic amino acids that were found to be depleted (valine, isoleucine, leucine, methionine, tyrosine, and tryptophan), some synonymous codons were more strongly underrepresented, although in general, the majority of these codons were disfavored (11 out of 17). We emphasize that these preferences were not related to differences in synonymous codon usage, since all biases were computed with respect to the frequency of codons in the bulk. Interestingly enough, with the exception of arginine AGA and AGG codons and lysine codons, in all other amino acids there was a strong preference for an A at the third position. We reasoned that some of these variations in C-terminal codon biases could be related to specific interactions with the stop codon. Thus, we analyzed the biases in C-terminal codon composition in each of the three stop codon contexts, both at the level of bacterial kingdom (Fig 2) and at the level of individual phyla (Appendix Fig S7). Overall, codons that presented strong enrichment/depletion were enriched/depleted in all stop codon contexts. However, some variations in codon biases were also observed. We found that the identity of the last base of the C-terminal codon had an influence on codon biases depending on the stop codon context (Fig 2, Appendix Fig S8).
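One way to picture the stop-codon-context analysis is to group coding sequences by their stop codon and, within each group, compare the frequency of each codon at position −1 to its frequency in the bulk. The sketch below does this on invented sequences written in the DNA alphabet (TAA, TAG, TGA). The function name, the definition of the bulk (all codons except the start codon, the last sense codon, and the stop codon), and the log2 frequency ratio are assumptions made for illustration, not the authors' exact procedure.

```python
from collections import Counter, defaultdict
from math import log2

STOPS = {"TAA", "TAG", "TGA"}


def codon_bias_by_stop_context(cds_list):
    """log2(frequency at position -1 / bulk frequency) per codon and stop codon."""
    last = defaultdict(Counter)  # stop codon -> counts of the codon at position -1
    bulk = defaultdict(Counter)  # stop codon -> counts of bulk codons
    for cds in cds_list:
        codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
        # Skip entries that are out of frame, too short, or lack a stop codon.
        if len(cds) % 3 or len(codons) < 3 or codons[-1] not in STOPS:
            continue
        stop = codons[-1]
        last[stop][codons[-2]] += 1
        bulk[stop].update(codons[1:-2])  # drop start codon, last codon, and stop
    bias = {}
    for stop in last:
        n_last = sum(last[stop].values())
        n_bulk = sum(bulk[stop].values())
        bias[stop] = {
            codon: log2((k / n_last) / (bulk[stop][codon] / n_bulk))
            for codon, k in last[stop].items()
            if bulk[stop][codon] > 0  # codons absent from the bulk are left out
        }
    return bias


# Invented example sequences (not real genes).
example = [
    "ATGGCTGCTAAATGA",
    "ATGAAAGCTGCTGCTAAATGA",
    "ATGGCTAAAGCTTAA",
]
print(codon_bias_by_stop_context(example))
```

Positive values indicate codons that are more frequent at position −1 than in the bulk of genes sharing the same stop codon, which keeps differences in overall codon usage from leaking into the comparison.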
In particular, NNA codons were more often enriched than other codons in all three stop codon contexts (distribution of the log2(odds ratio), independent t-test P = 4e-24). This preference was exacerbated in the UGA context, where NNA codons were clearly favored over NNG codons (P = 1.8e-49). Interestingly, the presence of a C-terminal codon ending with an A in the context of a UGA stop codon creates an overlapping starting AUG codon, while the other bases will result in the less favored UUG, GUG, CUG start codons. In bacteria, genes that overlap on the same polycistronic mRNA are common [Johnson & Chisholm, 2004; 34% in E. coli MG1655 (Keseler et al, 2013; Tian & Salis, 2015)]. Both genomic compression and translational coupling are believed to promote short overlaps of 4 nt, in which the use of the UGA stop codon is particularly frequent (Lillo & Krakauer, 2007). Thus, we reasoned that the preference for NNA codons in the UGA stop codon context could reflect selection constraints of overlapping start codons. In order to test this hypothesis, we analyzed (Fig 2) the codon biases in the UGA stop codon context when excluding genes for which the start codon of the downstream gene overlapped at nucleotide position −1, e.g., NNA-UGA where AUG is the start codon. In this case, the preference for NNA codons was greatly reduced (Appendix Fig S8D, P = 0.12), suggesting that this effect is mainly due to the overlapping of start codons.
Figure 2. Codon composition biases at C-terminal and stop codon context
In the bacterial kingdom, protein sequences were first classified by their stop codon context: UAA, UAG, or UGA. In the case of the UGA stop codon context, genes were excluded (†UGA context) for which the start codon of the downstream gene was overlapping with the stop codon at nucleotide position −1, e.g., NNA-UGA where AUG is the downstream start codon. Codon frequency at C-terminal was then compared to the bulk codon frequency within each stop codon context class. Significance of the biases was tested using Fisher's exact test and multiple testing correction with 5% false discovery rate within each class (all cases were significant).
Pattern of C-terminal amino acid biases is qualitatively independent of functional category
Next, we asked whether the biases in C-terminal amino acid composition are specific to some protein functional classes. We hypothesized that some of the biases could be driven by a small group of proteins with amino acid composition different from the average. In order to explore this hypothesis, we analyzed protein functional groups in two ways. First, we studied whether the enrichment for lysine and arginine at the C-terminal region could be due specifically to transmembrane proteins. It has been suggested that the orientation of transmembrane domains is determined by the enrichment of positively charged residues in cytoplasmic loops, rather than in periplasmic loops, a mechanism known as the positive-inside rule (Driessen & Nouwen, 2008; Dalbey et al, 2011). Thus, transmembrane proteins whose C-terminal domain is cytoplasmic might present an enrichment of lysine and arginine compared to their frequency in the bulk of the sequence. Indeed, positive charges have been shown to be enriched in the C-terminal cytosolic region of transmembrane proteins in E. coli (Charneski & Hurst, 2014). We classified proteins as membrane or cytoplasmic based on computationally predicted localization for a selection of 364 bacterial species, and computed the C-terminal amino acid composition biases for each of the two classes (Fig EV1). Positively charged residues were found to be strongly enriched in the last 10 positions of the C-terminal of membrane proteins (mean odds ratio K, 2.10, R, 1.69). The same biases were weaker for cytoplasmic proteins (mean odds ratio K, 1.57, R, 1.22).
Hydrophobic amino acids were found to be depleted in both protein categories, although more strongly in membrane proteins (mean odds ratio for A, V, I, L, M, F, W, Y, 0.72 for membrane, 0.84 for cytoplasmic). Apart from these differences, we found a similar pattern of biases at position −1 for membrane and cytoplasmic proteins (Fig EV1C), including depletion of threonine, methionine and hydrophobic residues, and enrichment of lysine and arginine. Thus, while membrane proteins have a higher frequency of positively charged residues at the C-terminal, they only partially contribute to the global amino acid composition biases observed at the level of all proteins.
Figure EV1. C-terminal amino acid composition bias for membrane proteins
A–D. Proteins were classified as membrane or cytoplasmic proteins based on predicted subcellular localization, for a selection of 364 bacterial species. Position-specific C-terminal amino acid composition bias for membrane (A) and cytoplasmic (B) proteins. Significance of the biases was tested using Fisher's exact test and multiple testing correction with 5% false discovery rate. (C) Bias in amino acid composition at C-terminal (position −1) for membrane, cytoplasmic, and all proteins. (D) Amino acid bulk frequency.
Second, we systematically classified proteins into functional categories by assigning each protein sequence to a Cluster of Orthologous Groups (COG) category. We computed the composition biases within each of the 23 functional categories, by comparing the frequency of amino acids at the C-terminal to the bulk frequency of sequences in the same category (Fig EV2). The previously observed general biases were qualitatively maintained in the vast majority of the functional categories. Importantly, the overall pattern of biases was maintained despite differences in the bulk frequency of some amino acids between categories.
For example, ribosomal proteins contain many positively charged residues that are essential for their interaction with RNA (Klein et al, 2004), and as a consequence, proteins in the J category “Translation, ribosomal structure and biogenesis” have a higher frequency of lysine in the bulk (6.01% compared to 4.23% on average for the other categories). However, in this case, the enrichment for lysine at the C-terminal still holds when compared to its frequency in the bulk (16.2% compared to 6.01%, odds ratio 3.02). Similarly, in each of the other 22 categories, the frequency of lysine at the C-terminal was higher than in the bulk.
Figure EV2. C-terminal amino acid composition bias at position −1 for each COG category
Proteins were classified based on their computationally assigned COG category. Within each group, C-terminal amino acid composition biases were computed at position −1, by comparing amino acid frequency to the frequency in the bulk of all sequences in the group. Significance of the biases was tested using Fisher's exact test and multiple testing correction with 5% false discovery rate. Categories A, B, W, and Z contained the lowest number of proteins (from 396 to 1,490, compared to 242,738 in the J category), which resulted in lower statistical power for the estimation of biases.
Therefore, the main pattern of amino acid biases observed at the last position of the C-terminal is qualitatively independent of functional category.
C-terminal amino acid identity is associated with protein abundance
One possibility is that the observed C-terminal biases could be driven by an underlying mechanism affecting protein abundance, such as translation termination efficiency or protein degradation. If this is the case, we would expect to observe different biases for highly abundant proteins than for lowly abundant proteins.
We examined the association between protein abundance and C-terminal biases in 13 bacterial species for which at least 40% of the proteins had an experimental abundance value from the PaxDB database (Wang et al, 2015). We categorized proteins into low (percentiles 0–20), medium (percentiles 20–80), and high (percentiles 80–100) abundance, and computed the amino acid composition biases in each group (Fig 3). C-terminal amino acid frequencies in each group were compared to the bulk amino acid usage of the proteins in the same group (null), such that differences in bulk amino acid usage between lowly and highly abundant proteins (Appendix Fig S9) were corrected for. Compared to lowly abundant proteins, highly abundant proteins showed a stronger enrichment of K, but not of R, at the C-terminal (position −1), and a stronger depletion of T, P, and C. In addition, amino acid biases at position −3, with the exception of cysteine, were very similar between the two abundance categories. Thus, the identity of the C-terminal amino acid at position −1 could be in part related to protein abundance.
Figure 3. Protein abundance and C-terminal amino acid identity
Proteins were categorized into low (percentiles 0–20, green), medium (percentiles 20–80, gray), and high (percentiles 80–100, brown) abundance, and C-terminal amino acid composition biases, with respect to the bulk frequency of each category, at position −1 (upper plot) and position −3 (lower plot), were analyzed for each of the three categories.
Pattern of amino acid substitution rates suggests C-terminal-specific purifying and positive selections
We then investigated whether evolutionary forces specific to the C-terminal position could be identified.
We hypothesized that if the identity of the C-terminal amino acid can have an impact on protein expression and potentially on fitness, the pattern of amino acid substitutions at the C-terminal position should be significantly different from the pattern observed at other positions. More precisely, if the ancestral state is a favorable amino acid, purifying selection would decrease the substitution rate to non-favorable amino acids. Contrariwise, if the ancestral state is a non-favorable amino acid, positive selection would increase the substitution rate to favorable amino acids. We analyzed 57 triplets of closely related bacterial genomes (Table EV1) with unambiguous phylogeny from the ATGC database (Kristensen et al, 2017). Using the maximum parsimony principle, we reconstructed the ancestral state and computed amino acid substitution frequencies at each position from the C-terminal (Fig 4A). The total frequency of all substitutions clearly increased from the positions in the bulk toward the C-terminal (Fig 4B), which likely indicated a relaxation of purifying selection, as previously reported