Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli
2003; Springer Nature; Volume: 22; Issue: 21; Language: English
10.1093/emboj/cdg561
ISSN 1460-2075
Topic(s): Genomics and Phylogenetic Studies
Article, 3 November 2003, free access

Olga L. Gurvich¹, Pavel V. Baranov¹, Jiadong Zhou², Andrew W. Hammer¹, Raymond F. Gesteland¹ and John F. Atkins*¹

¹Department of Human Genetics, University of Utah, 15N 2030E, Salt Lake City, UT, 84112-5330 USA
²Present address: Gene Technology Division, Nitto Denko Technical Corporation, 401 Jones Road, Oceanside, CA, 92054 USA
*Corresponding author. E-mail: [email protected]

The EMBO Journal (2003) 22: 5941-5950, https://doi.org/10.1093/emboj/cdg561

Abstract

It is generally believed that significant ribosomal frameshifting during translation does not occur without a functional purpose. The distribution of two frameshift-prone sequences, A_AAA_AAG and CCC_TGA, in coding regions of Escherichia coli has been analyzed. Although a moderate level of selection against the first sequence is evident, 68 genes contain A_AAA_AAG and 19 contain CCC_TGA. The majority of those tested in their genomic context showed >1% frameshifting. Comparative sequence analysis was employed to assess a potential biological role for frameshifting in decoding these genes. Two new candidates, in pheL and ydaY, for utilized frameshifting have been identified in addition to those previously known in dnaX and nine insertion sequence elements. For the majority of the shift-prone sequences no functional role can be attributed to them, and the frameshifting is likely erroneous. However, none of the frameshift sequences is in the 306 most highly expressed genes.
The unexpected conclusion is that moderate frameshifting during expression of at least some other genes is not sufficiently harmful for cells to trigger strong negative evolutionary pressure.

Introduction

During readout of genetic information into proteins, translation is the last and probably the least accurate process, although considerable accuracy of protein synthesis is crucial for cell survival. Errors in translation are divided into two types: missense errors and processivity errors. Missense errors occur when ribosomes accept a non-cognate AA-tRNA or an aminoacyl tRNA synthetase mischarges a tRNA with a wrong amino acid. Missense errors are the most benign of possible errors, since the mistake is limited to a particular amino acid and does not necessarily inactivate the protein product. Processivity errors include frameshift errors, false recognition of a sense codon by a release factor and drop-off (also termed ribosomal editing: dissociation of a nascent polypeptidyl-tRNA from an mRNA-programmed ribosome). Mistakes in processivity often result in truncated products, and in the case of frameshifting, the sequence of amino acids incorporated after the shift is gibberish. Therefore, unless these errors occur near the end of an open reading frame (ORF), the product is likely to be inactive. As a result, it is believed that selection has resulted in processivity errors being significantly less frequent than missense errors (Kurland et al., 1996). Earlier studies support this idea. The frequency of missense errors was estimated to be between 10⁻³ and 10⁻⁴ (Donner and Kurland, 1972; Loftfield and Vanderjagt, 1972; Edelmann and Gallant, 1977; Parker et al., 1983; Kurland and Gallant, 1986), while processivity errors were estimated to be in the range 10⁻⁴ to 10⁻⁷ (Kurland, 1979; Jørgensen et al., 1993).
At the same time it has been noted that processivity errors occur in a sequence-dependent manner and are likely to be more efficient in particular places than in others (Atkins et al., 1972; Manley, 1978; Atkins et al., 1983). Later, a substantial number of relatively simple sequence motifs that can cause significantly high levels of frameshifting in E.coli were characterized (Weiss et al., 1990; Curran, 1993). The discovery of genes whose expression requires non-standard processivity, such as programmed frameshifting, was one of the threads that led to the term ‘recoding’ (Gesteland et al., 1992), which describes the phenomenon where non-standard translational events are used for gene expression purposes. In the majority of recoding cases in which frameshifting is involved, the efficiency of frameshifting on a specific mRNA site is much higher than the above estimates for error frameshifting due to the presence of stimulatory sequences in the mRNA. Frameshifting programmed in this manner is sometimes even more efficient than standard translation at the same site. The general assumption is that unless they have a functional role, sequences prone to high levels of frameshifting are subject to negative selection. In this scenario frameshift events fall into either a low-efficiency frameshift error category or a highly efficient programmed frameshifting class (although it is possible that low-level frameshifting is utilized for gene expression purposes). Despite the studies cited above, it is hard to know how efficient frameshifting errors can be before selection is triggered against them. Here we have analyzed the frequency of occurrence in E.coli genes of two sequences prone to either +1 (CCC_UGA) or −1 (A_AAA_AAG) frameshifting (the triplets indicate the zero frame codons) and measured frameshift efficiency on these sequences in their native contexts.

Results

A_AAA_AAG

The sequence A_AAA_AAG supports efficient −1 ribosomal frameshifting in E.coli.
This sequence alone causes ∼2% frameshifting (Weiss et al., 1989) and the efficiency can be greatly increased by the presence of stimulatory signals. Frameshifting at A_AAA_AAG is used for expression of E.coli dnaX, which encodes two subunits of DNA polymerase III: τ and γ. While τ is synthesized by standard translation, synthesis of γ is dependent on a −1 frameshift event on the sequence A_AAA_AAG (Blinkowa and Walker, 1990; Flower and McHenry, 1990; Tsuchihashi and Kornberg, 1990). This frameshifting is 50% efficient, so that τ and γ subunits are synthesized in equal amounts. There are two stimulatory elements in E.coli dnaX mRNA, and both are conserved in related species: an internal Shine–Dalgarno sequence 10 bases upstream (Larsen et al., 1994) and a stem–loop downstream of the frameshift site (Larsen et al., 1997). These stimulators are essential to elevate frameshifting to such a high level. The same sequence (A_AAA_AAG) is also used for programmed frameshifting in bacterial insertion elements in E.coli (Chandler and Fayet, 1993; Hu et al., 1996) and related species (Polard et al., 1991; Rettberg et al., 1999). However, it is likely that this frameshifting is limited to only those bacteria that lack a tRNALys with the anticodon 3′-UUC-5′ (Tsuchihashi and Brown, 1992; Baranov et al., 2002). It is reasonable to expect that the sequence A_AAA_AAG is avoided in E.coli genes that do not utilize frameshifting for their expression. In E.coli K12, there are 70 instances of A_AAA_AAG in 68 genes (two genes have this sequence twice). These genes are listed in Supplementary Table I available at The EMBO Journal Online. Out of the 68 genes, 12 were selected to check whether frameshifting does in fact take place during translation of their mRNAs (Table I). 
We cloned gene sequences including ∼10 codons upstream and downstream of the shift site into the pGHM57 vector between the glutathione S-transferase (GST) and maltose-binding protein (MBP) genes (see Materials and methods). In those cases where nearby potential 3′ secondary structures were identified, their sequences were fully included. Selection of the 12 chosen genes was somewhat biased as they had the first stop codon in the −1 frame at least 10 codons downstream of the shift site, so that the stop codon is not included in the cloned sequence. The sequence in the ‘zero’ frame was placed in-frame with GST and the sequence in the −1 frame was placed in-frame with MBP. Ribosomes that translate through A_AAA_AAG in a standard manner will terminate either at a stop codon in the cloned insert or just after the insert. The resulting products have approximately the same mass as GST (∼28 kDa). Shifting into the −1 frame on A_AAA_AAG yields products of roughly the same mass as the GST–MBP fusion (∼72 kDa) (Figure 1A). Frameshifting was assayed by pulse–chase experiments with [35S]Met as a label and the products were separated by SDS–PAGE (Figure 1B). All tested sequences support −1 frameshifting at levels ranging from 1.2 to 25.5% (Figure 1C). Distant sequences are unlikely to affect frameshifting when transcription and translation are tightly coupled. Nevertheless, one further A_AAA_AAG sequence was tested in the context of the entire gene sequence. Gene ycdB was used (as both its frameshift and termination products are readily distinguishable), and showed a frameshifting efficiency of 8% (Figure 2 and Table I).

Figure 1. Measurement of frameshifting efficiency on the A_AAA_AAG sequences. (A) Schematic representation of constructs used to assay frameshifting. (B) Pulse–chase analysis of the products expressed from cassettes with the A_AAA_AAG contexts from different genes. The areas from the gels corresponding to the termination and frameshifting products are shown.
The GST lane shows the corresponding products from the parental vector in which the stop codon is located after GST; the GST–MBP lane shows products from the parental vector in which the GST and MBP genes are in-frame. The (−) lane contains labeled proteins from the uninduced control (Materials and methods). (C) Quantitation of the efficiency of frameshifting. Average frameshifting in three independent pulse–chase experiments was calculated for each construct and is represented by black bars. Error bars show standard deviations.

Figure 2. Pulse–chase analysis of the products expressed from the construct with the entire sequence of the ycdB gene. The (−) lane contains labeled proteins from the uninduced control.

Table I. Analyzed genes containing the A_AAA_AAG sequence

| Gene name (Accession no.) | Position of A_AAA_AAG nucleotides (no. of codons after the shift in 0/−1 frame) | Sequence around shift site | Frameshift level (%) |
|---|---|---|---|
| atoS (16130156) | 1647–1653 (107/24) | CTC TCG CTG CAA AAA AAG ATC TTC GAT | 7.0 ± 4.6 |
| b3021 (16130917) | 234–240 (51/41) | GTG AAG GTT CGA AAA AAG CTC TCT CTT | 2.3 ± 0.8 |
| selB (16131461) | 93–99 (581/28) | CTG CCG GAA GAA AAA AAG CGC GGC ATG | 2.9 ± 0.3 |
| tdcR (16131012) | 57–63 (93/11) | GTG GTT AAT ACA AAA AAG GGG CTG AGA | 4.0 ± 1.5 |
| ybaQ (33347458) | 381–387 (2/11) | GAA GAG CGT GCA AAA AAG GTC GCG TAA | 1.2 ± 0.5 |
| ybhD (33347481) | 177–183 (277/10) | ACG CGA AGA ATA AAA AAG ATG GAG GAA | 6.6 ± 2.3 |
| ycdB (16128983) | 462–468 (267/19) | CCA CAG ATG CCA AAA AAG CTG CAG AAG | 8.3 ± 3.2 |
| ycdV (1787269) | 198–204 (69/9) | CAA TAC ACG AAA AAA AAG CCC GTA CTT | 25.5 ± 8.8 |
| ydaY (16129327) | 348–354 (1/102) | CAG GAT ACG ATA AAA AAG CCA TAG CTG | 10.3 ± 1.6 |
| yeeO (16129928) | 1620–1626 (5/42) | CAA AAG TGT GAA AAA AAG CCA GTT GTG | 3.9 ± 1.0 |
| yi21_1 (1786557) | 363–369 (13/114) | TAT GGA CGG GCA AAA AAG TGG ATA GCG | 3.4 ± 1.9 |
| yjbB (16131846) | 126–132 (499/17) | CGG AGC GTC GAA AAA AAG CCG CTC GCC | 7.0 ± 1.2 |

Since A_AAA_AAG alone supports efficient
frameshifting without any stimulatory signals, it may be under-represented. To assess possible bias in its representation, codon usage for AAA (3.36%), AAG (1.03%) and occurrence of A in the wobble position (17.79%) were taken into account (see Materials and methods). Then on an unselected basis, in 1 365 282 codons of annotated E.coli K12 ORFs, this sequence should occur 1 365 282 × 0.0103 × 0.0336 × 0.1779 ≈ 84 times (though this estimate does not take into account that A_AAA_AAG cannot occur in the first and in the last position of the gene). Therefore, the sequence A_AAA_AAG is somewhat under-represented (70 observed, ∼83% of the expected value). However, this estimate does not take into account how frequently two adjacent lysine residues occur in E.coli proteins. It is possible that the frequencies of tandem lysines are biased, thereby influencing the occurrence of A_AAA_AAG. A control for this is the ‘non-shifty’ sequence A_AAG_AAA, in which the two lysine codons are retained but their positions swapped. This sequence occurs 132 times, almost twice as frequently as A_AAA_AAG. Thus, both estimates show that the sequence A_AAA_AAG is moderately under-represented in the coding regions of the E.coli K12 genome.

In a more rigorous test, 1000 random genomes were generated using the following rules: protein sequences from the original E.coli K12 genome were preserved, but the codons encoding the amino acids were randomized taking into account codon usage. Such random genomes are relieved of selective pressure to avoid slippery sequences. The distribution of A_AAA_AAG occurrences in the genomes generated is shown in Figure 3A. The mean occurrence of A_AAA_AAG is 97.6 per genome. The standard deviation is 9.3 and the standard error of the mean is 0.3. None of the 1000 genomes had 70 A_AAA_AAG (the number of A_AAA_AAG in the real E.coli genome) and only one had 72, the lowest count of A_AAA_AAG in the 1000 genomes. A one-sample t-test was carried out using 70 as a hypothetical mean.
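The randomization test described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' program: the function names (`count_shifty`, `codon_usage`, `randomize`, `one_sample_t`) are hypothetical, codon usage is estimated from the supplied ORFs themselves, the standard genetic code is assumed, and start and stop codons are resampled like any other codon (so a GTG start would be treated as valine).

```python
import random
import statistics
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(orf):
    """Split an ORF (DNA string, frame 0) into complete codons."""
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def count_shifty(orf):
    """Count in-frame A_AAA_AAG: a codon ending in A, then AAA, then AAG."""
    c = split_codons(orf)
    return sum(
        1
        for i in range(len(c) - 2)
        if c[i][2] == "A" and c[i + 1] == "AAA" and c[i + 2] == "AAG"
    )


def codon_usage(orfs):
    """Tally codon counts per amino acid across all ORFs."""
    usage = defaultdict(Counter)
    for orf in orfs:
        for codon in split_codons(orf):
            usage[CODON_TABLE[codon]][codon] += 1
    return usage


def randomize(orf, usage, rng):
    """Keep the encoded protein; redraw each codon weighted by codon usage."""
    out = []
    for codon in split_codons(orf):
        choices = usage[CODON_TABLE[codon]]
        out.append(rng.choices(list(choices), weights=list(choices.values()))[0])
    return "".join(out)


def one_sample_t(sample, mu):
    """t statistic for a one-sample t-test against a hypothetical mean mu."""
    sem = statistics.stdev(sample) / len(sample) ** 0.5
    return (statistics.mean(sample) - mu) / sem
```

Given a list `orfs` of coding sequences, `usage = codon_usage(orfs)` followed by `[sum(count_shifty(randomize(o, usage, rng)) for o in orfs) for _ in range(1000)]` (with `rng = random.Random()`) would give a distribution of per-genome counts comparable in spirit to Figure 3A, which `one_sample_t` can then compare with the observed count.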
The t-value was 98 and the p-value is <0.0001. This p-value suggests that the difference between the mean occurrence count from the randomized genomes and the occurrence in the actual genome is highly statistically significant. Therefore, A_AAA_AAG is indeed under-represented; however, it is not avoided since 68 genes constitute 1.7% of all genes.

Figure 3. Distribution of occurrences of slippery sequences in 1000 randomized genomes. (A) A_AAA_AAG. (B) CCC_TGA.

Interestingly, the value of mean occurrence of A_AAA_AAG (97.6) is greater than the value predicted from codon usage (84). The probable explanation for this discrepancy lies in the fact that tandem lysines appear in the E.coli genome more often (3044 times) than if their distribution were random (2643). Our results also demonstrate that A_AAG_AAA is over-represented (see Figure 3A). Perhaps part of the reason for over-representation of A_AAG_AAA is compensation for under-representation of the slippery A_AAA_AAG sequence. Another factor that can influence occurrence of A_AAA_AAG is the dinucleotide frequency of As in the third position of the upstream codon and the first position of the downstream codon (XXA_AYY). We have estimated this bias using a random genomes approach and found that such dinucleotides are slightly over-represented in the real genome (∼44 000 in the real genome versus ∼41 000 average in random). Although consideration of this bias may improve the accuracy of our analysis, it is unlikely to affect the general conclusion that there is a moderate selection against A_AAA_AAG sequences.

CCC_UGA

+1 frameshifting on the sequence CCC_UGA has been reported in several artificial constructs expressed in E.coli (de Smit et al., 1994; Vilbois et al., 1994; O'Connor, 2002). The observed efficiency of frameshifting ranged between 2 and 4%.
Frameshifting on this sequence is utilized for expression of antizyme in some eukaryotes (Ivanov et al., 2000) and of the tsh gene of Listeria monocytogenes phage PSA (Zimmer et al., 2003). However, it has not so far been found to be used for gene expression in E.coli. To identify coding sequences ending with CCC_TGA, we searched through the annotated E.coli K12 ORFs (Blattner et al., 1997) using the Colibri database (Medigue et al., 1993) and found 18 genes. GenBank has 20 in the nucleotide sequence file, but one of the genes was recently excluded from the annotation. Therefore we consider that there are 19 genes that end with CCC_TGA in E.coli. These genes and the nucleotide sequences surrounding their corresponding termination sites are listed in Table II. We examined the level of frameshifting on the 18 CCC_UGA sites (originally identified using the Colibri database) in their natural context. Sequences from each of the 18 genes including ∼10 codons upstream of the CCC_UGA and 10 codons downstream (or as far as, but not including, the stop codon in the +1 frame) were cloned into the pGHM57 vector between the GST and MBP genes. The sequence upstream of the shift site was placed in-frame with GST and the sequence in the +1 frame downstream of the shift site was placed in-frame with the MBP gene. Termination at CCC_UGA results in a protein similar in mass to GST, while the product of +1 frameshifting is approximately the same mass as the GST–MBP fusion protein (Figure 4A). The efficiency of frameshifting was assayed as before (Figure 4B). Of the 18 gene sequences, nine support +1 frameshifting at levels higher than 1% (Figure 4C and Table II). Frameshifting at levels lower than 1% is difficult to distinguish from the background in pulse–chase experiments. In two cases frameshifting is very efficient: 15% for pheL and 9.7% for yrhB.

Figure 4. Measurement of the frameshifting efficiency in cassettes with sequences from genes ending with CCC_UGA.
(A) Schematic representation of expression constructs for analysis of frameshifting efficiency. (B) Pulse–chase analysis of the products expressed from vectors containing inserts of different genes ending with CCC_UGA. The areas from the gels corresponding to the termination and frameshifting products are shown. FS indicates frameshift product; TER indicates termination product. The GST lane shows the corresponding products from the parental vector in which the stop codon is located after GST; the GST–MBP lane shows products from the parental vector in which the GST and MBP genes are in-frame. The (−) lane contains labeled proteins from uninduced control. (C) Quantitation of the frameshifting efficiency. Average frameshifting efficiency of three independent pulse–chase experiments was calculated for each construct and is represented by black bars. Sequences in which frameshifting is <1% are omitted. Error bars show standard deviations.

Table II. All known E.coli K12 coding sequences terminating with CCC_UGA

| Gene name (Accession no.) | Sequence around termination/frameshift site | Frameshift level (%) | Number of sense codons after a frameshift |
|---|---|---|---|
| asnC (16131611) | ACC ATC AAG CCC TGAT CGG CTT TTT | <1 | 3 |
| focB (16130417) | CGT CAG GAA CCC TGAA AAA TCA GCC | <1 | 10 |
| gatD (16130029) | TTG CTC ATT CCC TGAA ACC GCG GGC | 1.8 ± 1.2 | 25 |
| pdxH (16129596) | CGT CTT GCA CCC TGAA AAG ATG CAA | <1 | 12 |
| pheL (16130519) | TTT ACC TTC CCC TGAA TGG GAG GCG | 15 ± 4.7 | 50 |
| yadC (16128128) | GTA ACC TAT CCC TGAT AAC GTA GCA | <1 | 21 |
| ybhH (16128737) | GTT TAT CTT CCC TGAA AAA ATT CGT | <1 | 10 |
| ybhO (16128757) | GGG GTA AAA CCC TGAT GAG TAA ATC | <1 | 1 |
| ycbF (33347497) | CAA AAT CTG CCC TGAA ACA GGT TCG | 2.6 ± 0.3 | 43 |
| ycjD (16129250) | TCA CCC TCT CCC TGAA AGA GCG AGG | 2.5 ± 1.0 | 71 |
| ydhW (16129628) | TTT CAG AAC CCC TGAA ATT TCA GGG | <1 | 7 |
| yeaB (16129767) | GGT GTG AAA CCC TGAC TAT ACT TAT | 2.8 ± 0.5 | 32 |
| yfcN (16130266) | CCG GAG TTG CCC TGAG GAG TTG AGC | 2.2 ± 1.1 | 21 |
| ygdB (33347702) | TGT CAG CTT CCC TGAA GAA TCA ACA | <1 | 5 |
| yjeF (16131989) | AAT TCC GCT CCC TGAT GAG CAG GCA | 4.3 ± 0.6 | 144 |
| ykgD (16128290) | CAG CTT GCA CCC TGAA TAA AAC CGC | 3.0 ± 0.6 | 0 |
| yrdB (16131161) | GTC TGG TTA CCC TGAT CCA GAT ATT | <1 | 29 |
| yrhB (16131318) | TTC GGC TTG CCC TGAC AAA ATA GCC | 9.7 ± 4.5 | 17 |
| yzgL (33347755) | GCG GTA ATT CCC TGAA TTA AAA AGT | Not assayed | 8 |

To verify that the +1 frameshifting indeed occurs at CCC_UGA, we used affinity tag purification, via GST and MBP, of the fusion protein translated from the construct with yjeF sequence (frameshifting efficiency 4.3%). The mass of the purified protein, as determined by mass spectrometry, is 73 628.15 Da, which is within 2 Da of the predicted mass of the fusion protein, 73 629.9 Da, that would result from shifting from CCC to CCU at the sequence CCC_UGA (Figure 5).

Figure 5. Mass spectrum of the GST–MBP fusion protein synthesized from a cassette containing the yjeF sequence. The major peak at 73 628.15 Da corresponds to the predicted mass of the fusion protein (73 629.91 Da).
The satellite peak at 73 703.07 Da corresponds to the β-mercaptoethanol adduct of the fusion protein.

Does the identity of either the stop or the proline codon influence the efficiency of frameshifting? Changing the CCC codon to either CCA or CCG in the construct containing pheL decreases frameshifting from 15 to ∼2%, while changing it to CCU decreases frameshifting to 6%. Changing UGA to either UAG or UAA decreases frameshifting to 4 and 10%, respectively (Figure 6).

Figure 6. Analysis of the frameshifting efficiency on different CCN-Stop combinations. (A) Pulse–chase experiments with expression vectors containing mutations in pheL. CCC denotes the wild-type pheL context. Other abbreviations indicate the mutation of either a Pro or a Stop codon. The areas from the gels corresponding to the termination and frameshifting products are shown. The GST lane shows the corresponding products from the parental vector in which the stop codon is located after GST; the G–M lane shows products from the parental vector in which the GST and MBP genes are in-frame. The (−) lane contains labeled proteins from uninduced cultures. (B) Quantitation of the pulse–chase results with mutated pheL constructs. Average frameshifting in three independent pulse–chase experiments was calculated for each construct and is represented by black bars. Error bars show standard deviations.

Since the sequence CCC_UGA is prone to relatively high efficiency frameshifting, it is expected to be under-represented in the E.coli K12 genome. The theoretical frequency of CCC_UGA can be calculated by multiplying the absolute values for CCC and UGA codon usage (7506 and 1252, respectively) and dividing by the total number of codons in E.coli K12 (1 365 282). This gives a value of 6.9, which is significantly less than the observed number of 19. The random genome approach was also applied to analyze the distribution of CCC_TGA (Figure 3B).
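As a worked check of the codon-usage estimates quoted in the Results, the short Python sketch below redoes the arithmetic. The function names are hypothetical and introduced only for illustration; all inputs are figures stated in the text, with 0.1779 being the 17.79% frequency of A in the wobble position. The two estimates round to 6.9 and 84, respectively.

```python
# Total number of codons in annotated E.coli K12 ORFs, as stated in the text.
TOTAL_CODONS = 1_365_282


def expected_ccc_tga(ccc_count=7_506, tga_count=1_252, total=TOTAL_CODONS):
    """Expected number of ORFs ending in CCC_TGA from absolute codon counts."""
    return ccc_count * tga_count / total


def expected_a_aaa_aag(total=TOTAL_CODONS, f_aag=0.0103, f_aaa=0.0336, f_a3=0.1779):
    """Expected A_AAA_AAG count: codon usage of AAG and AAA times the
    frequency of A in the third (wobble) position of the preceding codon."""
    return total * f_aag * f_aaa * f_a3


if __name__ == "__main__":
    print(f"CCC_TGA expected:   {expected_ccc_tga():.1f} (observed 19)")
    print(f"A_AAA_AAG expected: {expected_a_aaa_aag():.0f} (observed 70)")
```

Neither estimate corrects for positional constraints (for example, that A_AAA_AAG cannot include the first or last codon of a gene), which is one reason the text goes on to use randomized genomes.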
On average only 4.5 genes end with CCC_TGA in 1000 random genomes. The lower value of the mean (4.5) than the one predicted based on codon usage (6.9) probably reflects the fact that proline codons are under-represented in the last position of ORFs. At the same time, none of the 1000 genomes contains more than 12 genes ending with CCC_TGA. This analysis clearly demonstrates that CCC_TGA is over-represented in the E.coli K12 genome, even though it can support efficient frameshifting. This is surprising since a simple change of either the CCC codon to another proline codon or UGA to another stop codon can eliminate significant propensity for frameshifting.

Discussion

Assessment of the numerous occurrences of the two shift-prone sequences identified requires distinguishing those where there is a selective advantage for specific ribosomal frameshifting from those where it is simply an error, which wastes the cell's resources. While some cases of utilized frameshifting may be organism specific, many will be evolutionarily conserved in related species. Comparative analysis with orthologs, juxtaposition of ORFs and features relevant to possible regulatory frameshifting help distinguish between the two categories. The following analysis deals with the two shift-prone sequences separately, and later with common features of the distribution of all members of the erroneous frameshifting category. Similar analysis was also performed on some previously published cases of frameshifting for which no functional role is evident.

A_AAA_AAG

Statistical analysis of the occurrence of A_AAA_AAG shows that this shift-prone sequence is somewhat under-represented in E.coli. Nevertheless, the total number of such sequences in E.coli, 70, is substantial. Interestingly, similar observations were made for two other related bacteria, which also lack a tRNALys with the anticodon 3′-UUC-5′.
In Salmonella typhimurium A_AAA_AAG is slightly under-represented, while in Shigella flexneri 2a it is slightly over-represented, showing that there is also no major avoidance of this slippery sequence in these bacteria. In the 13 E.coli sequences tested (Table I), the frameshifting levels varied from 1 to 25%. With this limited set of sequences, a correlation was not evident between frameshifting efficiency and the presence of 3′ nucleotides with stacking potential (Bertrand et al., 2002). Most probably the cumulative effect of other sequence context surrounding the shift site in these particular cases is more important than the effect of the single 3′ adjacent nucleotide. In several cases the frameshifting is expected not to have significant negative consequences. In the genes ybaQ, yeeO, atoS, b3021 and yqjI, A_AAA_AAG occurs near the end of the ORF. In these cases there is a termination codon in the −1 frame within the next 42 codons. As a result, the product of frameshifting contains almost the same information as the product of standard decoding. In others, however, frameshifting should result in the production of truncated dysfunctional proteins. In selB, ybhD, tdcR, yjbB, ycdV and ycdB, A_AAA_AAG occurs in the early or middle parts of their coding sequences. Ribosomes that frameshift at A_AAA_AAG will encounter a stop codon and terminate. Theoretically such frameshifting could be used for down-regulation of expression, but there is no experimental evidence that frameshifting on A_AAA_AAG can be specifically regulated. Alternatively, the short protein can have a separate function. If the frameshifting is used for gene expression, conservation of the shift site is expected in homologous genes from related species. The frameshift cassette, in genes other than selB and ybhD, is limited amongst sequenced genomes to E.coli species and S.flexneri. 
For selB and ybhD, it appears that the conservation of tandem lysines is important, rather than the frameshift cassette itself. In the Haemophilus influenzae selB gene, the corresponding sequence is A_AAA_AAA, and in the Vibrio cholerae ybhD gene it is A_AAG_AAA. Another gene with A_AAA_AAG is yi21_1, which belongs to the IS2 family of bacterial insertion sequences. −1 frameshifting results in fusion of yi21_1 and yi22_1. Frameshifting in the IS2 element on the sequence A_AAA_AAG was previously reported by Hu et al. (1996). Thus, in this case frameshifting is functional, and it is a true case of recoding. In fact, we found that there are five more IS2-related sequences with the A_AAA_AAG sequence in the E.coli K12 genome (see Supplementary Table I). Frameshifting during decoding of the ydaY gene on A_AAA_AAG is also likely to be a recoding event. Standard translation of ydaY terminates one codon after the A_AAA_AAG site. However, −1 frameshifting on A_AAA_AAG yields a fusion of the ydaY product with that of the downstream ORF b1367. Additional evidence that A_AAA_AAG in ydaY is purposeful comes from the fact that there are putative stimulatory signals: an internal Shine–Dalgarno sequence AGAAG, 11 bases upstream of the A_AAA_AAG (with the nearest 3′ start codon 95 nt downstream) and a potential RNA secondary structure appropriately positioned downstream of it. Unfortunately, the functions of both ydaY and b1367 are currently unknown. None of the completed bacterial genomes contain homologs. However, homologous sequences occur in Klebsiella pneumoniae whose sequencing is c
Reference(s)