Why are human G-protein-coupled receptors predominantly intronless?
1999; Elsevier BV; Volume: 15; Issue: 2; Language: English
10.1016/s0168-9525(98)01648-5
ISSN: 1362-4555
Authors: Andrew J. Gentles, Samuel Karlin
Topic(s): Mass Spectrometry Techniques and Applications
Abstract: The proportion of intronless human genes is generally thought to be low, at most 5%. One of the few families of intronless genes is the histones: comparatively small proteins (average 130 amino acids, range ∼100–220 amino acids) that are abundantly expressed and have a high degree of protein sequence conservation. A striking group of larger human genes (average 380 amino acids, range ∼300–880 amino acids), many of whose members have intronless protein-coding regions, is the G-protein-coupled receptor (GPCR) family (ref. 1: Lismaa T.P. et al. G Protein-Coupled Receptors. Springer-Verlag, 1995). An extensive survey of human and other mammalian GPCR sequences from GenBank, excluding genes with incompletely specified coding regions and those more than 90% similar to another GPCR at the protein level, revealed that more than 90% are intronless in their open reading frame (ORF).

GPCRs are a diverse family of receptors that mediate ligand-induced signalling between the extra- and intracellular environments via interaction with intracellular G proteins, resulting in the activation of intracellular effector systems. They share a common structure of seven α-helical transmembrane segments, but typically have low (< 40%) overall sequence similarity to each other. Approximately 80% of hormones and neurotransmitters involved in signal transduction are believed to act through GPCRs, making them the largest mammalian receptor class. A particular GPCR may be present in thousands of copies on the cell surface, and many different receptors may respond to the same ligand. GPCR genes are scattered widely throughout the genome. A review and classification of GPCRs, among other resources, can be found online at the GPCRDB site (ref. 2: Horn F. et al. GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res. 1998; 26: 275-279).
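The survey's redundancy filter (dropping sequences more than 90% similar to another at the protein level) and the resulting intronless percentage can be illustrated with a minimal Python sketch. This is not the authors' procedure: the helper names `similarity`, `non_redundant` and `percent_intronless` are hypothetical, the similarity score is a crude `difflib` ratio standing in for an alignment-based measure, and the filter keeps the first sequence it sees rather than removing one of a similar pair at random.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; an assumed stand-in for an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def non_redundant(seqs: dict[str, str], threshold: float = 0.90) -> list[str]:
    """Greedily keep a sequence only if it is at most `threshold` similar
    to every sequence already kept. Returns kept names in input order."""
    kept: list[str] = []
    for name, seq in seqs.items():
        if all(similarity(seq, seqs[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


def percent_intronless(intron_counts: dict[str, int], names: list[str]) -> float:
    """Percentage of the named genes with zero introns in the ORF."""
    intronless = sum(1 for n in names if intron_counts[n] == 0)
    return 100.0 * intronless / len(names)
```

In use, one would pass protein sequences keyed by gene name to `non_redundant`, then pass the surviving names and per-gene ORF intron counts to `percent_intronless`.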
The pervasive absence of introns from human GPCRs contrasts sharply with other functional classes of genes (Table 1). Among other cell surface proteins, most tyrosine kinase receptors and transport channel proteins contain multiple introns. In a representative set of 242 human genes, comprising all those in GenBank with complete coding regions from sequenced large contigs ⩾ 50 kb in length, 18 (7%) had no introns in the ORF. Of these, six were identified as GPCRs, suggesting that GPCRs make up around 30% of intronless genes in humans.

Table 1. G-protein-coupled receptor genes tend to be intronless (a)

Gene class                      GenBank entries    Proportion with intronless ORF (%)
Human non-GPCRs
  Serine proteases                     18                    0
  Kinases                              40                    8
  Helix–loop–helix                     18                   44
  Cell surface receptors               48                   35
GPCRs
  Viral                                18                   89
  Vertebrates                         136                   93
  All mammals                         120                   93
  Primates                             82                   90
  Rodents                              28                   96
  Danio rerio (zebrafish)              12                  100
  C. elegans                           60                    0

(a) The proportion of intronless genes among G-protein-coupled receptor genes (GPCRs) (human and non-human), compared with human serine proteases, kinases, helix–loop–helix proteins and non-GPCR cell surface receptors (including adhesion, transport and channel proteins and other signal transduction proteins). Only genomic GenBank entries were considered. Many more GPCR entries occur in GenBank (around 600 for humans), but the majority are cDNA or RNA sequences. Where two sequences were more than 90% similar in their amino acid sequence, one was removed at random in order to exclude multiple copies of the same gene (ref. 3: Brendel V. PROSET: A fast procedure to create non-redundant sets of protein sequences. Mathl. Comput. Modelling. 1992; 16: 37-43).

By contrast with mammals, GPCR genes in Caenorhabditis elegans are replete with introns (Table 1). Drosophila melanogaster has a number of GPCRs, but the gene structure is known in only a few examples. Of particular interest are cases such as the 5-HT1B serotonin receptor, where the fly homolog has introns and the human one does not. Yeast (Saccharomyces cerevisiae) contains only two GPCR genes, both intronless, which respond to the α and a pheromone factors. Caution is required in identifying GPCRs in the databases. For example, in Drosophila the receptor proteins smoothened and frizzled are seven-transmembrane proteins with known ligands, but it is a matter of debate whether they actually couple to G proteins.

What could account for the predominance of intronless genes among human GPCRs? An evolutionary scenario might propose that most GPCR genes derive from a single intronless common progenitor relatively recently in evolutionary history, with insufficient time to allow for gain of introns ('introns late'), as has been suggested in the case of the dopamine receptors (ref. 4: O'Dowd B.F. Structures of dopamine receptors. J. Neurochem. 1993; 60: 804-816). This would imply either a rapid diversification during which invertebrate GPCRs acquired introns but vertebrate ones did not, or that vertebrates have acquired introns at a slower rate.

Alternatively, mammalian GPCR genes might originally have had introns that have subsequently been lost. Reverse transcription provides one mechanism for creating intronless duplicate or replacement genes (ref. 5: Nouvel P. The mammalian genome shaping activity of reverse transcriptase. Genetics. 1994; 93: 191-201; ref. 6: Flavell A.J. Retroelements, reverse transcriptase and evolution. Comp. Biochem. Physiol. 1995; 110B: 3-15). If an endogenous retroposon or pair of retroposons inserts in the genome adjacent to or flanking a multi-exon gene, the expanded sequence might be transcribed and processed. Subsequently, the RNA sequence might be reverse transcribed into the genome, producing an intronless allele. Retroviruses can also play a role in transforming multi-exon genes when they integrate into the host genome nearby. Subsequent processing followed by reverse transcription can again produce an intronless form of the gene. There are many examples of oncogenes that have been rendered intronless, the classic example being the src gene, transduced by the Rous sarcoma virus as the v-src viral oncogene (ref. 7: Brickell P.M. The p60c-src family of protein-tyrosine kinases: structure, regulation, and function. Crit. Rev. Oncog. 1992; 3: 401-446). Movement of proto-oncogenes can result in dramatic up-regulation of activity, for example when MYC is brought under transcriptional control of a strong promoter, such as an immunoglobulin gene promoter, leading to Burkitt's lymphoma (ref. 8: Klein G. Multistep evolution of B-cell-derived tumors in humans and rodents. Gene. 1993; 135: 189-196).

It is interesting to note that, with the possible exception of the Drosophila gypsy and ZAM elements, which appear to contain all the characteristic retroviral components including an env-type gene (ref. 9: Pelisson A. et al. About the origin of retroviruses and the co-evolution of the gypsy retrovirus with the Drosophila flamenco host gene. Genetica. 1997; 100: 29-37; ref. 10: Leblanc P. et al. Invertebrate retroviruses: ZAM a new candidate in D. melanogaster. EMBO J. 1997; 16: 7521-7531), no invertebrate retroviruses have been found so far, mirroring the lack of intronless GPCRs in invertebrates (although some invertebrates do possess retrotransposons). Other viruses can also capture and move genes. For example, several herpesviruses have acquired host GPCR genes of the chemokine family.

Retrotransposition facilitates change and diversity, and increases rearrangement opportunities for recombination. However, most transposed elements are dead pseudogenes or are neutralized by hypermethylation or excessive mutations (e.g. as occurs by repeat-induced point mutation; ref. 11: Selker E.U. Premeiotic instability of repeated sequences in Neurospora crassa. Annu. Rev. Genet. 1990; 24: 579-613). Kricker et al. (ref. 12: Kricker M.C. et al. Duplication-targeted DNA methylation and mutagenesis in the evolution of eukaryotic chromosomes. Proc. Natl. Acad. Sci. U. S. A. 1992; 89: 1075-1079) proposed that methylation plays a role in reducing the rate at which a gene is subjected to recombination. Retrotransposition and methylation can thus be viewed as balancing diversification against stability. In the above ways, intronless genes and diversity might readily be generated as a consequence of reverse transcription. The mammalian olfactory GPCR genes are considered archetypal outcomes of reverse transcription coupled with gene duplications (ref. 13: Issel-Tarver L., Rine J. The evolution of mammalian olfactory receptor genes. Genetics. 1996; 145: 185-195).

Among the GPCRs with intronless open reading frames, about 18% possess introns in their 5′ untranslated region, 33% contain no introns at all, and in the remaining cases the flanking regions were not annotated. No examples were found with introns 3′ to the coding region.
Unfortunately, there was insufficient flanking sequence available to make a comprehensive search for poly(A) tails that would strengthen the case for reverse transcription.

Is it possible that GPCR regulatory sequences are natural target sites for transposable elements? Insertion of Drosophila P-elements into gene promoters or protein-coding regions generally has lethal consequences; however, if they insert 5′ to the promoter, correct transcription and translation may proceed. The yeast Ty1 retrotransposon selectively targets regions 5′ to RNA polymerase III transcription initiation sites (ref. 14: Devine S.E., Boeke J.D. Integration of the yeast retrotransposon Ty1 is targeted to regions upstream of genes transcribed by RNA polymerase III. Genes Dev. 1996; 10: 620-633). On the other hand, perhaps GPCRs lack sites at which introns prefer to insert (ref. 15: Mattick J.S. Introns: evolution and function. Curr. Opin. Genet. Dev. 1994; 4: 823-831). In this context it is notable that in those GPCRs that have them, the introns are located exclusively in the 5′ flanking regions or at the junctions between loop and transmembrane regions.

The majority of retroviral genomes are GC-rich and, furthermore, appear to integrate preferentially in GC-rich regions of the host genome (ref. 16: Zoubak S. et al. Regional specificity of HTLV-1 proviral integration in the human genome. Gene. 1994; 143: 155-163). In addition, the total intron content of genes in such regions is several times lower than the average, and genes are smaller than in GC-poor isochores (ref. 17: Duret L. et al. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 1995; 40: 308-317). (This is the opposite of the situation in many bacteria, where larger genes occur in GC-rich regions; ref. 18: Karlin S. et al. Codon usages in different gene classes of the Escherichia coli genome. Mol. Microbiol. 1998; 29: 1341-1355.) Compared with the representative sample of 242 genes from large contigs, the GPCRs are much more GC-rich at the third codon position. The frequency of (G+C) at the third base of a codon is around 10% higher in the average GPCR than in the average human gene. This suggests that GPCRs are more likely to be located in GC-rich isochores, which are favored by retroviral integration and where introns tend to be smaller. Similar to retroviruses, human Alu elements, which exhibit high specificity for where they insert, show a strong bias towards GC-rich sequences; however, the less specific L1 elements prefer AT-rich regions (ref. 19: Jurka J. Repeats in genomic DNA: mining and meaning. Curr. Opin. Struct. Biol. 1998; 8: 333-337). In total, Alu and L1 elements seem to comprise around 25% of human DNA, leaving little doubt that transposable elements have had a major effect on the genome.

Might there be selective advantages to having intronless GPCRs? Intronless genes, not requiring post-transcriptional splicing, might be transcribed efficiently, with a potentially greater abundance and rate of protein expression. The majority of mammalian GPCR genes relate to central nervous system (CNS) activity, which often requires high or maximal expression of many genes. However, the gene for rhodopsin, a GPCR that is widespread phylogenetically and abundantly expressed, contains four introns in highly conserved positions. Furthermore, in a number of cases it has been shown that introns actually increase gene expression levels (ref. 20: Duncker B.P. et al. Introns boost transgene expression in Drosophila melanogaster. Mol. Gen. Genet. 1997; 254: 291-296). This suggests that efficiency per se is not the crucial factor.
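The third-codon-position comparison described in the text, the frequency of (G+C) at the third base of a codon in GPCRs versus a reference gene set, can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names `gc3` and `mean_gc3` are hypothetical, and it assumes each input is an in-frame coding sequence starting at the first codon position, with any trailing partial codon ignored and the stop codon counted if it is present.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop any trailing partial codon
    third = cds[2:usable:3]            # every third base, starting at index 2
    if not third:
        raise ValueError("coding sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def mean_gc3(seqs: list[str]) -> float:
    """Unweighted mean GC3 over a set of coding sequences."""
    values = [gc3(s) for s in seqs]
    return sum(values) / len(values)
```

Comparing `mean_gc3` over a GPCR set with `mean_gc3` over a reference set gives the kind of difference the text reports. Averaging per gene rather than pooling all codons is a design choice made here, not something the text specifies.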
It would be interesting to see what effect addition or deletion of introns in GPCR genes has on their mRNA abundance. Such experiments might shed light on why the introns occur solely outside the transmembrane coding regions. Genes without introns are not liable to differential and aberrant splicing, which might result in higher transcriptional fidelity. However, alternative versions of a gene transcript might be appropriate for different tissue types or at different life stages.

The GPCRs exhibit an intriguing contrast between gene structure in vertebrates and invertebrates. While we have suggested that endogenous and exogenous reverse transcription might have played a significant role, the issue of how and why GPCRs have come to be largely intronless remains a challenging one. Two possible in vitro experiments spring to mind. First, the effect on mRNA abundance of removing introns from a GPCR such as rhodopsin could be evaluated. This would establish directly whether intronless GPCR genes are expressed at a higher level (it is already known that GPCR cDNAs are functional). Second, insertion of introns (at appropriate places with appropriate lengths and splice signals) into intronless GPCR coding regions would help to illuminate whether the presence of introns is genuinely disfavored, or was historically determined either by accident or through recruitment of 'primordial' exons.

We thank H. Bourne (UCSF), A.M. Campbell (Stanford), S.R. Coughlin (UCSF), J. Jurka (Genetic Information Research Institute), B.K. Kobilka (Stanford), M.A. Krasnow (Stanford), G. Miklos (Neurosciences Research Institute, La Jolla) and L. Stryer (Stanford) for valuable discussions.