Review Open access Peer reviewed

A guide to naming human non‐coding RNA genes

2020; Springer Nature; Volume: 39; Issue: 6 Language: English

10.15252/embj.2019103777

ISSN

1460-2075

Authors

Ruth L. Seal, Ling-Ling Chen, Sam Griffiths-Jones, Todd M. Lowe, Michael B. Mathews, Dawn O'Reilly, Andrew J. Pierce, Peter F. Stadler, Igor Ulitsky, Sandra L. Wolin, Elspeth A. Bruford

Topic(s)

RNA modifications and cancer

Summary

Review, 24 February 2020, Open Access

Author Information

Ruth L Seal (1,2,*; orcid.org/0000-0002-7545-6817), Ling-Ling Chen (3), Sam Griffiths-Jones (4), Todd M Lowe (5), Michael B Mathews (6), Dawn O'Reilly (7), Andrew J Pierce (8), Peter F Stadler (9,10,11,12,13), Igor Ulitsky (14; orcid.org/0000-0003-0555-6561), Sandra L Wolin (15) and Elspeth A Bruford (1,2)

1 Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
2 European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
3 State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
4 School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
5 Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
6 Department of Medicine, Rutgers New Jersey Medical School, Newark, NJ, USA
7 Computational Biology and Integrative Genomics Lab, MRC/CRUK Oxford Institute and Department of Oncology, University of Oxford, Oxford, UK
8 Translational Medicine, Oncology R&D, AstraZeneca, Cambridge, UK
9 Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
10 Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
11 Institute of Theoretical Chemistry, University of Vienna, Vienna, Austria
12 Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
13 Santa Fe Institute, Santa Fe, USA
14 Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
15 RNA Biology Laboratory, National Cancer Institute, National Institutes of Health, Frederick, MD, USA

*Corresponding author. Tel: +44 (0)1223 494 446; E-mail: [email protected]

The EMBO Journal (2020) 39: e103777. https://doi.org/10.15252/embj.2019103777

Abstract

Research on non-coding RNA (ncRNA) is a rapidly expanding field. Providing an official gene symbol and name for each ncRNA gene brings order to potential chaos, as it allows unambiguous communication about each gene. The HUGO Gene Nomenclature Committee (HGNC, www.genenames.org) is the only group with the authority to approve symbols for human genes. The HGNC works with specialist advisors for different classes of ncRNA to ensure that ncRNA nomenclature is accurate and, where possible, informative. Here, we review each major class of ncRNA that is currently annotated in the human genome and describe how each class is assigned a standardised nomenclature.

Introduction

The HUGO Gene Nomenclature Committee (HGNC) works under the auspices of the Human Genome Organisation (HUGO) and is the only worldwide authority that assigns standardised symbols and names to human genes (Braschi et al, 2019).
A unique symbol for every gene is essential to enable unambiguous scientific communication, and approved symbols should be used ubiquitously in research papers, conference talks and posters, and biomedical databases. The HGNC endeavours to approve symbols for all classes of genes that are supported by gene annotation projects and began working on non-coding RNA (ncRNA) nomenclature in the mid-1980s with the approval of initial gene symbols for mitochondrial transfer RNA (tRNA) genes. Since then, we have worked closely with experts in the ncRNA field to develop symbols for many different kinds of ncRNA genes. The number of genes that the HGNC has named per ncRNA class is shown in Fig 1, and ranges from over 4,500 long ncRNA (lncRNA) genes and over 1,900 microRNA genes to just four genes each in the vault and Y RNA classes. Every gene symbol has a Symbol Report on our website, www.genenames.org, which displays the gene symbol, gene name and chromosomal location, and includes links to key resources such as Ensembl (Zerbino et al, 2018), NCBI Gene (O'Leary et al, 2016) and GeneCards (Stelzer et al, 2016). We collaborate directly with these biomedical databases and, importantly, these databases always use our gene symbols as the primary symbol for the gene. Due to the relative completeness of the HGNC ncRNA gene set, our data have been chosen as the canonical human dataset in the RNAcentral database (The RNAcentral Consortium, 2019), an RNA sequence database resource. For microRNAs, we work with the specialist resource miRBase (Kozomara et al, 2019), and for tRNAs, we work with the specialist resource GtRNAdb (Chan & Lowe, 2016). We display links to these resources from the relevant Symbol Report. Where available, for lncRNAs we provide specialist links to LNCipedia (Volders et al, 2019), a key lncRNA resource that displays HGNC gene symbols (Box 1).

Figure 1. The number of HGNC gene symbols by type of ncRNA. A full list of locus types, along with numbers of genes per category, can be found at our Statistics & Downloads webpage (https://www.genenames.org/download/statistics-and-files/).

Box 1. Useful resources for non-coding RNA genes used by the HGNC

RNA resource | Resource URL | Description
RNAcentral | https://rnacentral.org/ | Centralised database of non-coding RNA sequences collated from expert non-coding RNA member databases, model organism databases and sequence accession databases
miRBase | http://www.mirbase.org/ | Searchable database of microRNA sequences and annotations. Also hosts the miRBase registry where researchers can submit prospective new microRNAs
GtRNAdb | http://gtrnadb.ucsc.edu/ | The genomic tRNA database, which contains tRNA genes predicted by the tRNAscan-SE program for many different species
snoRNABase | https://www-snorna.biotoul.fr/ | Database of human snoRNA genes; useful resource but no longer being updated
LNCipedia | https://lncipedia.org/ | Database of human long non-coding RNA sequences and manually curated lncRNA articles
Ensembl | http://www.ensembl.org/ | Genome browser for vertebrate genomes that hosts the GENCODE annotation models for non-coding RNA genes for mouse and human
NCBI Gene | https://www.ncbi.nlm.nih.gov/gene/ | Integrated annotation and related information for many different genomes. Includes RefSeq manual annotation of human and mouse non-coding RNA genes

For each class of ncRNA, we host curated gene group pages on www.genenames.org; a list of URLs for these is shown in Table 1.

Table 1. The HGNC hosts gene group pages for different types of non-coding RNA genes. These pages follow a hierarchical structure and all pages can be browsed starting at the highest-level gene group page labelled "Non-coding RNAs".

Gene group name | Gene group URL | Description
Non-coding RNAs | https://www.genenames.org/data/genegroup/#!/group/475 | Overview page of all non-coding RNAs in the HGNC project. Can be used as a starting point to browse through all types of named ncRNAs
MicroRNAs | https://www.genenames.org/data/genegroup/#!/group/476 | Starting page for all microRNAs, which are split into curated human families where possible. MicroRNAs not in a defined family are listed on the first page
MicroRNA host genes | https://www.genenames.org/data/genegroup/#!/group/1690 | A curated list of microRNA host genes, which is split into protein coding and non-coding subgroups
Transfer RNAs | https://www.genenames.org/data/genegroup/#!/group/478 | Starting page for all transfer RNA genes, with subgroups "Mitochondrially encoded transfer RNAs" and "Cytoplasmic transfer RNAs" (this page also has the subsets "Cytoplasmic transfer RNA pseudogenes" and "Low confidence cytoplasmic transfer RNAs")
Small nuclear RNAs | https://www.genenames.org/data/genegroup/#!/group/1819 | Lists all canonical small nuclear RNA genes; variant snRNA genes are shown as a subgroup
Small nucleolar RNAs | https://www.genenames.org/data/genegroup/#!/group/844 | Starting page for snoRNAs with the subgroups "Small Cajal body-specific RNAs", "Small nucleolar RNAs, C/D box" and "Small nucleolar RNAs, H/ACA box"
Small nucleolar RNA host genes | https://www.genenames.org/data/genegroup/#!/group/1838 | A curated list of snoRNA host genes, which is split into protein coding and non-coding subgroups
Ribosomal RNAs | https://www.genenames.org/data/genegroup/#!/group/848 | Starting page for all ribosomal RNAs, split into the major subgroups "Mitochondrially encoded ribosomal RNAs" and "Cytoplasmic ribosomal RNAs", the latter of which is further split into subtypes of rRNAs
Vault RNAs | https://www.genenames.org/data/genegroup/#!/group/852 | Full list of vault RNA genes
Y RNAs | https://www.genenames.org/data/genegroup/#!/group/853 | Full list of Y RNA genes
Small NF90 (ILF3) associated RNAs | https://www.genenames.org/data/genegroup/#!/group/1624 | Full list of SNAR genes
Long non-coding RNAs | https://www.genenames.org/data/genegroup/#!/group/788 | Starting page for all long non-coding RNA genes. Divided into subgroups: Long intergenic non-protein coding RNAs, MicroRNA non-coding host genes, Overlapping transcripts, Intronic transcripts, Antisense RNAs, Divergent transcripts, Small nucleolar RNA non-coding host genes, Long non-coding RNAs with non-systematic symbols, Long non-coding RNAs with FAM root symbol

The aim of this paper is to provide an overview of each of the main types of ncRNA that we have named, as well as a guide to how we name them. Each section has been written in collaboration with our specialist advisors for each ncRNA class: Sam Griffiths-Jones of the University of Manchester for microRNAs; Todd Lowe of the University of California, Santa Cruz for tRNAs; Dawn O'Reilly of the University of Oxford for small nuclear RNAs (snRNAs); Peter Stadler of the University of Leipzig for small nucleolar and vault RNAs; Andrew Pierce, currently at AstraZeneca, Cambridge, for ribosomal RNAs (rRNAs); Sandra Wolin of the NIH for Y RNAs; Michael Mathews of Rutgers New Jersey Medical School for small NF90 (ILF3) associated RNAs; and Igor Ulitsky of the Weizmann Institute of Science and Ling-Ling Chen of the Shanghai Institute of Biochemistry and Cell Biology for long non-coding RNAs. We finish by outlining recommendations for the nomenclature of circular and circular intronic RNAs, which currently lack official nomenclature.

MicroRNAs

MicroRNAs are transcripts of ~22 nucleotides that mediate the post-transcriptional regulation of genes via direct binding to messenger RNA (mRNA) molecules.
In animal cells, microRNA (miRNA) genes are usually transcribed as long primary transcripts (pri-miRNAs), which are processed by the Drosha microprocessor complex into precursor hairpin stem-loop sequences (pre-miRNAs). These hairpins are exported from the nucleus to the cytoplasm, where the stem-loop is cleaved by the Dicer enzyme to produce a ~ 22 nt duplex. One strand of the duplex associates with an Argonaute (AGO) protein and this microRNA ribonucleoprotein complex (miRNP) binds to sites in mRNAs that are complementary to the miRNA sequence, usually in the 3′ untranslated region (UTR). The Ago-miRNP complex then recruits other proteins, which typically mediate either the degradation or translational repression of the mRNA [for a review, see (Bartel, 2018)]. Approximately 60% of all human genes produce mRNAs that can be bound by miRNAs (Friedman et al, 2009), so these small RNAs provide regulation for diverse biological processes across all tissue types and stages of life. As such, miRNA genes have been implicated in many human diseases including rheumatoid arthritis (Guggino et al, 2018), deafness (Mencía et al, 2009), stroke (Panagal et al, 2019), psoriasis (Yan et al, 2019), cirrhosis (Fernández-Ramos et al, 2018) and several forms of cancer (Kwok et al, 2017). The name "microRNA" to reflect the small size of the active RNA molecule was agreed upon and first used by three Caenorhabditis elegans research groups that published in the same 2001 issue of Science (Lagos-Quintana et al, 2001; Lau et al, 2001; Lee & Ambros, 2001). Once the field of miRNA research started to expand, experts came together to publish guidelines on how to name these transcripts across species (Ambros et al, 2003), and the miRNA Registry was founded to ensure that the same symbols were not mistakenly used by different research groups for different miRNAs (Griffiths-Jones, 2004). 
The miRNA Registry evolved into the dedicated online miRNA resource miRBase, which has continued to be responsible for providing unique identifiers for miRNAs as well as acting as a database of sequences and curated publications (Kozomara et al, 2019). Researchers submit hairpin and mature microRNA sequences to miRBase, and these are publicly assigned new symbols after manuscript acceptance. miRBase assigns each microRNA stem-loop sequence a symbol in the format "mir-#" and each mature miRNA a symbol in the format "miR-#", where # is a unique sequential number that reflects the order of submission to the database. The HGNC then approves a gene symbol for human miRNA genes in the format MIR#; for example, as shown in Fig 2 and Box 2, MIR17 represents the miRNA gene, mir-17 represents the stem-loop, and miR-17 represents the mature miRNA. However, the complete extent of the miRNA gene and primary transcript is not often known, so the entity associated with an HGNC name and entry is frequently the length of the hairpin precursor miRNA, rather than the primary transcript. For genes that encode identical mature miRNAs, the same unique identifier is used followed by a hyphenated numerical suffix; e.g., MIR1-1 and MIR1-2 are distinct genomic loci that encode identical mature miRNAs. For paralogous genes that encode mature miRNAs that differ by only one or two nucleotides, the same unique identifier is used followed by a letter suffix, e.g. MIR10A and MIR10B. The HGNC does not accept any direct requests for miRNA gene symbols, and all requests must go to miRBase first (please see http://www.mirbase.org/registry.shtml).

Box 2. The HGNC Symbol Report for MIR17 provides more than gene nomenclature: as highlighted here, there is a link to the HGNC "MIR17 microRNA family group page"; a link out to the relevant microRNA report on miRBase; and, where possible, a link to the mouse ortholog at MGI and the rat ortholog at RGD.

Figure 2. The microRNA gene MIR17 is part of a cluster of microRNA genes that are hosted within an intron of the long non-coding RNA gene MIR17HG (miR-17-92a-1 cluster host gene). The symbol MIR17 represents the gene; the symbol mir-17 represents the miRNA precursor stem-loop structure; and the symbol miR-17 represents the active mature microRNA, which interacts with an AGO protein to form the AGO/miRNA silencing complex.

In accordance with miRBase, the HGNC provides one gene symbol per miRNA gene, even though miRNAs are sometimes processed from the same transcripts as proteins or other miRNAs, and therefore might not be considered separate genes in the canonical sense. For example, many miRNAs are hosted in the introns, or less frequently the exons, of protein coding genes or long non-coding RNA genes (Fig 2 and Box 2). The HGNC has curated gene group pages listing these host genes (Table 1), and the naming conventions for non-coding miRNA host genes are discussed in the long non-coding RNA section below.

Recently, there have been a few ideas published on how to "improve" miRNA nomenclature, including correcting the identifiers of particular miRNA genes to show evolutionary relationships (e.g. Desvignes et al, 2015; Fromm et al, 2015; Budak et al, 2016). As nomenclature advisors, we understand the desire to perfect nomenclature systems once more information becomes available. At the same time, experience has taught us that such revised systems are often not fully adopted and may cause considerable confusion in the community. It can therefore be more appropriate to find other ways to represent relationships between genes, in order to maintain stable gene symbols. The HGNC has recently curated gene groups to show paralogous relationships between human miRNA genes, based on the family groups at miRBase and information in publications. For example, the "MicroRNA MIR1/206 family" contains the family members MIR1-1, MIR1-2 and MIR206.
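The symbol conventions described earlier in this section (stem-loop "mir-#", mature "miR-#", gene "MIR#", a hyphenated numerical suffix for loci that encode identical mature miRNAs, and a letter suffix for close paralogs) can be illustrated with a minimal Python sketch. The function names and the regular expression are hypothetical and are not part of any HGNC or miRBase tool. The sketch deliberately handles only the three forms given in the text; symbols that combine a letter and a numerical suffix, or that use another root, are rejected because the text does not describe how they are formatted.

```python
import re

# Illustrative pattern for the three stem-loop forms described in the text:
#   mir-17   plain identifier
#   mir-1-1  numerical suffix (distinct loci, identical mature miRNA)
#   mir-10a  letter suffix (paralogs differing by one or two nucleotides)
_STEM_LOOP = re.compile(r"^mir-(\d+)(?:([a-z])|-(\d+))?$")


def stem_loop_to_gene_symbol(stem_loop: str) -> str:
    """Convert a stem-loop symbol such as 'mir-17' into the MIR# gene format."""
    match = _STEM_LOOP.match(stem_loop)
    if match is None:
        # Anything outside the three documented forms is left unhandled.
        raise ValueError(f"unsupported stem-loop symbol: {stem_loop!r}")
    number, letter, locus = match.groups()
    symbol = f"MIR{number}"
    if letter:
        symbol += letter.upper()
    if locus:
        symbol += f"-{locus}"
    return symbol


def entity_type(symbol: str) -> str:
    """Report which entity a symbol denotes, based on its prefix and case."""
    if symbol.startswith("MIR"):
        return "gene"
    if symbol.startswith("mir-"):
        return "stem-loop"
    if symbol.startswith("miR-"):
        return "mature miRNA"
    raise ValueError(f"not a recognised miRNA symbol: {symbol!r}")


if __name__ == "__main__":
    for name in ("mir-17", "mir-1-1", "mir-1-2", "mir-10a"):
        print(name, "->", stem_loop_to_gene_symbol(name))
```

The case of the prefix is the only thing that separates the three entity types, so any tool that normalises case before comparing symbols would lose that distinction.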
The miRNA symbol miR-206 has already been used in over 600 papers, so it would be unhelpful to try to alter this symbol. However, the MIR206 Symbol Report now provides a link to the curated MicroRNA MIR1/206 family gene group page, where there are also associated publications and a link through to the corresponding miRBase Family MIPF0000038 page, which lists orthologous and paralogous miRNAs in different species. Where possible, the miRNA Symbol Reports on genenames.org also display the mouse and rat miRNA orthologs, with links to the relevant gene report at the Mouse Genome Informatics database (http://www.informatics.jax.org/) and the Rat Genome Database (https://rgd.mcw.edu/) (see Box 2).

Transfer RNAs

Transfer RNA was the first type of non-coding RNA to be characterised, over 60 years ago (Hoagland et al, 1958). The term "transfer" (Smith et al, 1959) represents the function of this RNA in transferring amino acids from the cytosol of the cell to the ribosome, where the amino acids are bonded together to form a peptide according to the sequence of the mRNA being translated. Typical tRNAs vary in size from 73 to 93 nucleotides (Rich & RajBhandary, 1976) and have a distinctive cloverleaf secondary structure that folds into an L-shaped tertiary structure (Kim et al, 1973). At one end of the L is the CCA acceptor site where the tRNA binds to the relevant amino acid (Hou, 2010), and at the other end is a loop that contains the three-nucleotide anticodon, which precisely pairs to the codons of mRNA (Kim et al, 1973). The first two nucleotides of the anticodon form Watson-Crick base pairs with the corresponding mRNA codon, while the third nucleotide can form "wobble" pairing, which allows one tRNA to recognise more than one mRNA codon. Post-transcriptional modifications at the "wobble" position can influence binding to a particular mRNA codon (Agris et al, 2018). Transfer RNA genes share characteristics that make it possible to predict them from genomic sequence.
The Genomic tRNA Database (GtRNAdb) (Chan & Lowe, 2016) contains predicted tRNA gene sets for thousands of species across Eukaryota, Archaea and Bacteria, including a set of 429 high confidence tRNA genes for the most current human reference genome, GRCh38. tRNA gene predictions are made using the tRNAscan-SE analysis pipeline (Lowe & Chan, 2016), which uses probabilistic tRNA primary sequence and secondary structure "covariance models" to determine the gene loci and the functional identity (i.e. tRNA isotype and anticodon) for each putative tRNA gene. The predicted tRNA genes then undergo further analysis by comparison with isotype-specific covariance models to confirm the isotype classification. The GtRNAdb assigns a unique ID to each tRNA gene in the format tRNA-[three letter amino acid code]-[anticodon]-[GtRNAdb gene identifier], e.g. tRNA-Ala-AGC-1-1. (Note that the "GtRNAdb gene identifier" is actually made up of two numbers: the first is a "transcript ID" and the second a "locus ID", such that multiple gene loci producing identical tRNA transcripts share the same transcript ID but each has a different locus number; e.g., Ala-AGC-1-1 and Ala-AGC-1-2 are two different gene loci producing identical mature tRNAs, whereas Ala-AGC-2-1 and Ala-AGC-3-1 are genes that each produce different tRNA transcripts.) The HGNC assigns a slightly condensed but equivalent tRNA gene symbol in the format TR[one letter amino acid code]-[anticodon][GtRNAdb gene identifier], e.g. TRA-AGC1-1 (Fig 3). tRNAscan-SE analysis also predicts tRNA pseudogenes and candidate genes that include atypical tRNA features and may not be transcribed and/or may not be capable of ribosomal translation. To reflect these different sets, the HGNC displays the gene groups "Cytosolic transfer RNAs", "Low confidence cytosolic transfer RNAs" and "Transfer RNA pseudogenes" on genenames.org (Table 1).

Figure 3. An annotated tRNA gene symbol explaining what each part of the approved gene symbol represents.

The human mitochondrial genome contains 22 tRNA genes (Anderson et al, 1981) that encode tRNAs with both canonical and non-canonical cloverleaf structures, which enable translation by mitochondrial ribosomes. While pathological mutations in cytosolic tRNA genes have not yet been discovered, mutations in mitochondrial tRNA genes cause a variety of well-studied mitochondrial diseases such as MELAS (mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes) and MERRF (myoclonic epilepsy with ragged red fibres) (Suzuki & Nagao, 2011; Abbott et al, 2014). Mitochondrial tRNA genes were named in collaboration with the MitoMap resource (Lott et al, 2013); gene symbols are of the format "MT-T + one letter amino acid code"; e.g., MT-TA represents the mitochondrial tRNA gene that recruits alanine. Most amino acids are decoded by just one human mitochondrial tRNA, but there are two mitochondrial tRNA genes each for leucine and serine; these gene symbols therefore include numbers to distinguish the individual loci: MT-TL1, MT-TL2, MT-TS1 and MT-TS2.

Small nuclear RNAs

Small nuclear RNAs are abundant transcripts of around 150 nucleotides that end in a 3′ stem loop (Matera et al, 2007). While the name of this RNA class is based on cellular location, each individual snRNA has a "U" identifier that stems from the historical name "U-RNA", which was derived from early observations of their high uridine content (Hodnett & Busch, 1968). The U-RNAs were numbered according to their apparent abundance when discovered (Chen & Moore, 2015). Some of these were subsequently found to be small nucleolar RNAs (snoRNAs), resulting in the following numbering for the snRNAs: U1, U2, U4, U5, U6, U7, U11 and U12. Most snRNAs are involved in the splicing of introns from pre-mRNA as part of either the major or minor spliceosome.
The major spliceosome features the U1, U2, U4, U5 and U6 small nuclear ribonucleoproteins (snRNPs), plus many other non-snRNP proteins, and performs splicing of U2-type introns. Here, the U1 and U2 snRNPs assemble on introns and are joined by the preassembled U4/U6.U5 tri-snRNP. This is followed by a series of rearrangements resulting in the formation of the U2/U6 catalytic core and the splicing reaction (Anokhina et al, 2013), and finally release of the spliced RNA and disassembly of the spliceosome. The minor spliceosome splices U12-type introns, which make up <0.5% of introns in the genome (Turunen et al, 2013). It contains the same U5 snRNA as the major spliceosome, but otherwise consists of the snRNAs U11, U12, U4atac and U6atac, which are functional analogs of the major spliceosome U1, U2, U4 and U6 snRNAs. Minor spliceosome snRNAs can fold into similar structures to their equivalent major spliceosome snRNAs, but display limited sequence similarity to them (Will & Lührmann, 2005). The term "atac" in U4atac and U6atac refers to the AT/AC splice sites found in the first U12-type introns to be discovered (Tarn & Steitz, 1996).

Rather than taking part in splicing, U7 snRNA is involved in processing the distinctive 3′ end stem loop of histone mRNA by binding to the histone downstream element and recruiting proteins, some of which are shared with the spliceosome (Strub et al, 1984; Marz et al, 2007). Most snRNAs are transcribed by RNA polymerase II, with the exception of U6 and U6atac, which are transcribed by RNA polymerase III (Singh & Reddy, 1989; Younis et al, 2013).

All snRNA genes are named with the root symbol "RNU" for "RNA, U# small nuclear". The GRCh38 human reference genome contains four annotated U1-encoding loci: RNU1-1, RNU1-2, RNU1-3 and RNU1-4, although individuals may have around 30 copies of tandemly repeated U1 genes (Lund & Dahlberg, 1984).
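The snRNA relationships described so far (membership of the major and minor spliceosomes, the minor-to-major functional analogs, the polymerase that transcribes each snRNA, and the "RNU" root symbol) can be collected into a small lookup structure. The Python sketch below is illustrative only: the constant and function names are hypothetical, and root_symbol returns just the root of the symbol, without the locus suffixes that approved gene symbols may carry.

```python
# snRNA facts transcribed from the text; all names here are illustrative.
MAJOR_SPLICEOSOME = frozenset({"U1", "U2", "U4", "U5", "U6"})
MINOR_SPLICEOSOME = frozenset({"U11", "U12", "U4atac", "U6atac", "U5"})
MINOR_TO_MAJOR_ANALOG = {"U11": "U1", "U12": "U2", "U4atac": "U4", "U6atac": "U6"}
POL_III_TRANSCRIBED = frozenset({"U6", "U6atac"})
# U7 is an snRNA but acts in histone mRNA 3' end processing, not splicing.
KNOWN_SNRNAS = MAJOR_SPLICEOSOME | MINOR_SPLICEOSOME | {"U7"}


def root_symbol(snrna: str) -> str:
    """Return the 'RNU' root of the gene symbol, e.g. 'U4atac' -> 'RNU4ATAC'."""
    if snrna not in KNOWN_SNRNAS:
        raise ValueError(f"unknown snRNA: {snrna!r}")
    return "RN" + snrna.upper()


def summarise(snrna: str) -> dict:
    """Collect what the text states about one snRNA."""
    spliceosomes = []
    if snrna in MAJOR_SPLICEOSOME:
        spliceosomes.append("major")
    if snrna in MINOR_SPLICEOSOME:
        spliceosomes.append("minor")
    polymerase = "III" if snrna in POL_III_TRANSCRIBED else "II"
    return {
        "root_symbol": root_symbol(snrna),
        "spliceosomes": spliceosomes,
        "major_analog": MINOR_TO_MAJOR_ANALOG.get(snrna),
        "polymerase": f"RNA polymerase {polymerase}",
    }


if __name__ == "__main__":
    for name in sorted(KNOWN_SNRNAS):
        print(name, summarise(name))
```

U5 is the only snRNA that appears in both sets, which matches the statement that the minor spliceosome contains the same U5 snRNA as the major spliceosome.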
The GRCh38 reference also contains a single U2 gene (RNU2-1), which resides in a 6 kb region that is organised as a tandem array of 10–20 copies in many individuals (Van Arsdell & Weiner, 1984). The U7 (RNU7-1), U11 (RNU11), U12 (RNU12), U4atac (RNU4ATAC) and U6atac (RNU6ATAC) snRNAs are each encoded by a single gene. There are two U4 and five U6 genes, which have numerical identifiers in the same format as the U1 genes, e.g. RNU4-1, RNU6-2, while the five U5 genes have letter identifiers based on the scientific literature (Sontheimer & Steitz, 1992): RNU5A-1, RNU5B-1, RNU5D-1, RNU5E-1 and RNU5F-1. The human genome contains over 1,000 divergent gene copies of snRNA genes (Vazquez-Arango & O'Reilly, 2018), most of which are presumed to be unexpressed pseudogenes. In the case of the U1 family, some of the genes present on the 1q21.1 cluster have been shown to be expressed, undergo 3′ end proce
