The landscape of human mutually exclusive splicing
2017; Springer Nature; Volume: 13; Issue: 12; Language: English
DOI: 10.15252/msb.20177728
ISSN: 1744-4292
Authors: Klas Hatje, Raza‐Ur Rahman, Ramón Vidal, Dominic Simm, Björn Hammesfahr, V. Bansal, Ashish Rajput, Michel‐Edwar Mickael, Ting Sun, Stefan Bonn, Martin Kollmar
Topic(s): RNA modifications and cancer
Summary: Article, 14 December 2017, Open Access, Transparent process

The landscape of human mutually exclusive splicing

Author Information

Klas Hatje (1,2,6), Raza-Ur Rahman (2,3), Ramon O Vidal (2,7), Dominic Simm (1,4), Björn Hammesfahr (1,8), Vikas Bansal (2,3; orcid.org/0000-0002-0944-7226), Ashish Rajput (2,3; orcid.org/0000-0002-6741-8861), Michel Edwar Mickael (2,3), Ting Sun (2,3), Stefan Bonn (*,2,3,5; orcid.org/0000-0003-4366-5662) and Martin Kollmar (*,1; orcid.org/0000-0002-9768-1855)

1 Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
2 Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
3 Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
4 Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University, Göttingen, Germany
5 German Center for Neurodegenerative Diseases, Tübingen, Germany
6 Present address: Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Basel, Switzerland
7 Present address: Max-Delbrück-Center for Molecular Medicine, Berlin, Germany
8 Present address: Research and Development - Data Management (RD-DM), KWS SAAT SE, Einbeck, Germany

*Corresponding author. Tel: +49 40 7410 55082; E-mail: [email protected]
*Corresponding author. Tel: +49 551 5036960; E-mail: [email protected]

Molecular Systems Biology (2017) 13: 959. https://doi.org/10.15252/msb.20177728

Abstract

Mutually exclusive splicing of exons is a mechanism of functional gene and protein diversification with pivotal roles in organismal development and diseases such as Timothy syndrome, cardiomyopathy and cancer in humans. In order to obtain a first genomewide estimate of the extent and biological role of mutually exclusive splicing in humans, we predicted and subsequently validated mutually exclusive exons (MXEs) using 515 publicly available RNA-Seq datasets. Here, we provide evidence for the expression of over 855 MXEs, 42% of which represent novel exons, increasing the annotated human mutually exclusive exome more than fivefold. The data provide strong evidence for the existence of large and multi-cluster MXEs in higher vertebrates and offer new insights into MXE evolution. More than 82% of the MXE clusters are conserved in mammals, and five clusters have homologous clusters in Drosophila. Finally, MXEs are significantly enriched in pathogenic mutations and their spatio-temporal expression might predict human disease pathology.

Synopsis

Predicted human mutually exclusive splicing events are validated using 515 published RNA-seq datasets, and their extent, mechanism, and functional roles in health and disease are investigated.

855 mutually exclusive exons (MXEs) were validated, 42% of which are novel exons.
Many large and multi-cluster MXEs exist in mammals.
Over 82% of the detected MXEs are evolutionarily conserved.
Exon-level MXE expression patterns might predict human pathology.

Introduction

Alternative splicing of pre-messenger RNAs is a mechanism common to almost all eukaryotes that generates a plethora of protein variants out of a limited number of genes (Matlin et al, 2005; Nilsen & Graveley, 2010; Lee & Rio, 2015). High-throughput studies suggested not only that 95–100% of all multi-exon genes in humans are affected (Pan et al, 2008; Wang et al, 2008; Gerstein et al, 2014) but also that alternative splicing patterns have strongly diverged between vertebrate lineages, implying a pronounced role in the evolution of phenotypic complexity (Barbosa-Morais et al, 2012; Merkin et al, 2012). Five types of alternative splicing have been identified as contributing to most mRNA isoforms: differential exon inclusion (exon skipping), intron retention, alternative 5′ and 3′ exon splicing, and mutually exclusive splicing (Blencowe, 2006; Pan et al, 2008; Wang et al, 2008; Nilsen & Graveley, 2010). Mutually exclusive splicing generates alternative isoforms by retaining only one exon of a cluster of neighbouring internal exons in the mature transcript and is a sophisticated way to modulate protein function (Letunic et al, 2002; Meijers et al, 2007; Pohl et al, 2013; Tress et al, 2017a). The most extreme cases known so far are the arthropod DSCAM genes, for which up to 99 mutually exclusive exons (MXEs) spread across four clusters were identified (Schmucker et al, 2000; Lee et al, 2010; Pillmann et al, 2011). In contrast to arthropods, current evidence suggests that vertebrate MXEs only occur in pairs (Matlin et al, 2005; Gerstein et al, 2014; Abascal et al, 2015a), and genomewide estimates in humans range from 118 (Suyama, 2013) to at most 167 cases (Wang et al, 2008).
Despite these relatively few reported cases, mutually exclusive splicing might be far more frequent in humans than currently anticipated, as has been recently revealed in the model organism Drosophila melanogaster (Hatje & Kollmar, 2013). Despite their low number, MXEs have been described in many crucial and essential human genes, such as in the α-subunits of six of the 10 voltage-gated sodium channels (SCN genes) (Copley, 2004), in each of the glutamate receptor subunits 1–4 (GluR1-4), where the MXEs are called flip and flop (Sommer et al, 1990), and in SNAP-25 as part of the neuroexocytosis machinery (Johansson et al, 2008). Although MXEs within a cluster often share high similarity at the sequence level, they are usually not functionally redundant, as their inclusion in the mRNAs is tightly regulated. Thus, mutations in MXEs have been shown to cause diseases such as Timothy syndrome (missense mutation in the CACNA1C gene) (Splawski et al, 2004, 2005), cardiomyopathy (defect of the mitochondrial phosphate carrier SLC25A3) (Mayr et al, 2011) or cancer (mutations in, e.g., the pyruvate kinase PKM and the zinc transporter SLC39A14) (David et al, 2010).

Despite the implications of mutually exclusive splicing in organismal development and disease, current knowledge on the magnitude of MXE usage and its relevance in biological processes is far from complete. In order to obtain a genomewide, unbiased estimate of the extent and biological role of mutually exclusive splicing in humans, a set of 6,541 MXE candidates was compiled from annotated and novel predicted exons, and rigorously validated using over 15 billion reads from 515 RNA-Seq datasets.

Results

The human genome contains 855 high-confidence MXEs

Compared to other splicing mechanisms, mutually exclusive splicing in humans seems to be a rare event. MXEs are characterized by genomic vicinity, splice-site compatibility and mutually exclusive presence in protein isoforms.
Accordingly, the human genome annotation (GenBank v. 37.3) contains only 158 MXEs in 79 protein-coding genes (Appendix Figs S1–S3). MXEs are often termed “homologous exons” in the literature because they likely originated from the same ancestral exon. We refrain from using this term throughout our analysis, because several MXEs present in the genome annotation do not show any sequence homology and many neighbouring exons with high sequence similarity are not spliced in a mutually exclusive manner.

In a first attempt to chart an atlas of genomewide mutually exclusive splicing in humans, we decided to predict potential MXE candidates and validate those using published RNA-Seq data. In a first step, we generated a set of MXE candidates in the human genome (v. 37.3) from all annotated protein-coding exons and from novel exons predicted in intronic regions, including only internal exons in the candidate list (Fig 1A, Appendix Figs S1–S4). From the annotated exons, we selected those that appeared mutually exclusive in transcripts, and neighbouring exons that show sequence similarity and are translated in the same reading frame. To generate novel exon candidates, we predicted exonic regions in neighbouring introns of annotated exons based on sequence similarity and similar lengths (Pillmann et al, 2011). We did not consider potential MXEs containing in-frame stop codons, such as the neonatal-specific MXE reported for the sodium channel SCN8A (Zubović et al, 2012), or exons overlapping annotated terminal exons (Appendix Fig S2). The reconstruction resulted in a set of 6,541 MXE candidates in 1,542 protein-coding genes, including 1,058 (68.6%) genes for which we predicted 1,722 completely novel exons in previously intronic regions (Fig 1B). Most introns in human genes are extremely long, necessitating careful and strict validation of the MXE candidates to exclude false-positive predictions (Lee & Rio, 2015).

Figure 1.
The human genome contains 1,399 high-confidence MXEs

A. Schematic representation of the various annotated and predicted exon types included in the MXE candidate list. For MXE validation, at least three restraints must be fulfilled: the absence of an MXE-joining read (R1), except for those leading to frame shift, and the presence of two MXE-bridging SJ reads (R2 and R3).

B. Prediction and validation of 1,399 1SJ (855 3SJ) human MXEs. Top: Dataset of 6,541 MXE candidates from annotated and predicted exons. Bottom left: MXE candidates for which splice junction data are currently missing, hindering their annotation as MXE or other splice variant. Bottom right: Validation of the MXE candidates using over 15 billion RNA-Seq reads. The outer circles represent the validation based on at least a single read for each of the validation criteria (1SJ), while the validation shown in the inner circles required at least three reads (3SJ).

C. MXE saturation analysis. Whereas increasing amounts of RNA-Seq reads should lead to the confirmation of further MXE candidates, more RNA-Seq reads might also result in the rejection of previously validated MXEs. The green curves show the number of validated MXEs in relation to the percentage of total RNA-Seq reads used for validation. The orange curves indicate the number of initially “validated MXEs” that were rejected with increasing amounts of reads. Grey dashed lines indicate the point of saturation, which is defined as the point where a twofold increase in reads leads to rejection of less than 1% of the validated MXEs. Of note, whereas the rejection of validated MXEs saturates with 20% of the data, the amount of novel MXE validations is still rapidly increasing.

D. Distribution of validated MXEs in two-exon and multi-exon clusters.

E. Size and distribution of multi-cluster MXEs.

F. The CUX1 gene (cut-like homeobox 1) contains two interleaved clusters of MXEs (clusters 1 and 2) and two standard clusters each with two MXEs (clusters 3 and 4).
The exon 3 and exon 4 variants each are orthologous exons. The exon 4 variants are mutually exclusive (cluster 2). Exon 3a is a differentially included exon and only spliced together with exon 4a. The exons 3b, 3c, 3d and 3e are part of a cluster of four MXEs (cluster 1) and are only spliced together with exon 4b (Appendix Figs S16 and S17). Novel exons are labelled with an asterisk.

To validate the predicted MXE candidates, we made use of over 15 billion publicly available RNA-Seq reads, selecting 515 samples comprising 31 tissues and organs, 12 cell lines and seven developmental stages (Barbosa-Morais et al, 2012; Djebali et al, 2012; Tilgner et al, 2012; Xue et al, 2013; Yan et al, 2013; Fagerberg et al, 2014; Dataset EV1). The data were chosen to encompass common and rare potential splice events in a broad range of tissues, cell types and embryonic stages. Accordingly, the transcription of 6,466 (99%) of the MXE candidates is supported by RNA-Seq reads mapped to the genome (Appendix Fig S3A). To be validated as a true mutually exclusive splicing event, each MXE of a cluster needed to exhibit splice junction (SJ) reads from every MXE to up- or downstream gene regions bridging the other MXE(s) of the cluster (Fig 1A). In addition, MXEs should not exhibit any SJ reads to another MXE except when the combined inclusion causes a frame shift and therefore a premature stop codon (Fig 1A, Appendix Figs S3A and D, S5, and S6). These stringent criteria define a high-confidence set of MXEs, requiring three constraints for a cluster of two MXEs and already 18 constraints for a cluster of five MXEs (Appendix Fig S7). In the case of clusters with more than two MXE candidates, the validation criteria were applied to the cluster including all MXE candidates as well as to all possible sub-clusters to identify the largest cluster fulfilling all MXE criteria.
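The validation criteria and the sub-cluster search described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. All names (`n_constraints`, `is_valid_cluster`, `largest_valid_subcluster`, the `"UP"`/`"DOWN"` labels) and the read counts are hypothetical. The constraint count assumes one "no joining read" constraint per exon pair plus 2(n − 1) bridging junctions, a counting chosen because it reproduces the two figures quoted in the text (3 for two MXEs, 18 for five). The frame-shift exemption is simplified to "a joining read is tolerated if either exon length is not divisible by three".

```python
from itertools import combinations


def n_constraints(n):
    """Constraints for a cluster of n MXEs under the assumed counting."""
    joining = n * (n - 1) // 2   # no SJ read may join any pair of MXEs
    bridging = 2 * (n - 1)       # junctions that skip over the other MXE(s)
    return joining + bridging


def is_valid_cluster(exons, sj_counts, lengths, min_reads=1):
    """Check an ordered list of exons against simplified MXE criteria.

    sj_counts maps (from, to) junctions to read counts; "UP" and "DOWN"
    stand for the flanking upstream and downstream gene regions.
    min_reads=1 mimics the 1SJ setting, min_reads=3 the 3SJ setting.
    """
    n = len(exons)
    if n < 2:
        return False
    for i, exon in enumerate(exons):
        # Every exon except the first needs an upstream junction bridging
        # the preceding MXE(s); every exon except the last needs a
        # downstream junction bridging the following MXE(s).
        if i > 0 and sj_counts.get(("UP", exon), 0) < min_reads:
            return False
        if i < n - 1 and sj_counts.get((exon, "DOWN"), 0) < min_reads:
            return False
    for a, b in combinations(exons, 2):
        if sj_counts.get((a, b), 0) > 0:
            # Simplified exemption: joining reads are tolerated only if
            # combined inclusion would shift the reading frame.
            if lengths[a] % 3 == 0 and lengths[b] % 3 == 0:
                return False
    return True


def largest_valid_subcluster(exons, sj_counts, lengths, min_reads=1):
    """Return the largest order-preserving sub-cluster passing all checks."""
    for size in range(len(exons), 1, -1):
        for subset in combinations(exons, size):
            if is_valid_cluster(list(subset), sj_counts, lengths, min_reads):
                return list(subset)
    return []


if __name__ == "__main__":
    # Invented read counts for a three-exon candidate cluster.
    counts = {("a", "DOWN"): 5, ("UP", "b"): 4, ("b", "DOWN"): 3,
              ("UP", "c"): 2, ("a", "b"): 1}
    exon_lengths = {"a": 90, "b": 90, "c": 90}
    print(n_constraints(2), n_constraints(5))
    print(largest_valid_subcluster(["a", "b", "c"], counts, exon_lengths))
```

In this toy input the single a-to-b joining read rejects the full cluster, so the search falls back to a two-exon sub-cluster. The exhaustive search over subsets is only practical for small clusters.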
According to these criteria, 1,399 MXEs were verified with at least one SJ read per exon (1SJ), supported by 2.2 million exon-mapping and 34 million SJ reads, increasing the total count of human MXEs by almost an order of magnitude (from 158 to 1,399) (Fig 1B, Dataset EV2); 855 MXEs were found to be supported by at least three splice junction reads per exon (3SJ), validated by 1.5 million exon-mapping and 27 million SJ reads (Appendix Figs S3B and C, S8–S10). The 1,399 (855; numbers in brackets refer to the 3SJ validation) verified MXEs include 122 (112) annotated MXEs (Fig 1B “annotated MXE”), 623 (388) exons that were previously annotated as constitutive or differentially included (“annotated other splicing”) and 654 (358) exons newly predicted in intronic regions (“novel exon”). Our analysis also showed that 29 of the 158 annotated MXEs are in fact not mutually exclusively spliced but represent constitutively spliced exons or other types of alternative splicing (Appendix Figs S2 and S3E). Finally, 1,741 (2,336) MXE candidates, including 1,090 (1,402) newly predicted exons and 17 (29) of the annotated MXEs, are supported by 0.5 million exon-matching and 13 million SJ-matching reads but still have to be regarded as MXE candidates because not all annotation criteria were fulfilled (Appendix Fig S3A and E).

To estimate the dependence of MXE confirmation and rejection on data quantity, we cross-validated the MXE gain (validation) and loss (rejection) events for several subsets of the total RNA-Seq data (Fig 1C, Appendix Fig S11, Materials and Methods “Saturation analysis”). The course of the curves provides strong evidence for the validity of the MXEs because a single exon-joining read would already be sufficient to reject an MXE cluster, while at least two SJ reads are needed to validate one. Whereas even 15 billion RNA-Seq reads do not achieve saturation for the amount of validated MXEs, the gain in rejected MXE candidates is virtually saturated using 25% of the data.
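The saturation criterion given in the Figure 1 caption (a twofold increase in reads rejects less than 1% of the validated MXEs) can be made concrete with a short Python sketch. It rests on stated assumptions. The function name `saturation_point`, the data layout and the numbers in `toy_curve` are invented for illustration and are not taken from the paper. The sketch also assumes that "validated MXEs" refers to the count validated before the read set is doubled.

```python
def saturation_point(curve, threshold=0.01):
    """Return the smallest read percentage at which rejection saturates.

    curve maps a percentage of the total reads (int) to a tuple
    (validated, rejected): the number of MXEs validated with that subset
    and the cumulative number of initially validated MXEs rejected so far.
    Saturation is assumed to mean that doubling the reads rejects fewer
    than `threshold` of the MXEs validated before doubling.
    """
    for pct in sorted(curve):
        doubled = 2 * pct
        if doubled not in curve:
            continue  # no measurement at twice the read depth
        validated, rejected = curve[pct]
        newly_rejected = curve[doubled][1] - rejected
        if validated > 0 and newly_rejected / validated < threshold:
            return pct
    return None


if __name__ == "__main__":
    # Invented example curve: validations keep rising, rejections level off.
    toy_curve = {5: (400, 0), 10: (600, 30), 20: (800, 45),
                 40: (1000, 50), 80: (1300, 52)}
    print(saturation_point(toy_curve))
```

With the invented curve, doubling from 5% and from 10% of the reads still rejects more than 1% of the validated MXEs, whereas doubling from 20% rejects 5 of 800, so 20 is reported. Integer percentages are used as keys to avoid floating-point lookups.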
To further validate the list of MXEs, we compared MXE clusters that contained two “annotated other splicing” exons to splicing information from the GTEx portal (https://www.gtexportal.org/home/). Although the GTEx portal uses an alternative aligner and different alignment settings, all MXEs that we compared showed mutually exclusive behaviour in the GTEx portal (Appendix Fig S12), substantiating our results. Lastly, we selected six brain-expressed novel MXEs for qPCR validation in human brain total RNA. All assayed MXEs showed perfect coherence with the alignment results, confirming mutually exclusive splicing of all assayed novel MXEs in human brain (Appendix Fig S13, Dataset EV3). Many of the 1,399 (855) MXEs have roles in cardiac and muscle function and development, while cassette exons are enriched for microtubule- and organelle localization-related terms (Appendix Fig S14).

In summary, the high-confidence set of 1,399 (855) MXEs extends current knowledge of human MXE usage by an order of magnitude, (re)-annotating over a thousand existing and predicted exons and isoforms, while suggesting the existence of further human MXEs.

The human genome contains large cluster and multi-cluster MXEs

In general, mutually exclusive splicing can be quite complex. This is best demonstrated by genes in arthropods that contain both multiple MXE clusters (“multi-cluster”) and large clusters with up to 53 MXEs, such as in the Drosophila Dscam genes (Graveley et al, 2004; Pillmann et al, 2011). This is in strong contrast to mutually exclusive splicing in vertebrates, for which there is to date no evidence of multi-cluster or higher order MXE clusters (Matlin et al, 2005; Pan et al, 2008; Wang et al, 2008; Gerstein et al, 2014; Abascal et al, 2015a,b). The analysis of the 1,399 validated human MXEs provides first evidence for clusters of multiple MXEs in the human genome (Fig 1D, Appendix Fig S15).
While most MXEs are present in clusters of two exons (1,116 MXEs), a surprisingly high number of clusters have three to 10 MXEs (283 MXEs in 71 clusters). Interestingly, although a large part of the genes with verified MXEs contain a single MXE cluster (554 genes, Fig 1E), we could also provide evidence for human genes containing multiple MXE clusters. Thus, TCF3, NEB, ANKRD36C and MTHFD1L contain three clusters and TTN, CAMK2D and CUX1 four clusters of MXEs. A very interesting case of complex interleaved mutually exclusive splicing can be seen for CUX1, the transcription factor cut-like homeobox 1. It contains a cluster of MXEs (exons 3b–3e) that is differentially included into a set of two exons (exon 3 and exon 4), and the two sets are themselves mutually exclusive (Fig 1F, Appendix Figs S16 and S17). The identification of large clusters with multiple MXEs and many genes with multiple clusters shows that complex mutually exclusive splicing is not restricted to arthropods (Schmucker et al, 2000; Graveley, 2005; Lee et al, 2010; Hatje & Kollmar, 2013) but might be present in all Bilateria.

Mutually exclusive presence of coding exons in functionally active transcripts

To understand which splicing mechanisms might be primarily responsible for the regulation of mutually exclusive splicing in humans, we investigated several mechanisms that were shown to act in some specific cases and were proposed to coordinate mutually exclusive splicing in general (Fig 2A; Letunic et al, 2002; Smith, 2005). We identified five cases (0.79% of all clusters) of U2 and U12 splice acceptor incompatibility (Appendix Fig S18) and 57 (9%) cases of potential steric interference, that is, too short a distance between splice donor sites and branch points (< 50 bp; Fig 2B and Appendix Fig S19).
Although 377 (60%) of the MXE clusters contain exons whose lengths are not divisible by three, which would result in non-functional transcripts in the case of combined inclusion, MXE-joining reads were found for only 83 (22%) of these clusters (Fig 2B; Appendix Figs S3B and D, and S20). Surprisingly, the majority of the annotated MXEs are of this type (91 of 122; 75%) as well as many exons previously annotated as other splice types (44 of 662), but only few of the novel MXEs predicted in intronic regions (25 of 615; Appendix Fig S3A and D). These numbers suggest that splicing of the remaining 484 MXE clusters is tightly regulated by other mechanisms (Fig 2B) such as RNA–protein interactions, interactions between small nuclear ribonucleoproteins and splicing factors (Lee & Rio, 2015), and competitive RNA secondary structural elements (Graveley, 2005; Yang et al, 2012; Lee & Rio, 2015). Competing RNA secondary structures are, however, usually not conserved across long evolutionary distances. A potential case of a docker site and selector sequences downstream of each exon variant was identified for the cluster of four MXEs in the CD55 gene (Appendix Fig S21).

Figure 2. MXE presence is regulated at the RNA and protein folding level

A. Schematic representation of MXE splicing regulation via splice-site incompatibility, branch point proximity and translational frame shift leading to NMD.

B. Observed usage of MXE splicing regulation in 629 MXE clusters.

C. By mutually exclusive inclusion into transcripts, MXEs of a cluster are supposed to encode the same region of a protein structure. If the respective regions of the protein structures are embedded within secondary structural elements (the ends of the exon-encoded peptides are part of α-helices and/or β-strands), it is highly unlikely that the translation of a transcript will result in a folded protein in case the respective exon is missing (skipped exon).
If the MXEs have highly similar sequences and do not encode repeat regions, it seems unlikely that either could be present in tandem or absent altogether in a folded protein. Here, we have combined protein structure features (colours) with splicing regulation information (symbols). Accordingly, 87% of the MXE-encoded protein regions are embedded in secondary structural elements (orange and green symbols), and most of the remaining MXEs can only be spliced in a mutually exclusive manner because splicing as differentially included exons would lead to frame shifts (blue circles). As examples, we labelled many MXE clusters, distinguishing annotated MXEs (purple letters), known exons that we validated as MXEs (orange letters), and clusters containing novel exons (dark-grey letters).

In contrast to cassette exons and micro-exons, which tend to be located in surface loops and intrinsically disordered regions instead of folded domains (Buljan et al, 2012; Ellis et al, 2012; Irimia et al, 2014), all MXEs whose protein structures have been analysed are embedded within folded structural domains, as has been shown, for example, for DSCAM (Meijers et al, 2007), H2AFY (Kustatscher et al, 2005), the myosin motor domain (Kollmar & Hatje, 2014) and SLC25A3 (Tress et al, 2017a). As shown at the beginning, there is also a subset of 73 MXEs not showing any sequence homology (“annotated no similarity”). It is unlikely that the encoded peptides account for identical secondary structural elements. Rather, if the MXEs of this subset are true MXEs, there is a small subset (about 5%) of MXEs whose mutual inclusion leads to considerably altered protein folds or affects surface loops and disordered regions similar to cassette exons.
Because MXEs are supposed to modulate protein functions through variations and not alterations in specific restricted parts of the structure, we thought it could be possible to distinguish MXEs from cassette exons at a protein structural level. Such an analysis could provide complementary evidence for the validation as MXE in contrast to two (or more) neighbouring cassette exons. While one and only one of the exons of a cluster of MXEs has to be included in the transcript, the defining feature of a cassette exon is that it can either be present or absent. If MXEs were mis-classified and in fact neighbouring cassette exons, it would therefore be possible that all exons of the cluster were present or absent from the transcript, and accordingly the protein structure. These differences between MXEs and cassette exons impose three restrictions on their localization within protein folds (Appendix Fig S22). Thus, (i) if one or both ends of the MXE-encoded peptide end within a secondary structural element, it seems impossible that the respe
Reference(s)