Article Open access Peer-reviewed

Regulation of 3′ Splice Site Selection in the 844ins68 Polymorphism of the Cystathionine β-Synthase Gene

2002; Elsevier BV; Volume: 277; Issue: 46; Language: English

10.1074/jbc.m208107200

ISSN

1083-351X

Authors

Maurizio Romano, Roberto Marcucci, Emanuele Buratti, Youhna M. Ayala, Gianfranco Sebastio, Francisco E. Baralle

Topic(s)

Folate and B Vitamins Research

Abstract

844ins68 is a frequent polymorphism of the cystathionine β-synthase gene (CBS) that consists of a 68-bp insertion duplicating the 3′ splice site of intron 7 and the 5′-end of exon 8. The presence of two identical 3′ splice sites spaced by 68 bp should lead either to selection of the proximal site or to at least two alternatively spliced CBS mRNA variants. Instead, an accurate selection of the distal 3′ splice site is observed in the 844ins68 carriers. The duplication has generated a gene re-arrangement at the 3′ splice site where two GGGG runs have been brought close to each other. Using a minigene system, we have investigated the effect this peculiar configuration might have on the selection of the 3′ splice site of intron 7 in the CBS gene. Minimal disruption of the G runs resulted in a dramatic shift toward the proximal 3′ splice site selection with inclusion of the 68-bp insertion and a consequent change of the reading frame. The insertional event created this peculiar configuration of two G repeats close to each other that subsequently acquired the ability to strongly bind heterogeneous nuclear ribonucleoprotein (hnRNP) H1, a specific trans-acting factor. The interaction of hnRNP H1 with G runs within the 844ins68 context might interfere with the recruitment of splicing factors to the proximal 3′ splice site, thus favoring the selection of the distal 3′ splice site. Our results therefore suggest the possibility that the insertion was an evolutionary event that allowed the rescue of the wild-type sequence, so preserving protein function.

The abbreviations used are: CBS, cystathionine β-synthase; nt, nucleotide(s); WT, wild-type; MS, mass spectrometry; hnRNP, heterogeneous nuclear ribonucleoprotein; snRNA, small nuclear RNA; EMSA, electrophoretic mobility shift assay.

The identification of the 844ins68 insertion of the cystathionine β-synthase gene was initially reported in a patient affected by homocystinuria (Online Mendelian Inheritance in Man no. 236200) due to CBS deficiency (1. Sebastio G. Sperandeo M.P. Panico M. de Franchis R. Kraus J.P. Andria G. Am. J. Hum. Genet. 1995; 56: 1324-1333).
Subsequent studies showed that the 844ins68 insertion is not a disease-causing mutation but is a common polymorphism in the general population, with a frequency of between 5 and 10% in Caucasians (2. Sperandeo M.P. de Franchis R. Andria G. Sebastio G. Am. J. Hum. Genet. 1996; 59: 1391-1393; 3. Tsai M.Y. Bignell M. Schwichtenberg K. Hanson N.Q. Am. J. Hum. Genet. 1996; 59: 1262-1267). It is absent among Asians and has a much higher prevalence among blacks (37.7% of heterozygotes and 4% of homozygotes) (4. Franco R.F. Elion J. Lavinha J. Krishnamoorthy R. Tavella M.H. Zago M.A. Hum. Hered. 1998; 48: 338-342). The 844ins68 polymorphism consists of the insertion of 68 bp within exon 8 of the CBS gene and results in two 68-bp DNA repeats that are identical except for the presence of the CBS deficiency-causing mutation T833C in the first repeat (see Fig. 1). In fact, the 844ins68 polymorphism represents a duplication of the 3′ splice site at the CBS intron 7/exon 8 junction (53 and 15 nt upstream and downstream of the splice site, respectively), generating a proximal (3′p) and a distal (3′d) 3′ splice site in relation to the upstream IVS 7 5′ splice site (see Fig. 1 A). The 844ins68 allele generates, however, a normal transcript, and it has been shown that there is no increase of the odds ratios for any disease, at least when it is not associated with other risk factors (5. Aras O. Hanson N.Q. Yang F. Tsai M.Y. Clin. Genet. 2000; 58: 455-459; 6. de Franchis R. Fermo I. Mazzola G. Sebastio G. Di Minno G. Coppola A. Andria G. D'Angelo A. Thromb. Haemost. 2000; 84: 576-582; 7. Gaustadnes M. Rudiger N. Rasmussen K. Ingerslev J. Thromb. Haemost. 2000; 83: 554-558; 8. Tsai M.Y. Bignell M. Yang F. Welge B.G. Graham K.J. Hanson N.Q. Atherosclerosis. 2000; 149: 131-137; 9. Orendac M. Muskova B. Richterova E. Zvarova J. Stefek M. Zaykova E. Kraus J.P. Stribrny J. Hyanek J. Kozich V. J. Inherit. Metab. Dis. 1999; 22: 674-675; 10. Franco R. Maffei F. Lourenco D. Piccinato C. Morelli V. Thomazini I. Zago M. Haematologica. 1998; 83: 1006-1008; 11. Kluijtmans L.A. Boers G.H. Trijbels F.J. van Lith-Zanders H.M. van den Heuvel L.P. Blom H.J. Biochem. Mol. Med. 1997; 62: 23-25).

Because the proximal and the distal 3′ splice sites of intron 7 are identical, either the selection of the proximal 3′ splice site or at least the coexistence of both proximal and distal splice site selection in variable ratios is to be expected. On the contrary, it has been shown that the distal 3′ splice site is exclusively selected, such that the 68-bp insertion is skipped in the mature mRNA derived from the 844ins68 allele (2). Therefore, although the splicing pattern associated with the presence of the insertion is clear, the molecular mechanisms underlying the skipping of the 68-bp insertion are presently unknown. The included T833C mutation is also skipped, because of the absence of the 844ins68 insertion in the CBS mRNA. It is conceivable that the skipping of the insertion is driven by some sequences within the region of the insertion itself. In particular, we have noticed that the 844ins68 insertion creates a peculiar gene rearrangement at the 3′ splice site where two GGGG sequences, one at the beginning of exon 8 (6 nt downstream of the proximal 3′ splice site) and the other in the duplication (9 nt downstream of the insertion point), have been brought close to each other.
Previous studies have shown that G runs are distributed throughout human introns (12. Nussinov R. J. Biomol. Struct. Dyn. 1989; 6: 985-1000) and represent an enhancer-like element that seems to be important for the splicing of constitutive introns in chicken tropomyosin (13. Sirand-Pugnet P. Durosay P. Brody E. Marie J. Nucleic Acids Res. 1995; 23: 3501-3507), human α-globin (14. McCullough A.J. Berget S.M. Mol. Cell. Biol. 1997; 17: 4562-4571), and human growth hormone (15. McCarthy E.M. Phillips 3rd., J.A. Hum Mol Genet. 1998; 7: 1491-1496). Moreover, G runs seem to be important for the regulation of splicing in viruses (16. Haut D.D. Pintel D.J. J. Virol. 1998; 72: 1834-1843; 17. Caputi M. Zahler A.M. EMBO J. 2002; 21: 845-855). We have, therefore, explored the possible effects of G runs on the regulation of the splicing of the CBS exon 8.

The 3215-bp CBS genomic region of a subject heterozygous for the insertion 844ins68, spanning from exon 5 to exon 9, was amplified by PCR using primer 5′ CBS-Ex5/HindIII (5′-acgaagcttgtggacgtgctgcgggcactgg-3′) and 3′ CBS-Ex9/XhoI (5′-acgctcgagcgcacagcagcccctcttgcgcga-3′). The PCR product was HindIII/XhoI cloned in pcDNA 3 expression vector (Invitrogen). Wild-type and 844ins68 clones were sequenced to confirm their identity and to exclude the presence of other mutations. The other mutant constructs were generated by PCR using WT and 844ins68 plasmids as templates.

The DNA used for transfections was purified with JetStar columns (Genomed). Liposome-mediated transfections of 3 × 10⁵ human hepatocarcinoma Hep3B cells were performed using DOTAP (Alexis). 5 μg of construct DNA was used for each transfection. After 12 h the medium was replaced with fresh medium, and 24 h later the cells were harvested.
The RNA was extracted using RNAzolB solution (Biotex) and retrotranscribed with poly-dT primer. To amplify only the messenger derived from the transfections, PCRs were carried out with CBS-Ex9/XhoI primer (annealing in the construct) and T7 primer (annealing in the vector). Each transfection experiment was repeated at least three times.

844ins68 and Gs-mutagenized 844ins68 CBS templates were synthesized by annealing the oligonucleotides CBS 844ins68 wt oligonucleotide S (5′-catcactggggtggatcatccaggtggggcttta-3′) and CBS 844ins68 wt oligonucleotide AS (5′-agcttaaagccccacctggatgatccaccccagtgatggtac-3′) as well as CBS 844 ΔG1-G2 oligonucleotide S (5′-catcactggtgtggatcatccaggtgcgtcttta-3′) and CBS 844 ΔG1-G2 oligonucleotide AS (5′-agcttaaagacgcacctggatgatccacaccagtgatggtac-3′), followed by direct cloning under the T7 promoter control into KpnI- and HindIII-digested pBluescript KS. The G123 construct used for binding assays was generated by annealing the oligonucleotides 844ins68 G123 oligonucleotide Kpn/Hind S (5′-catcactggggtggatcatccaggtggggcttttgctggccttgagccctgaagccgcgccctctgcagatcattggggtggatcccga-3′) and 844ins68 G123 oligonucleotide Kpn/Hind AS (5′-agcttcgggatccaccccaatgatctgcagagggcgcggcttcagggctcaaggccagcaaaagccccacctggatgatccaccccagtgatggtac-3′). Mutant versions of the G123 construct (ΔG1G2, ΔG1, ΔG2, and ΔG3) were prepared by annealing oligonucleotides carrying mutagenized G1 (gggg → ggtg) and/or G2 (gggg → gcgt) and/or G3 (gggg → gcgt) runs. G1G2G3G4 and G3G4 constructs were generated by PCR using CBS 844ins68 and CBS 844 ΔG1-G2, respectively, as template with oligonucleotides CBS T7 ATC S (5′-tacgtaatacgactcactataggccgcgccctctgcagatcac-3′) and CBS G3G4 AS (5′-cgaggatggacccttcgggatccaccccaa-3′). A G1G2bis construct was also generated by PCR using CBS 844 ΔG1-G2 as template with CBS T7 ATC S and G1G2bis AS (5′-cagcaaaagccccacctggatgatccaccccaa-3′) oligonucleotides. Plasmids to be transcribed into RNA were first linearized by HindIII digestion.
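The annealing-and-cloning step above can be sanity-checked from the sequences alone. The sketch below is an illustration, not part of the published protocol: the helper names `reverse_complement` and `duplex_overhangs` are hypothetical, and the only inputs are the wt S and AS oligonucleotides quoted in the text. It checks that the sense strand pairs with the interior of the antisense strand and reports the unpaired antisense ends, which should correspond to the HindIII-type (5′-agct) and KpnI-type (gtac-3′) overhangs expected for cloning into KpnI/HindIII-digested pBluescript KS.

```python
# Oligonucleotide sequences copied from the text (written 5'->3').
SENSE = "catcactggggtggatcatccaggtggggcttta"
ANTISENSE = "agcttaaagccccacctggatgatccaccccagtgatggtac"

# Translation table for complementing lower-case DNA.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_overhangs(sense: str, antisense: str):
    """Return the unpaired (5' end, 3' end) of the antisense strand.

    The sense strand is assumed to pair along its full length with an
    internal stretch of the antisense strand. Returns None if it does not.
    """
    paired = reverse_complement(sense)
    start = antisense.find(paired)
    if start < 0:
        return None
    return antisense[:start], antisense[start + len(paired):]


overhangs = duplex_overhangs(SENSE, ANTISENSE)
print(overhangs)  # ('agct', 'gtac')
```

The same check applied to the ΔG1-G2 oligonucleotide pair gives the same two overhangs, so the wild-type and mutant inserts would be expected to enter the vector in the same orientation.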
Preparative amounts of T7 polymerase transcripts were prepared from 25–50 μg of template in the presence of transcription buffer (350 mM HEPES, pH 7.5, 30 mM MgCl2, 2 mM spermidine, and 40 mM dithiothreitol), 40 units of RNasin, 7 mM each of the four NTPs, and 60 units of T7 polymerase (1.5 units/μg). Following incubation for 2 h at 40 °C, the RNA was purified using NICK Columns (Amersham Biosciences).

In vitro T7-transcribed 32P-labeled RNA (4–6 fmol) was incubated with 25 mM Tris-HCl, pH 7.5, 5 mM MgCl2, 150 mM KCl, 10 μg/ml heparin, and 30 μg of nuclear extract in a final volume of 20 μl for 15 min on ice. Following the addition of 5 μl of 50% (v/v) glycerol (and tracking dye to only the free RNA samples), complexes were resolved on a 4% polyacrylamide gel (ratio of 19:1 acrylamide:bis) in 75 mM Tris-glycine buffer (75 mM Tris, 75 mM glycine) at 15–25 mA for 2–4 h at 4 °C. Gels were dried and exposures were made to X-Omat AR films for 1–6 h.

For UV-cross-linking assays, all plasmids were linearized by digestion with an appropriate restriction enzyme. Transcription of cold and labeled RNA was carried out as described for band shift analysis. 32P-Labeled RNA was incubated with 60 μg of HeLa nuclear extract as described above for band shift assays. Following incubation, the samples were exposed to UV light (254 nm, 100 watts) at a distance of 5 cm for 10 min on ice. Then RNA was removed by digestion with 1 unit of RNase A at 37 °C for 30 min. The resulting 32P-labeled proteins were resolved by 10% SDS-PAGE in the presence of molecular weight markers. The gel was dried and exposures were made to X-Omat AR film for 12–24 h.

For the production of RNA template used for binding experiments, wild-type and G mutant 844ins68 CBS oligonucleotides were cloned in pBluescript KS KpnI/HindIII under the T7 promoter control.
The CBS 844ins68 construct was obtained by cloning a double-strand oligonucleotide, which was generated by annealing 1 μg of CBS 844ins68 wt oligonucleotide S (5′-catcactggggtggatcatccaggtggggcttta-3′) and CBS 844ins68 wt oligonucleotide AS (5′-agcttaaagccccacctggatgatccaccccagtgatggtac-3′). The CBS 844 ΔG1-G2 construct was obtained by cloning a double-strand oligonucleotide, which was generated by annealing 1 μg of CBS 844ins68 ΔGs oligonucleotide S (5′-catcactggtgtggatcatccaggtgcgtcttta-3′) and CBS 844ins68 ΔGs oligonucleotide AS (5′-agcttaaagacgcacctggatgatccacaccagtgatggtac-3′). One nanomole (∼7.9 μg) of cold CBS 844ins68 and CBS 844 ΔG1-G2 RNAs was placed in a 400-μl reaction mixture containing fresh 0.1 M NaOAc, pH 5.0, and 5 mM sodium m-periodate (Sigma). Reaction mixtures were incubated for 1 h in the dark at room temperature. The RNA was ethanol-precipitated and resuspended in 500 μl of 0.1 M NaOAc, pH 5.0. Then, 400 μl of adipic acid dihydrazide-agarose bead 50% slurry (Sigma) was washed four times in 10 ml of 0.1 M NaOAc, pH 5.0, and pelleted after each wash at 3000 rpm for 3 min in an Eppendorf minifuge. After the final wash, 300 μl of 0.1 M NaOAc, pH 5.0, was added to the beads. The slurry was then divided into two aliquots that were mixed with each periodate-treated RNA sample and incubated for 12 h at 4 °C on a rotator. The beads with the bound RNA were then pelleted and washed three times in 2 ml of 2 M NaCl and three times in 3 ml of RNA wash buffer (52 mM HEPES-KOH, pH 7.5, 10 mM MgCl2, 8 mM Mg(acetate)2, 5.2 mM dithiothreitol, 38% v/v glycerol). They were incubated in 1× RNA binding buffer (i.e. RNA wash buffer supplemented with 7.5 mM ATP, 10 mM GTP, and 5 mg/ml heparin) with 0.3 mg of HeLa cell nuclear extract for 20 min at 30 °C in a 500-μl final volume, pelleted by centrifugation at 1000 rpm for 3 min, and washed five times with 5 ml of RNA wash buffer.
After the final centrifugation, 60 μl of SDS-PAGE sample buffer was added to the beads, which were heated for 5 min at 90 °C before loading onto a 10% SDS-PAGE gel.

Internal sequence analysis from the Coomassie Blue-stained bands excised from the SDS-PAGE gel was performed using an electrospray ionization mass spectrometer (LCQ DECA XP, ThermoFinnigan). The bands were digested by trypsin, and the resulting peptides were extracted with water and 60% acetonitrile/1% trifluoroacetic acid. The fragments were then analyzed by mass spectrometry, and the proteins were identified by analysis of the peptide MS/MS data with Turbo SEQUEST (ThermoFinnigan) and MASCOT (Matrix Science).

The full coding sequence of hnRNP H1 was amplified by RT-PCR from HeLa total RNA using NdeI-S 5′-acgcatatgatgttgggcacggaaggt-3′ and BamHI-AS 5′-gatggatcctcaatgatggtgatgatggtggtggtgatgacctgcaatgtttgattgaaaatcact-3′, and then it was NdeI/BamHI-cloned in pET11a expression vector. Recombinant hnRNP H1 was expressed in Escherichia coli BL21(DE3) strain and purified after migration in SDS-PAGE. The purified protein was used to produce polyclonal antiserum by immunizing a 3-month-old rabbit (New Zealand strain) according to standard protocols. Western blots were carried out according to standard protocol using 1/100 dilution of pre-immune or immune antiserum.

To investigate the molecular basis of the CBS 844ins68 splicing, we set up a minigene system with generation of two basic constructs where the genomic regions spanning from CBS exons 5 to 9 with or without the 844ins68 insertion were cloned according to the original reading frame in the pcDNA3 eukaryotic expression vector. The analysis of the splicing pattern by RT-PCR of these constructs after transient transfections in the Hep3B cell line showed that both the wild-type and 844ins68 constructs reproduce the splicing pattern found in vivo, with activation of the distal 3′ splice site (Fig. 1 B).
The 844ins68 insertion of the CBS gene is particularly intriguing because it shows a natural duplication of the same 3′ splice site, similar to the non-natural models where two identical 5′ or 3′ splice sites were duplicated ad hoc. We considered the possibility that the skipping of the insertion might be driven by some sequences within the region of the insertion itself. In particular, we focused our attention on the two GGGG sequences, one at the beginning of exon 8 (6 nt downstream of the proximal 3′ splice site) and the other in the duplication (9 nt downstream of the insertion point), which have been brought close to each other by the insertional event. To study the possible effect of G runs on splicing in the CBS gene, we designed a series of mutants of the basic constructs where the first (ΔG1), the second (ΔG2), or both (ΔG1-G2) G runs were disrupted by point mutations (Fig. 2 A). These constructs were transiently transfected in the Hep3B cell line, and the splicing pattern was analyzed by RT-PCR using primers specific for the constructs. The identity of each PCR product in all transfection experiments was confirmed by sequencing. When the G1 and G2 runs were mutagenized in the 844ins68 construct, there was a shift in the selection of the 3′ splice site (Fig. 2 B, lanes 3–5). In fact, an activation of the proximal 3′ splice site was observed after mutation of both G1 and G2 strings or of the G1 string, leading to the retention of the 844ins68 insertion within the CBS mRNA (Fig. 2 B, lanes 4 and 5). The mutation of the G2 string alone also resulted in the activation of the proximal 3′ splice site, although a partial use of the distal 3′ splice site was also observed (Fig. 2 B, lane 3). It is apparent that, in the 844ins68 context, both G1 and G2 runs are necessary for the activation of the distal 3′ splice site. The G1 and G2 runs are of course present in the wild-type allele, although they are in a different configuration (Fig. 3 A).
To determine whether they affect intron 7 splicing in the wild-type context, or if their effect on splicing is specifically due to the peculiar rearrangement created by the insertion, two other mutants were generated from the wild-type construct where each G1 and G2 run in its original position was mutagenized (Fig. 3 A, wild-type ΔG IVS7 and wild-type ΔG exon 8). A mutant of the 844ins68 was also designed in which the G run within exon 8 was mutagenized (Fig. 3 A, 844ins68 ΔG3). The transfections of all these mutants showed the full use of the IVS7 distal 3′ splice site, as observed using the wild-type construct (Fig. 3 B, lanes 3–5). These results indicate that single G runs in their original location are not able to modulate IVS7 splicing. Furthermore, the T833C mutation alone was shown not to affect the wild-type splicing pattern (data not shown).

To identify the trans-acting factors able to bind the G runs in the 844ins68 context, EMSA and UV-cross-linking assays were performed with in vitro transcribed RNAs containing either the core region of the CBS 844ins68 duplication (Fig. 4 A, CBS 844ins68 construct) or the same sequence carrying point mutations within the G runs (Fig. 4 A, CBS 844 ΔG1-G2 construct). Band shift experiments with the CBS 844ins68 construct in the presence of the nonspecific competitor heparin showed a broad band of shifted material (Fig. 4 B, lane 2), whereas no shift was observed with the CBS 844 ΔG1-G2 construct (Fig. 4 B, lane 1). Therefore, a specific complex is formed in the presence of G strings in the CBS 844ins68 context. The nature of factors and/or protein(s) binding to G strings was investigated by a number of methods. Initially, because it was reported that U1 snRNA binds to G triplets in the α-globin system (18. McCullough A.J. Berget S.M. Mol. Cell. Biol. 2000; 20: 9225-9235), supershift analysis with anti-U1 antibody was carried out.
Labeled CBS 844ins68 and CBS 844 ΔG1-G2 RNAs were incubated with HeLa nuclear extracts in the presence of an anti-U1A rabbit polyclonal antibody or a control rabbit polyclonal serum. Neither supershift nor complex disruption was observed in the CBS context (Fig. 4 A, lanes 3 and 4), whereas supershift was observed using a control template that included the consensus sequence for U1 (Fig. 4 C). Subsequently, the size and number of protein(s) binding were addressed by UV cross-linking of RNA-protein complexes. 32P-Labeled CBS 844ins68 or CBS 844 ΔG1-G2 RNAs were cross-linked to protein by exposure to UV light, and the resulting 32P-labeled protein(s) were separated by SDS-PAGE. A prominent band of approximately 58-kDa molecular mass could only be observed with the CBS 844ins68 construct containing the G runs (Fig. 5 A). The specificity of binding was further tested by performing competition assays. As expected, the bands were readily competed away from labeled CBS 844ins68 sequence by the addition of increasing amounts of cold CBS 844ins68 but not by the addition of increasing amounts of cold CBS 844 ΔG1-G2 (Fig. 5 B).

The identity of proteins that bind specifically to the G runs in the CBS 844ins68 RNA context was investigated through an affinity purification procedure that involves the cross-linking of transcribed CBS 844ins68 RNA to adipic acid dihydrazide-agarose beads. As a control, we used CBS 844ins68 ΔG1-G2-transcribed RNA cross-linked to agarose beads. Both bead preparations were separately incubated with HeLa nuclear extracts, and the proteins bound were separated on an SDS-PAGE gel and then analyzed by Coomassie Blue staining. Fig. 6 A shows a clear difference between the binding patterns of the CBS 844ins68 and the CBS 844ins68 ΔG1-G2-derivatized beads. The CBS 844ins68 RNA specifically pulled down the 58-kDa protein, whose molecular mass was similar to that of the band observed in UV cross-linking assays.
Moreover, a less abundant protein of apparent 53-kDa molecular mass was also pulled down. Internal sequencing by mass spectrometry of the main 58-kDa band resulted in seven peptides whose sequences correspond to residues 82–87, 89–114, 151–167, 180–185, 193–199, and 300–316 of hnRNP H1. The less abundant protein was also sequenced and was found to correspond to hnRNP F. Immunoblots with polyclonal antibodies raised against recombinant hnRNP H1 then confirmed the presence of this protein in the complex assembled onto CBS 844ins68 RNA and its absence from the CBS 844 ΔG1-G2 (Fig. 6 B).

Binding specificity of hnRNP H1 has been described in different contexts (19. Caputi M. Zahler A.M. J. Biol. Chem. 2001; 276: 43850-43859). To map the essential sequences required for interaction in the 844ins68 context, a series of mutants were prepared where the nucleotides proximal to both G runs were mutagenized, whereas the G runs were intact (CBS 844ins68B, Fig. 7 A). The G1 run sequence (GGGGT) was mutated to (GGGGC), the G2 run sequence (TGGGG) was mutated to (AGGGG), and two single mutants and one double mutant were created. Binding to hnRNP H1 was observed with all mutants even if the intensity was slightly lower than that of the 844ins68 construct. These results suggest that the boundary nucleotides are not essential for binding but do contribute to increase the efficiency of such an interaction (Fig. 7 B).

A second set of UV cross-linking experiments was performed to test if the binding of hnRNP H1 to G runs was possible in the presence of a single G run or if tandem G repeats have to be present for optimal hnRNP H1 binding. Initially, a larger region of the 844ins68 duplication, including the G run within exon 8 (namely G3 run, Fig. 8 A), was synthesized by oligonucleotide annealing and cloning in pBS KS under T7 promoter control (Fig. 8 A, G123).
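The layout of the G runs in the G123 probe can be read directly off the sense oligonucleotide given in the construct description. The following is a minimal sketch under stated assumptions: the helper names `g_runs`, `gaps`, and `mutate` are hypothetical, a "G run" is taken to mean four or more consecutive g residues, and the replacement motifs (ggtg for G1, gcgt for G2 and G3) are the ones quoted in the text. It locates the runs, measures the number of nucleotides between neighbouring runs, and shows what remains after disrupting a single run.

```python
import re

# G123 sense oligonucleotide copied from the text (written 5'->3').
G123 = (
    "catcactggggtggatcatccaggtggggcttttgctggccttgagccctgaagccgcgccc"
    "tctgcagatcattggggtggatcccga"
)


def g_runs(seq: str, min_len: int = 4):
    """Return (start, end) spans (0-based, end-exclusive) of runs of >= min_len g."""
    pattern = f"g{{{min_len},}}"
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq.lower())]


def gaps(runs):
    """Return the number of nucleotides separating each pair of neighbouring runs."""
    return [nxt[0] - prev[1] for prev, nxt in zip(runs, runs[1:])]


def mutate(seq: str, run, replacement: str) -> str:
    """Replace one run (given as a span) with a same-length mutant motif."""
    start, end = run
    return seq[:start] + replacement + seq[end:]


runs = g_runs(G123)
print(runs)        # [(7, 11), (25, 29), (75, 79)]
print(gaps(runs))  # [14, 46]

# Disrupting G3 leaves the closely spaced G1-G2 pair;
# disrupting G1 leaves only the widely spaced G2-G3 pair.
print(gaps(g_runs(mutate(G123, runs[2], "gcgt"))))  # [14]
print(gaps(g_runs(mutate(G123, runs[0], "ggtg"))))  # [46]
```

The two spacings recovered here, 14 nt between G1 and G2 and 46 nt between G2 and G3, agree with the values the authors quote when discussing the distance requirement for hnRNP H1 binding.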
Three mutant versions of this construct were then generated in which either the G1 (ΔG1), the G2 (ΔG2), or the G3 (ΔG3) runs were mutagenized (Fig. 8 A). Fig. 8 B shows that the binding of hnRNP H1 to G runs can be observed only when G1-G2-G3 or at least G1-G2 runs were present. In fact, when G1 and/or G2 runs were mutagenized no significant binding was observed (Fig. 8 B, lane 2). Therefore, the presence of just one G run, and in particular G3, is not sufficient to promote hnRNP H1 binding in the CBS 844ins68 context. In addition, strong hnRNP H1 binding requires the presence of both G1 and G2 simultaneously, whereas the presence of two non-contiguous G runs (i.e. G1+G3 or G2+G3) is not sufficient for efficient hnRNP H1 binding (Fig. 8 B, lanes 2–4). Further indications of the importance of G1+G2 runs for protein interaction were obtained by comparison of the hnRNP H1 binding ability to a construct whose sequence includes the G4 run located 11 nt downstream of the G3 run (G3G4 construct, Fig. 9 A) with that of a construct where the G1-G2 run context was reconstituted within exon 8 of the 844ins68 construct (G1G2bis construct, Fig. 9 A). The G3G4 construct differs from the G1G2bis construct by three changes: two substitutions and an additional G. Fig. 9 B shows that G1G2bis but not G3G4 was capable of binding hnRNP H1, indicating that hnRNP H1 does not bind normally to the G3-G4 context within exon 8.

The relative contributions of duplications to genetic pathology depend on factors such as the size of the duplicated region, the location of the duplication, incidence and severity of the effects of lesions, and selective factors (20. Mazzarella R. Schlessinger D. Genome Res. 1998; 8: 1007-1021).
In this context, the identification of the 844ins68 insertion within the 3′ splice site of CBS IVS7 together with a second mutation (G913A) on the same allele in a patient affected by homocystinuria due to CBS deficiency represents an intriguing example of a duplication polymorphism associated with a variable disease susceptibility. The insertion 844ins68 is both structurally and functionally peculiar, because it consists of a duplication of the same 3′ splice site. In principle, the identity of two adjacent 3′ splice sites within the CBS intron 7 should at least determine the coexistence of two alternatively spliced CBS mRNA isoforms. Instead, the distal 3′ splice site is selected in 100% of transcripts, determining the skipping of the 68-bp insertion in vivo. This observation prompted us to investigate the molecular mechanisms underlying this phenomenon.

Our first approach was focused on the analysis of the 844ins68 sequence for possible cis-acting elements capable of influencing 3′ splice site selection. In particular, we noticed two GGGG sequences, one at the beginning of exon 8 (6 nt downstream of the proximal 3′ splice site) and the other within the duplication (9 nt downstream of the insertion point), that have been brought close to each other by the insertional event. In comparison with other gene models where G runs are distributed throughout the entire intron or are located downstream of the 5′ splice site (13–15; 21. Carlo T. Sterner D.A. Berget S.M. RNA (N. Y.). 1996; 2: 342-353), the peculiarity of the G repeats in the 844ins68 context arises from the fact that they are located between two identical 3′ splice sites. Hence, the effects of G runs in the 844ins68 context might directly concern the 3′ splice site selection. Direct evidence of this fact is given by the observation that, in the 844ins68 context, the mutation of single G runs rescues the partial (ΔG2) or full (ΔG1) selection of the proximal 3′ splice site. On the other hand, the disruption of the single G runs in their original position and, in particular, mutation of the G run within exon 8 in the wild-type context (wild-type ΔG exon 8 construct) did not alter the 3′ splice site selection and consequently had no effect on the wild-type splicing pattern (Fig. 3). Thus, our results show that G runs within the 844ins68 insertion are able to influence 3′ splice site selection in that context. These results are further supported by previous studies suggesting that the selection of splice sites can be affected by groups of G runs and not by single G runs (15). Thus, our data highlight the relevance of the insertional event for the creation of a splicing regulatory element.

The investigation of the nature of the one or more factors that bind to G runs in the 844ins68 context revealed that an RNA probe spanning the 844ins68 duplication, including intact G runs, formed a specific complex that can be visualized by EMSA. A previous report has shown that G runs bind U1 snRNA in the α-globin model (18). However, using a different methodology that did not involve RNA-RNA cross-linking, we were unable to detect U1 snRNA in the complex shifted upon binding to the 844ins68 RNA context.
Subsequent UV cross-linking experiments highlighted how G runs in the 844ins68 context form a 58-kDa complex. Two proteins able to interact with G runs were isolated through an RNA affinity purification method, and mass spectrometry microsequencing identified the main 58-kDa protein as hnRNP H1 and the minor 53-kDa protein as hnRNP F. The copurification by RNA affinity of hnRNP F together with hnRNP H1 is not surprising, because it is known that both hnRNP H and hnRNP F bind to poly(G) (22. Matunis M.J. Xing J. Dreyfuss G. Nucleic Acids Res. 1994; 22: 1059-1067) and have been implicated in binding to an intronic splicing enhancer downstream of the 5′ splice site of c-src N1 exon (23. Chou M.Y. Rooke N. Turck C.W. Black D.L. Mol. Cell. Biol. 1999; 19: 69-77; 24. Min H. Chan R.C. Black D.L. Genes Dev. 1995; 9: 2659-2671). In addition, hnRNP H1 might interact with hnRNP F as already shown in the c-src N1 model and could work as a heterodimer (23, 24). On the other hand, hnRNP H has been reported to bind an exonic splicing silencer located within exon 7 of the rat β-tropomyosin gene (25. Chen C.D. Kobayashi R. Helfman D.M. Genes Dev. 1999; 13: 593-606) and participates in the exclusion of this exon in non-muscle cells. Therefore, the presence of hnRNP H1 in the complex affinity-purified with 844ins68 RNA is consistent with the involvement of this protein in the definition of the duplicated region between the proximal and the distal 3′ splice site as an intron.
Further definition of 844ins68 RNA binding specificity has shown that the nucleotides immediately downstream of the G1 run (G4T → C) and upstream of the G2 run (A ← TG4) are not critical for protein binding and allow interactions with hnRNP H1, although possibly with lower affinity. Binding of other members of the hnRNP H family was not observed in the CBS 844ins68 context, in contrast to other models studied in a previous report (19; Caputi M., Zahler A.M. J. Biol. Chem. 2001; 276: 43850-43859). On the other hand, the investigation of binding requirements has shown that hnRNP H1 mainly binds to the G1 and G2 runs only if they are present simultaneously. In fact, only constructs G123 and ΔG3 bind hnRNP H1 tightly, whereas the ΔG1G2, ΔG1, and ΔG2 constructs do not. This observation further supports the indication obtained from transfection experiments that the configuration of two G repeats brought close to each other by the gene rearrangement is responsible for the selection of the 3′ splice site of intron 7 in the 844ins68 context. An additional consideration regarding the binding requirements is that the distance between adjacent G runs seems to be crucial. In fact, the CBS ΔG3 construct binds hnRNP H1, whereas CBS ΔG2 does not. Considering that G1 and G2 are spaced by 14 nt, whereas G2 and G3 are separated by 46 nt, the gap between two contiguous G runs seems to be crucial for a stable interaction of hnRNP H1 with the target sequence. Further support for this hypothesis is provided by the rescue of hnRNP H1 binding obtained with construct G1G2bis, in which the G1 and G2 runs were reconstituted within exon 8, at the 3′-end of construct ΔG1G2.
Thus, different mechanisms can be proposed to account for the effects of hnRNP H1 on 3′ splice site selection in the 844ins68 context. 
It is possible that the interaction of hnRNP H1 with the G1-G2 runs might hamper the access of constitutive splicing factors to the proximal 3′ splice site by steric hindrance or by looping it out. Alternatively, hnRNP H1 may interact with other splicing factors, thereby enhancing the selection of the distal 3′ splice site. In this way, the G elements brought close to each other might virtually extend the intronic region until the end of the duplication.
In conclusion, our findings suggest the physiological relevance of insertional events to avoid a disease phenotype. In fact, it is apparent that the two G runs brought close to each other by the 844ins68 insertion work cooperatively in a CBS allele carrying the disease-causing mutation. This rearrangement, probably derived from an unequal crossing-over between the T833C and the wild-type alleles, may have been subjected to positive evolutionary selection, because it allowed the rescue of the wild-type sequence. Therefore, we propose that the duplication of the 68-bp DNA segment at the IVS7 3′ splice site carrying the T833C mutation generated an intronic splicing regulatory element composed of the G1-G2 region that promotes the constitutive use of the distal 3′ splice site. This event allowed the bypass of the T833C substitution as well as of the whole 68-bp insertion and its associated premature stop codons, thus preserving the wild-type open reading frame and rescuing the protein function. To our knowledge, this is the best-characterized evolutionary event in which a small duplication results in a mutation correction through alteration of splicing regulation.
We thank Ann Crum for critical reading of the manuscript and I. W. Mattaj and A. Segref for the generous gift of the anti-U1A antibody.

Reference(s)