Article · Open access · Peer reviewed

Human Genomic Deletions Mediated by Recombination between Alu Elements

2006; Elsevier BV; Volume: 79; Issue: 1; Language: English

10.1086/504600

ISSN

1537-6605

Authors

Shurjo K. Sen, Kyudong Han, Jianxin Wang, Jungnam Lee, Hui Wang, Pauline A. Callinan, Matthew D. Dyer, Richard Cordaux, Ping Liang, Mark A. Batzer

Topic(s)

Genomics and Chromatin Dynamics

Abstract

Recombination between Alu elements results in genomic deletions associated with many human genetic disorders. Here, we compare the reference human and chimpanzee genomes to determine the magnitude of this recombination process in the human lineage since the human-chimpanzee divergence ∼6 million years ago. Combining computational data mining and wet-bench experimental verification, we identified 492 human-specific deletions (for a total of ∼400 kb) attributable to this process, a significant component of the insertion/deletion spectrum of the human genome. The majority of the deletions (295 of 492) coincide with known or predicted genes (including 3 that deleted functional exons, as compared with orthologous chimpanzee genes), which implicates this process in creating a substantial portion of the genomic differences between humans and chimpanzees. Overall, we found that Alu recombination-mediated genomic deletion has had a much higher impact than was inferred from previously identified isolated events and that it continues to contribute to the dynamic nature of the human genome.
With a copy number of >1 million, Alu elements are one of the most successful non-LTR (long terminal repeat) retrotransposon families in the human genome. [1: Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, et al. Initial sequencing and analysis of the human genome. Nature. 2001; 409: 860-921] In addition to classic retrotransposition-associated insertion mutations, Alu elements can create genomic instability by the deletion of host DNA sequences during their integration into the genome and by creating genomic deletions associated with intrachromosomal and interchromosomal recombination events. [2: Callinan PA, Wang J, Herke SW, Garber RK, Liang P, Batzer MA. Alu retrotransposition-mediated deletion. J Mol Biol. 2005; 348: 791-800] [3: Deininger PL, Batzer MA. Alu repeats and human disease. Mol Genet Metab. 1999; 67: 183-193] Multiple features predispose Alu elements to successful recombination, including their proximity in the genome (one insertion every 3 kb, on average), the high GC content of their sequence (∼62.7%), and the remarkable sequence similarity (70%–100%) among Alu subfamilies of widely different ages. Overall, the recombinogenic nature of these elements is reflected in the various forms of cancer and genetic disorders associated with Alu-mediated recombination events. [3: Deininger PL, Batzer MA. Alu repeats and human disease. Mol Genet Metab. 1999; 67: 183-193] [4: Hattori Y, Okayama N, Ohba Y, Yamashiro Y, Yamamoto K, Tsukimoto I, Kohakura M. The precise breakpoints of a Filipino-type alpha-thalassemia-1 deletion found in two Japanese. Hemoglobin. 1999; 23: 239-248] [5: Huang LS, Ripps ME, Korman SH, Deckelbaum RJ, Breslow JL. Hypobetalipoproteinemia due to an apolipoprotein B gene exon 21 deletion derived by Alu-Alu recombination. J Biol Chem. 1989; 264: 11394-11400] [6: Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat Rev Genet. 2002; 3: 370-379] [7: Levran O, Doggett NA, Auerbach AD. Identification of Alu-mediated deletions in the Fanconi anemia gene FAA. Hum Mutat. 1998; 12: 145-152] [8: Marshall B, Isidro G, Boavida MG. Insertion of a short Alu sequence into the hMSH2 gene following a double cross over next to sequences with chi homology. Gene. 1996; 174: 175-179] [9: Myerowitz R, Hogikyan ND. A deletion involving Alu sequences in the β-hexosaminidase α-chain gene of French Canadians with Tay-Sachs disease. J Biol Chem. 1987; 262: 15396-15399] [10: Rohlfs EM, Puget N, Graham ML, Weber BL, Garber JE, Skrzynia C, Halperin JL, Lenoir GM, Silverman LM, Mazoyer S. An Alu-mediated 7.1 kb deletion of BRCA1 exons 8 and 9 in breast and ovarian cancer families that results in alternative splicing of exon 10. Genes Chromosomes Cancer. 2000; 28: 300-307] [11: Rothberg PG, Ponnuru S, Baker D, Bradley JF, Freeman AI, Cibis GW, Harris DJ, Heruth DP. A deletion polymorphism due to Alu-Alu recombination in intron 2 of the retinoblastoma gene: association with human gliomas. Mol Carcinog. 1997; 19: 69-73] [12: Tvrdik T, Marcus S, Hou SM, Falt S, Noori P, Podlutskaja N, Hanefeld F, Stromme P, Lambert B. Molecular characterization of two deletion events involving Alu-sequences, one novel base substitution and two tentative hotspot mutations in the hypoxanthine phosphoribosyltransferase (HPRT) gene in five patients with Lesch-Nyhan syndrome. Hum Genet. 1998; 103: 311-318] However, clinical studies of isolated disease-causing deletions, although useful from a medical viewpoint and in demonstrating the existence of Alu recombination-mediated deletions (ARMDs), do not adequately depict the overall contribution of this process to the architecture of the genome and the associated impact on gene function.
The availability of a genome sequence for the common chimpanzee (Pan troglodytes), the closest evolutionary relative of the human lineage, [13: Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005; 437: 69-87] has allowed us to perform a comparative genomic assessment of the extent of ARMD in the human genome over the past ∼6 million years, since the divergence of the human and chimpanzee lineages. [14: Miyamoto MM, Slightom JL, Goodman M. Phylogenetic relations of humans and African apes from DNA sequences in the psi eta-globin region. Science. 1987; 238: 369-373] [15: Wildman DE, Uddin M, Liu G, Grossman LI, Goodman M. Implications of natural selection in shaping 99.4% nonsynonymous DNA identity between humans and chimpanzees: enlarging genus Homo. Proc Natl Acad Sci USA. 2003; 100: 7181-7188] In this study, we identified ∼400 kb of human-specific ARMD, the distribution of which is biased toward gene-dense regions of the genome, which raises the possibility that ARMD may have played a role in the divergence of humans and chimpanzees. About 60% of the ARMDs are located in genes, and, in at least three instances, exons have been deleted in human genes relative to their chimpanzee orthologs. The nature of the altered genes suggests that ARMD might have played a role in shaping the unique traits of the human and chimpanzee lineages. Mechanistically, we characterized the physical aspects of the deletion process and proposed different models for ARMD.
We extracted 400 bp of 5′ and 3′ genomic sequence flanking all human Alu elements (fig. 1). Next, we joined the two 400-bp stretches to form a single sequence (the “query”). For each query, the best match in the reference chimpanzee genome (PanTro1 [November 2003 freeze]) was identified. Then, the sequence stretch in the chimpanzee genome between the two regions that aligned with the two 400-bp halves of the query (the “hit”) was extracted and aligned with the human Alu sequence initially used to design the query (the “query Alu”), by use of a local installation of the National Center for Biotechnology Information Blast 2 Sequences Bl2seq utility. Following are the possible alignment results for each sequence pair (see corresponding diagrams in fig. 1).
A. There is no match. In this case, an Alu insertion-mediated deletion has occurred in the human genome at that locus.
B. There is only one alignment block, and:
  B.1. The hit is identical to the query Alu. This is shared ancestry of an Alu insertion.
  B.2. The hit is longer than the query Alu, and the extra sequence is entirely composed of a poly(A) tract downstream of the Alu sequence. This is a case of extension of the Alu poly(A) tail.
  B.3. The hit consists of the query Alu plus some extra non-poly(A) sequence, and
    B.3a. The extra, non-poly(A) sequence is downstream of the poly(A) tail. This could be a gene conversion event in the chimpanzee genome.
    B.3b. The extra, non-poly(A) sequence is upstream of the query Alu element or there is extra sequence at both ends. This is a possible Alu insertion–mediated deletion event in the human genome.
C. There is more than one alignment block, and
  C.1. The beginning and end of the hit match the query Alu and the hit is at least 100 bp longer than the query Alu sequence (since this size would approximate the expected lower ARMD size limit). This is a candidate ARMD event in the human genome.
  C.2. At least one end of the hit has no match to the query Alu. This is another possible case for an Alu insertion–mediated deletion in the human genome.
We retained all loci matching case C.1 as pairs of FASTA files (i.e., the orthologous human and chimpanzee sequences). Each human sequence contained the query Alu and its 400-bp flanking sequences on each side, and each chimpanzee sequence contained the entire hit that aligned with the query flanking sequences. All candidate ARMD loci were then manually inspected and, if necessary, verified by wet-bench (PCR) analysis. Orthologous human and chimpanzee sequences for each locus are available from the “Publications” section of the Batzer Laboratory Web site.
A typical Alu insertion is flanked on both sides by identical (or nearly perfect) short, direct repeats (7–20 bp) termed “target-site duplications” (TSDs). [16: Deininger PL, Batzer MA. Mammalian retroelements. Genome Res. 2002; 12: 1455-1465] The single Alu element remaining at a human candidate ARMD locus is characterized by the apparent absence of TSDs, since it is composed of fragments from a pair of Alu elements with mutually different TSDs, situated at the orthologous ancestral locus (which persists in the chimpanzee genome).
This hallmark of the ARMD process offers a direct means of confirming the “chimeric” origin of the human Alu element at a deletion locus. Using this property as our basis for verification, we manually inspected all candidate loci returned by the computational analysis. In an unambiguous ARMD event, the TSDs of the two Alu elements immediately upstream and downstream of the deleted portion in the chimpanzee genome were perfect matches with the 5′ and 3′ TSDs, respectively, of the orthologous single human Alu element. In the next possible scenario, the sequence on any one side of the human Alu element (upstream or downstream) matched the TSDs of the chimpanzee element on the corresponding side, but the other chimpanzee Alu element itself lacked TSDs. However, the sequence immediately flanking this element on the side opposite to the deletion was identical in both human and chimpanzee. In both these cases, we accepted the computational detection as a valid ARMD locus. At loci that showed slight deviations in the sequence architecture from the unambiguous ARMD structures described above (which raise the possibility that one of the two chimpanzee Alu elements might be a chimpanzee-specific Alu insertion, as opposed to a human-specific ARMD event), we designed oligonucleotide primers in the nonrepetitive sequences flanking the Alu elements in the chimpanzee genome, and we experimentally confirmed by PCR (and, where required, by DNA sequencing) that the deletion did exist and was specific to the human genome. As an additional step to verify the potential ARMD loci that we accepted/rejected solely on the basis of computational identification, we randomly chose two sets of 25 such insertions and deletions and verified them by PCR. Accuracy rates for putative deletion and insertion loci were 100% and 96%, respectively (4% of putative insertions comprising the error were all deletions), which confirmed the validity of our approach. 
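The computational screen that produced these candidate loci (joining the two 400-bp flanks into a query, then labeling each chimpanzee hit with one of the cases A through C.2) can be read as a small decision tree. The following is a minimal Python sketch of that logic, not the authors' program: the `HitSummary` fields, the function names, and the "unclassified" label for multi-block hits that gain fewer than 100 bp are assumptions introduced here for illustration, and the alignment itself (Bl2seq in the text) is taken as already done.

```python
from dataclasses import dataclass

MIN_EXTRA_BP = 100  # lower ARMD size limit quoted in the text


def build_query(chrom: str, alu_start: int, alu_end: int, flank: int = 400) -> str:
    """Join the flanks on either side of an Alu (0-based, end-exclusive coordinates)."""
    left = chrom[max(0, alu_start - flank):alu_start]
    right = chrom[alu_end:alu_end + flank]
    return left + right


@dataclass
class HitSummary:
    """Hypothetical summary of one hit-versus-query-Alu alignment."""
    n_blocks: int                     # alignment blocks reported for the pair
    hit_len: int = 0                  # length of the chimpanzee hit (bp)
    alu_len: int = 0                  # length of the human query Alu (bp)
    start_matches: bool = False       # start of the hit aligns to the query Alu
    end_matches: bool = False         # end of the hit aligns to the query Alu
    extra_5p: int = 0                 # unaligned bp upstream of the Alu
    extra_3p: int = 0                 # unaligned bp downstream of the Alu
    extra_3p_is_polya: bool = False   # downstream extra is purely poly(A)


def classify(hit: HitSummary) -> str:
    """Return the case label (A, B.1, B.2, B.3a, B.3b, C.1, C.2) for one hit."""
    if hit.n_blocks == 0:
        return "A"        # no match: Alu insertion-mediated deletion in human
    if hit.n_blocks == 1:
        if hit.extra_5p == 0 and hit.extra_3p == 0:
            return "B.1"  # shared ancestral Alu insertion
        if hit.extra_5p == 0 and hit.extra_3p_is_polya:
            return "B.2"  # poly(A) tail extension
        if hit.extra_5p == 0:
            return "B.3a"  # possible gene conversion in chimpanzee
        return "B.3b"     # possible Alu insertion-mediated deletion
    if hit.start_matches and hit.end_matches:
        if hit.hit_len - hit.alu_len >= MIN_EXTRA_BP:
            return "C.1"  # candidate ARMD event
        return "unclassified"  # not covered by the published cases
    return "C.2"          # at least one end of the hit has no match
```

Only loci labeled C.1 would move on to the TSD-based inspection and PCR checks described above; every other label is a reason to set the locus aside.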
We designed oligonucleotide primers using Primer3 software. Detailed information for each locus—including primer sequences, annealing temperature, and PCR product sizes—is available from the “Publications” section of the Batzer Laboratory Web site. PCR amplification of each locus was performed in 25-μl reactions with 10–50 ng genomic DNA, 200 nM of each oligonucleotide primer, 200 μM dNTPs in 50 mM KCl, 1.5 mM MgCl2, 10 mM Tris-HCl (pH 8.4), and 2.5 units Taq DNA polymerase. The conditions for the PCR were an initial denaturation step of 94°C for 4 min; followed by 32 cycles of 1 min of denaturation at 94°C, 1 min of annealing at optimal annealing temperature, and 1 min of extension at 72°C; followed by a final extension step at 72°C for 10 min. PCR amplicons were separated on 2% agarose gels, were stained with ethidium bromide, and were visualized using UV fluorescence. Individual PCR products were purified from the gels with Wizard gel purification kits (Promega) and were cloned into vectors by use of TOPO-TA Cloning kits (Invitrogen). For each sample, three colonies were randomly selected and were sequenced on an Applied Biosystems ABI3130XL automated DNA sequencer. Each clone was sequenced in both directions with use of M13 forward and reverse primers. The sequence tracks were analyzed using the Seqman program in the DNASTAR suite and were aligned using BioEdit sequence alignment software. Gorilla and orangutan sequences generated during the course of this study have been submitted to GenBank under accession numbers DQ363502–363524. Loci verified by PCR were screened on a panel of five primate species, including Homo sapiens (HeLa; cell line ATCC CCL-2), P. troglodytes (common chimpanzee; cell line AG06939B), Pan paniscus (bonobo or pygmy chimpanzee; cell line AG05253B), Gorilla gorilla (western lowland gorilla; cell line AG05251), and Pongo pygmaeus (orangutan; cell line ATCC CR6301). 
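The PCR conditions given above translate directly into per-reaction amounts and a programmed run time. The short Python sketch below works through that arithmetic; the function and constant names are illustrative only, and thermocycler ramp times are ignored, so the real run would take somewhat longer.

```python
REACTION_UL = 25      # reaction volume stated in the text (microliters)
PRIMER_NM = 200       # each primer, nanomolar
DNTP_UM = 200         # dNTPs, micromolar

INITIAL_DENATURATION_MIN = 4
CYCLES = 32
CYCLE_STEPS_MIN = (1, 1, 1)   # denaturation, annealing, extension
FINAL_EXTENSION_MIN = 10


def primer_pmol(conc_nm: float, volume_ul: float) -> float:
    """nM x uL gives femtomoles; divide by 1000 for picomoles."""
    return conc_nm * volume_ul / 1000


def dntp_nmol(conc_um: float, volume_ul: float) -> float:
    """uM x uL gives picomoles; divide by 1000 for nanomoles."""
    return conc_um * volume_ul / 1000


def programmed_minutes() -> int:
    """Hold times only; ramping between temperatures is not counted."""
    return INITIAL_DENATURATION_MIN + CYCLES * sum(CYCLE_STEPS_MIN) + FINAL_EXTENSION_MIN


print(primer_pmol(PRIMER_NM, REACTION_UL))   # 5.0 pmol of each primer per reaction
print(dntp_nmol(DNTP_UM, REACTION_UL))       # 5.0 nmol dNTPs per reaction
print(programmed_minutes())                  # 110 minutes of programmed holds
```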
To evaluate polymorphism rates, we amplified 50 randomly picked ARMD loci on a panel of genomic DNA, from 80 human individuals (20 from each of four populations: African American, South American, European, and Asian), that was available from previous studies in our lab.
To test whether the GC and Alu contents of the sequences deleted through ARMD differed statistically from the rest of the genome, we performed Monte Carlo simulations comparing the observed deletions to two other sets of sequences. Both these sets comprised randomly extracted sequences equal in number to the observed deletions (492) and mimicked the observed size distribution of ARMD events. The first set was extracted from the regions immediately adjacent to randomly picked Alu elements annotated in the reference human genome sequence (called “RSNA” hereafter). The second set comprised sequences randomly extracted from the entire genome sequence, with no additional parameters incorporated (called “RSG” hereafter). We used 5,000 randomized replicates of both sets. For both observed and simulated sets of sequences, we calculated GC content using in-house Perl scripts, whereas the Alu content was analyzed using a locally installed copy of the RepeatMasker Web server. Additionally, to make our estimate of observed percentage Alu content conservative, we trimmed the deleted sequence at each locus to remove remaining fragments of the two Alu elements that caused the ARMD event. Statistical significances of the differences in GC and Alu content were based on Z scores obtained by comparing observed values (from the actual set of deleted sequences) with the mean value obtained from the 5,000 randomly extracted sequence sets. [17: Hamaker HC. Approximating the cumulative normal distribution and its inverse. Appl Stat. 1978; 27: 76-77] All computer programs used are available from the authors on request.
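The Monte Carlo comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the in-house Perl code: it covers only an RSG-style draw (random segments from one sequence string, matched to the observed deletion lengths), it assumes the Z score divides by the sample standard deviation of the replicate values, and it obtains a two-sided P value from the standard library's `math.erfc` instead of the Hamaker approximation cited in the text. All function names are hypothetical.

```python
import math
import random
import statistics


def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def sample_matched(genome: str, lengths: list, rng: random.Random) -> list:
    """Draw one random segment per observed deletion length (RSG-style set)."""
    segments = []
    for n in lengths:
        start = rng.randrange(0, len(genome) - n + 1)
        segments.append(genome[start:start + n])
    return segments


def replicate_gc(genome: str, lengths: list, n_replicates: int, seed: int = 0) -> list:
    """Pooled GC fraction of each simulated replicate set."""
    rng = random.Random(seed)
    return [gc_fraction("".join(sample_matched(genome, lengths, rng)))
            for _ in range(n_replicates)]


def z_score(observed: float, replicate_values: list) -> float:
    """Observed value against the replicate mean, in replicate standard deviations."""
    mean = statistics.fmean(replicate_values)
    sd = statistics.stdev(replicate_values)
    return (observed - mean) / sd


def two_sided_p(z: float) -> float:
    """Two-sided normal tail probability for a Z score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With real data, `lengths` would hold the 492 observed deletion sizes and `n_replicates` would be 5,000; the same `z_score` call would apply to Alu content once each replicate has been run through RepeatMasker.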
To identify putative ARMD loci, we first computationally compared the human and chimpanzee genomes. Subsequently, we manually inspected and, if needed, experimentally verified individual loci. Of the 1,332 computationally predicted deletions that we initially recovered, 461 were discarded after manual inspection (table 1). The causes for rejection of computationally predicted ARMD loci were: (a) insertion of an Alu or other retroelement at the orthologous chimpanzee locus, which leads to the presence of sequence that the computer erroneously assumed to be deleted in the human genome (38 cases), (b) authentic deletion products in the human genome that were not products of Alu-Alu recombination (211 cases), and (c) computational errors in alignment of the human and chimpanzee genomes (212 cases). On the basis of sequence architecture, the remaining 871 loci represented putative ARMD events in the human lineage. All of these loci were further manually inspected and were analyzed, for comparison of the ancestral predeletion and human postdeletion states, by use of a TSD-based strategy as described above (see the “Material and Methods” section). In addition, we experimentally verified the authenticity of 352 candidate ARMD loci by PCR (table 1 and fig. 2). To be conservative, we discarded all loci in which an alternative mechanism (e.g., random genomic deletion), distinct from ARMD, could have produced the deletion. Specifically, ARMD events can be distinguished from random genomic deletions occurring at Alu insertion sites because an ARMD event reconstitutes an uninterrupted chimeric Alu element (i.e., with no internal deletion), whereas the probability of this happening through chance alone (as would be the case with a random deletion) is remote. Indeed, the probability of two ∼280-bp Alu elements breaking by chance at a homologous site is only 1 in ∼80,000 (1 in 280×1 in 280). 
Hence, although we cannot formally exclude the possibility that a few random deletions may precisely mimic the ARMD process, we believe the overall impact of these nonauthentic events on our estimates would be minimal.
Table 1. Summary of Human-Specific ARMD Events
Classification                                            No. of Loci
Computationally predicted deletion loci                   1,332
Discarded after manual inspection                         461
Candidate ARMD events:                                    871
  False-positive events (Alu insertion in chimpanzee):    379
    Confirmed by PCR analysis                             189
    Analysis based on TSD structure                       190
  ARMDs:                                                  492
    Confirmed by PCR analysis                             163
    Analysis based on TSD structure                       329
The manual verification of the 871 loci resulted in a final data set of 492 ARMD events spanning the entire human genome (table 1). Nine ARMD loci on the Y chromosome were all located in the pseudoautosomal part of this chromosome and hence were identical copies of deletion loci on the X chromosome. As a result, each event was counted only once during the analysis. In general, the loci analyzed in this study suggest that the combination of computational data mining and experimental validation is the “gold standard” when conducting comparative genomic searches for lineage-specific deletions. As we observed during the course of this study, lineage-specific insertions in one genome stand a risk of being characterized as deletions in the other when only two genomes are compared in a computational analysis. In our analysis, we minimized the chances of including such events by using three other hominoid genomes as controls during experimental verification of the events.
The number of ARMD events is positively correlated with the number of Alu elements present on each chromosome (r=0.69; P<.0005). This is expected, since physical proximity between repetitive elements strongly predisposes them to recombination. [18: Inoue K, Lupski JR. Molecular mechanisms for genomic disorders. Annu Rev Genomics Hum Genet. 2002; 3: 199-242] Simultaneous mapping of ARMD loci and all Alu insertions on each chromosome highlights the tendency for deletions to cluster with regions of high local Alu density (fig. 3). Additionally, sequence analysis of the Alu elements involved in ARMD events indicates that the number of elements from each Alu subfamily (fig. 4) is proportional to their genomewide copy number [6: Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat Rev Genet. 2002; 3: 370-379], with no bias observed for elements from older subfamilies (such as AluJ) that would have had more time for recombination because of their age. This implies that Alu elements throughout the genome have similar chances of recombining with each other, as opposed to a mechanism of preferential recombination between members of an individual subfamily, and that proximity between the elements is the major factor involved in the process. Additional evidence supporting this position comes from the fact that ∼40% (197 of 492) of ARMD events result from inter–Alu subfamily recombinations. However, within this context, the amount of sequence identity between the two elements at a locus also appears to be proportional to their chances of successful recombination, since young AluY elements are overrepresented at ARMD loci compared with their total number in the genome, whereas the opposite is true for older, highly diverged AluJ elements.
Figure 4. Alu subfamily composition in ARMD events. A, Proportion of Alu elements involved in ARMD events (unblackened bars) versus total number of Alu elements (blackened bars) for each subfamily. B, Subfamily ratios of upstream and downstream Alu elements involved in ARMD events (unblackened and blackened bars, respectively).
The total amount of genomic sequence deleted by this process in the human lineage (i.e., after the human-chimpanzee divergence ∼6 million years ago) is estimated to be 396,420 bp. This is probably a conservative estimate, since our comparative analysis of the human and chimpanzee genomes detects ARMD events only between Alu elements that were inserted before the human-chimpanzee divergence. Therefore, it would miss ARMD loci involving newly inserted human-specific Alu elements. [19: Carter AB, Salem AH, Hedges DJ, Keegan CN, Kimball B, Walker JA, Watkins WS, Jorde LB, Batzer MA. Genome-wide analysis of the human Alu Yb-lineage. Hum Genomics. 2004; 1: 167-178] [20: Otieno AC, Carter AB, Hedges DJ, Walker JA, Ray DA, Garber RK, Anders BA, Stoilova N, Laborde ME, Fowlkes JD, Huang CH, Perodeau B, Batzer MA. Analysis of the human Alu Ya-lineage. J Mol Biol. 2004; 342: 109-118] However, the contribution of human-specific Alu elements to ARMD is probably relatively limited, given that there are only ∼7,000 such insertions [13: Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005; 437: 69-87], as compared with >1 million Alu elements shared between the human and chimpanzee genomes.
The ARMDs range in size between 101 bp and 7,255 bp, with an average size of ∼806 bp. A histogram of the size frequency distribution of ARMDs reveals a skew toward shorter ARMD sizes, with ∼75% (368 of 492) of the deletions shorter than 1 kb (fig. 5). Thus, the median ARMD length of 468 bp better represents the most common size category.
However, in terms of total genomic sequence deleted, the ∼25% of ARMD events >1 kb were responsible for ∼62% (245,263 of 396,420 bp) of the total sequence deleted. Our computational analyses did not return any ARMD loci with deletions <100 bp. Strictly speaking, Alu-Alu recombination elements should not cause deletions of [...] 60% GC (as compared with the ∼62.7% average GC content for the 10 Alu consensus sequences) and complete conservation across all subfamilies. Although these factors may be responsible for higher recombination frequencies in this region, other reasons are also plausible, such as the location of this stretch near the L1 endonuclease cleavage site at the 5′ end of the Alu element, which makes it closer to putative breakage sites during the recombination process.
Alu elements in the human genome show a preference for high–GC content areas, except for the most recently integrated subfamilies. [1: Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, et al. Initial sequencing and analysis of the human genome. Nature. 2001; 409: 860-921] [22: Cordaux R, Lee J, Dinoso L, Batzer MA. Recently integrated Alu retrotransposons are essentially neutral residents of the human genome. Gene (http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6T39-4JF8HCB-3&_coverDate=03%2F09%2F2006&_alid=397417522&_rdoc=1&_fmt=&_orig=search&_qd=1&_cdi=4941&_sort=d&view=c&_acct=C000001358&_version=1&_urlVersion=0&_userid=5745&md5=1acad274031312a563629c5232294196) (electronically published March 6, 2006; accessed May 2, 2006)] However, since only a fraction (984 of ∼1.2 million) of the total number of Alu insertions is associated with the ARMD process, it may well be that, in this respect, the deletions themselves behave differently from the Alu family as a whole.
To characterize the sequence context in which ARMD events occur, we calculated the percentage of GC content in 20-kb windows of flanking sequence centered on the ARMD loci. Compared with previous analyses of Alu and L1 insertion-mediated (as opposed to postinsertional recombination–mediated) genomic deletions [2: Callinan PA, Wang J, Herke SW, Garber RK, Liang P, Batzer MA. Alu retrotransposition-mediated deletion. J Mol Biol. 2005; 348: 791-800] [23: Han K, Sen SK, Wang J, Callinan PA, Lee J, Cordaux R, Liang P, Batzer MA. Genomic rearrangements by LINE-1 insertion-mediated deletion in the human and chimpanzee lineages. Nucleic Acids Res. 2005; 33: 4040-4052],

Reference(s)