A Chromosomal Rearrangement Hotspot Can Be Identified from Population Genetic Variation and Is Coincident with a Hotspot for Allelic Recombination
2006; Elsevier BV; Volume: 79; Issue: 5; Language: English
DOI: 10.1086/508709
ISSN 1537-6605
Authors: Sarah Lindsay, Mehrdad Khajavi, James R. Lupski, Matthew E. Hurles
Topic(s): Genomics and Phylogenetic Studies
Abstract: Insights into the origins of structural variation and the mutational mechanisms underlying genomic disorders would be greatly improved by a genomewide map of hotspots of nonallelic homologous recombination (NAHR). Moreover, our understanding of sequence variation within the duplicated sequences that are substrates for NAHR lags far behind that of sequence variation within the single-copy portion of the genome. Perhaps the best-characterized NAHR hotspot lies within the 24-kb-long Charcot-Marie-Tooth disease type 1A (CMT1A)–repeats (REPs) that sponsor deletions and duplications that cause peripheral neuropathies. We investigated structural and sequence diversity within the CMT1A-REPs, both within and between species. We discovered a high frequency of retroelement insertions, accelerated sequence evolution after duplication, extensive paralogous gene conversion, and a greater than twofold enrichment of SNPs in humans relative to the genome average. We identified an allelic recombination hotspot underlying the known NAHR hotspot, which suggests that the two processes are intimately related. Finally, we used our data to develop a novel method for inferring the location of an NAHR hotspot from sequence variation within segmental duplications and applied it to identify a putative NAHR hotspot within the LCR22 repeats that sponsor velocardiofacial syndrome deletions. We propose that a large-scale project to map sequence variation within segmental duplications would reveal a wealth of novel chromosomal-rearrangement hotspots.
The sequencing of the human genome revealed that at least 5% of the genome consists of long, highly similar duplicated sequences known as "low-copy repeats" (LCRs), or segmental duplications [1: Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001; 11: 1005-1017] [2: Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002; 297: 1003-1007]. These segmental duplications can have high sequence similarity (>90%), can be several hundreds of kilobases in length, and are enriched in ape genomes relative to genomes of other species [2] [3: Samonte RV, Eichler EE. Segmental duplications and the evolution of the primate genome. Nat Rev Genet. 2002; 3: 65-72] [4: Stankiewicz P, Lupski JR. Molecular-evolutionary mechanisms for genomic disorders. Curr Opin Genet Dev. 2002; 12: 312-319]. Segmental duplications have been shown to have unusual patterns of sequence evolution relative to single-copy sequences, both in terms of orthologous sequence divergence and of reticulate evolution processes between duplications within the same genome [5: Hurles ME, Willey D, Matthews L, Hussain SS. Origins of chromosomal rearrangement hotspots in the human genome: evidence from the AZFa deletion hotspots. Genome Biol. 2004; 5: R55] [6: Jackson MS, Oliver K, Loveland J, Humphray S, Dunham I, Rocchi M, Viggiano L, Park JP, Hurles ME, Santibanez-Koref M. Evidence for widespread reticulate evolution within human duplicons. Am J Hum Genet. 2005; 77: 824-840], and they may well play a central role in the evolution of novel gene function after gene duplication. Moreover, duplicated sequences appear to harbor unusual patterns of sequence variation within humans [7: Hallast P, Nagirnaja L, Margus T, Laan M. Segmental duplications and gene conversion: human luteinizing hormone/chorionic gonadotropin beta gene cluster. Genome Res. 2005; 15: 1535-1546] [8: Bosch E, Hurles ME, Navarro A, Jobling MA. Dynamics of a human inter-paralog gene conversion hotspot. Genome Res. 2004; 14: 835-844] [9: Pavlicek A, House R, Gentles AJ, Jurka J, Morrow BE. Traffic of genetic information between segmental duplications flanking the typical 22q11.2 deletion in velo-cardio-facial syndrome/DiGeorge syndrome. Genome Res. 2005; 15: 1487-1495] [10: Fredman D, White SJ, Potter S, Eichler EE, Den Dunnen JT, Brookes AJ. Complex SNP-related sequence variation in segmental genome duplications. Nat Genet. 2004; 36: 861-866] that may result from gene conversion (the nonreciprocal transfer of sequence information between two homologous stretches of DNA) occurring between the duplicated copies.
Recent studies have revealed extensive structural variation within the human genome, with a marked enrichment of deletions, duplications, and inversions in and around segmental duplications [11: Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 2006; 38: 75-81] [12: Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004; 36: 949-951] [13: Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M. Large-scale copy number polymorphism in the human genome. Science. 2004; 305: 525-528] [14: Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R, Oseroff VV, Albertson DG, Pinkel D, Eichler EE. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005; 77: 78-88] [15: McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ, Altshuler DM. Common deletion polymorphisms in the human genome. Nat Genet. 2006; 38: 86-92]. The study of the functional importance of these structural variants is in its infancy, but the number of genetic diseases in which the structural dynamism conferred by segmental duplications plays a major role has been growing rapidly. Genomic architecture plays a dominant role in several diseases, which has led to them being classified as "genomic disorders" [21: Lupski JR, Stankiewicz P. Genomic disorders: molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet. 2005; 1: e49] [22: Lupski JR, Stankiewicz P. Genomic disorders: the genomic basis of disease. Humana Press, Totowa, New Jersey; 2006]. Examples include Charcot-Marie-Tooth disease type 1A (CMT1A [MIM 118220]) and hereditary neuropathy with pressure palsies (HNPP [MIM 162500]), due to reciprocal duplication and deletion on chromosome 17p12 [16: Pentao L, Wise CA, Chinault AC, Patel PI, Lupski JR. Charcot-Marie-Tooth type-1a duplication appears to arise from recombination at repeat sequences flanking the 1.5 Mb monomer unit. Nat Genet. 1992; 2: 292-300]; Smith-Magenis syndrome (SMS [MIM 182290]) and dup(17)(p11.2p11.2), due to reciprocal deletions and duplications on chromosome 17p11.2 [17: Chen KS, Manian P, Koeuth T, Potocki L, Zhao Q, Chinault AC, Lee CC, Lupski JR. Homologous recombination of a flanking repeat gene cluster is a mechanism for a common contiguous gene deletion syndrome. Nat Genet. 1997; 17: 154-163]; and deletions causing neurofibromatosis type I (NF1 [MIM 162200]) on chromosome 17q11.2 [18: Lopez Correa C, Brems H, Lazaro C, Marynen P, Legius E. Unequal meiotic crossover: a frequent cause of NF1 microdeletions. Am J Hum Genet. 2000; 66: 1969-1974] and Sotos syndrome (MIM 117550) on 5q35 [19: Kurotaki N, Stankiewicz P, Wakui K, Niikawa N, Lupski JR. Sotos syndrome common deletion is mediated by directly oriented subunits within inverted Sos-REP low-copy repeats. Hum Mol Genet. 2005; 14: 535-542] [20: Visser R, Shimokawa O, Harada N, Kinoshita A, Ohta T, Niikawa N, Matsumoto N. Identification of a 3.0-kb major recombination hotspot in patients with Sotos syndrome who carry a common 1.9-Mb microdeletion. Am J Hum Genet. 2005; 76: 52-67]. Despite this work, there remains a substantial proportion of segmental duplications whose propensity to undergo nonallelic homologous recombination (NAHR) and to generate potentially disease-causing rearrangements is unknown. Although the association between structural variation and segmental duplications has been observed, the experimental demonstration of NAHR as a mechanism for such changes remains to be fully documented.
The germline mutational process underlying these observations of structural dynamism at segmental duplications is known as "NAHR" and can result in duplications, deletions, and inversions of genomic segments between copies of a duplicated sequence [23: Stankiewicz P, Lupski JR. Genome architecture, rearrangements and genomic disorders. Trends Genet. 2002; 18: 74-82]. NAHR shares a number of important features with allelic meiotic recombination, which has led to suggestions that the two processes operate by similar mechanisms [24: Lupski JR. Hotspots of homologous recombination in the human genome: not all homologous sequences are equal. Genome Biol. 2004; 5: 242]. One striking similarity between allelic homologous recombination (AHR) and NAHR is the existence of hotspots of recombinatorial activity in which both crossovers and gene-conversion events cluster. In all genomic disorders in which the precise breakpoints of numerous independent rearrangements have been mapped, it has been found, by DNA sequence analysis of the products of recombination, that the breakpoints cluster within small intervals of greatly enhanced recombinatorial activity [25: Hurles ME, Lupski JR. Recombination hotspots in nonallelic homologous recombination. In: Lupski JR, Stankiewicz P. Genomic disorders: the genomic basis of disease. Humana Press, Totowa, New Jersey; 2006]. The likelihood of a breakpoint falling within one of these NAHR hotspots can be >2 orders of magnitude greater than in the surrounding sequence. These NAHR hotspots have size and morphology similar to experimentally determined AHR hotspots [25].
The study of AHR hotspots has been revolutionized by the genomewide inference of local recombination rates from patterns of sequence variation within populations.
Whereas there are only ∼10–20 experimentally determined AHR hotspots, the locations of ∼50,000 AHR hotspots have been inferred throughout the genome from population genetic data [26: Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005; 310: 321-324]. No such revolution has yet accelerated the discovery of NAHR hotspots. A genomewide map of NAHR hotspots would facilitate the identification of loci at which rearrangements result in embryonic lethality, would catalyze the discovery of other genomic disorders, and would inform our understanding of the origins of structural variation.
Patterns of sequence variation and linkage disequilibrium (LD) within segmental duplications remain largely uncharacterized; segmental duplications are deemed outside the portion of the genome amenable to genomewide haplotype mapping [27: IHMC. A haplotype map of the human genome. Nature. 2005; 437: 1299-1320]. Three types of variant sites are apparent within sequence alignments of duplicated sequences: sites that differ between allelic copies (i.e., SNPs), fixed sites that differ between paralogous copies (i.e., paralogous sequence variants [PSVs]), and a special class of SNPs that are polymorphic across paralogous sequences (i.e., multisite variants [MSVs]). Given the role that NAHR hotspots potentially play in disease-causing rearrangements, it is of great interest to be able to characterize sequence variation within segmental duplications and to identify signatures of NAHR hotspot activity from these data. To develop a method that will enable identification of hotspots for NAHR solely from sequence variation, it is necessary to improve our understanding of the evolutionary processes occurring within segmental duplications.
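The three classes of variant site named above (SNPs, PSVs, and MSVs) can be illustrated with a minimal Python sketch. It assumes an operational reading of the definitions that is not spelled out in the text: a column is called a PSV when each paralog is fixed for a different base, an MSV when both paralogs are polymorphic, and a SNP when only one paralog is polymorphic. The function name `classify_sites` and the toy sequences are hypothetical, and alignment gaps are not handled.

```python
from typing import List


def classify_sites(proximal: List[str], distal: List[str]) -> List[str]:
    """Label each column of an alignment of proximal and distal alleles.

    Assumed rules (an illustration, not the authors' stated procedure):
      invariant: one base, shared by both paralogs
      PSV:       each paralog fixed, but for different bases
      MSV:       both paralogs polymorphic at the column
      SNP:       exactly one paralog polymorphic at the column
    """
    length = len(proximal[0])
    labels = []
    for i in range(length):
        p = {seq[i] for seq in proximal}  # bases seen among proximal alleles
        d = {seq[i] for seq in distal}    # bases seen among distal alleles
        if len(p) == 1 and len(d) == 1:
            labels.append("invariant" if p == d else "PSV")
        elif len(p) > 1 and len(d) > 1:
            labels.append("MSV")
        else:
            labels.append("SNP")
    return labels


# Toy data: three alleles of each paralog, five aligned columns.
toy_proximal = ["ACGTA", "ACGTC", "ATGTA"]
toy_distal = ["ACTTA", "ACTTC", "ACTTA"]
print(classify_sites(toy_proximal, toy_distal))
```

Real data would also need explicit handling of indels and missing bases, which this sketch leaves out.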
Elsewhere, we have demonstrated that, at two known Y-chromosomal NAHR hotspots, the presence of an NAHR hotspot could be inferred from comparisons between human and great ape sequences of the duplicated sequence containing the hotspot [5]. The extension of this method to autosomal NAHR hotspots has been thrown into doubt by the demonstration of the short-lived evolutionary nature of AHR hotspots [28: Winckler W, Myers SR, Richter DJ, Onofrio RC, McDonald GJ, Bontrop RE, McVean GA, Gabriel SB, Reich D, Donnelly P, Altshuler D. Comparison of fine-scale recombination rates in humans and chimpanzees. Science. 2005; 308: 107-111] [29: Ptak SE, Hinds DA, Koehler K, Nickel B, Patil N, Ballinger DG, Przeworski M, Frazer KA, Paabo S. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet. 2005; 37: 429-434].
The 24-kb-long CMT1A-repeat (REP) segmental duplications [30: Reiter LT, Murakami T, Koeuth T, Gibbs RA, Lupski JR. The human COX10 gene is disrupted during homologous recombination between the 24 kb proximal and distal CMT1A-REPs. Hum Mol Genet. 1997; 6: 1595-1603] that sponsor pathogenic HNPP deletions and reciprocal CMT1A duplications are ideal loci for exploring the consequences of duplication on sequence evolution and for developing methods to identify NAHR hotspots. The CMT1A-REPs were duplicated recently on 17p11.2-12 in the common ancestor of humans and chimpanzees, with the distal copy ancestral, and the human copies share 98.7% sequence similarity [30] [31: Kiyosawa H, Chance PF. Primate origin of the CMT1A-REP repeat and analysis of a putative transposon-associated recombinational hotspot. Hum Mol Genet. 1996; 5: 745-753]. These repeats contain a well-characterized ∼600-bp-long NAHR hotspot that has an ∼50-fold elevated rate of crossover compared with the surrounding sequence and is shared among populations [32: Reiter LT, Murakami T, Koeuth T, Pentao L, Muzny DM, Gibbs RA, Lupski JR. A recombination hotspot responsible for two inherited peripheral neuropathies is located near a mariner transposon-like element. Nat Genet. 1996; 12: 288-297] [33: Lopes J, Ravise N, Vandenberghe A, Palau F, Ionasescu V, Mayer M, Levy N, Wood N, Tachi N, Bouche P, Latour P, Ruberg M, Brice A, LeGuern E. Fine mapping of de novo CMT1A and HNPP rearrangements within CMT1A-REPs evidences two distinct sex-dependent mechanisms and candidate sequences involved in recombination. Hum Mol Genet. 1998; 7: 141-148].
In this study, we characterized structural and sequence variation at the CMT1A-REPs in humans and hominoid species, by using a combination of Southern hybridization and resequencing by shotgun haplotyping [34: Lindsay SJ, Bonfield JK, Hurles ME. Shotgun haplotyping: a novel method for surveying allelic sequence variation. Nucleic Acids Res. 2005; 33: e152]. We demonstrate that post-duplication gene conversion has altered the pattern and rate of sequence evolution in the CMT1A-REPs, and we develop a robust, novel method for identifying NAHR hotspots from patterns of sequence diversity within humans.
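To give a feel for what a ∼50-fold elevation over ∼600 bp means within a 24-kb repeat, the following back-of-envelope sketch computes the share of crossovers expected to fall inside the hotspot. It assumes a uniform background rate across the rest of the repeat and treats the approximate figures quoted above as exact; the helper `hotspot_share` is hypothetical, and the resulting number is an illustration, not a value reported in the source.

```python
def hotspot_share(hotspot_bp: int, repeat_bp: int, fold: float) -> float:
    """Fraction of crossovers expected inside the hotspot.

    Assumes every background base pair has relative rate 1 and every
    hotspot base pair has relative rate `fold` (a simplifying assumption).
    """
    hot = hotspot_bp * fold                  # rate-weighted hotspot length
    cold = (repeat_bp - hotspot_bp) * 1.0    # rate-weighted background length
    return hot / (hot + cold)


share = hotspot_share(hotspot_bp=600, repeat_bp=24_000, fold=50)
print(f"{share:.2f}")  # prints 0.56
```

Under these assumptions a hotspot occupying 2.5% of the repeat would account for somewhat more than half of the crossovers.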
Complete CMT1A-REP sequences were generated from genomic DNA from cell lines of (i) 10 unrelated males from the European Collection of Cell Cultures (ECACC) ethnic diversity panel (2 Australian Aborigine, 2 from the United Kingdom, 1 Italian, 1 Japanese, 2 Zulu, and 2 Native American) and (ii) chimpanzee, gorilla, orangutan, and gibbon, from the ECACC primate panel. Southern hybridization and limited resequencing were performed on 72 samples from the CEPH Human Genome Diversity Panel and on samples from the Baylor College of Medicine control panel (93 African American, 98 Hispanic, 95 European American, and 72 Asian American).
Restriction-enzyme digests were performed according to the manufacturer's instructions. We used a dosage-analysis approach, using a CMT1A-REP probe derived from a purified restriction fragment from a cosmid described elsewhere [16]. Probes labeled with 32P-α-deoxycytidine triphosphate with the Rediprime II labeling kit (Amersham Pharmacia Biotech) identified two EcoRI restriction fragments on chromosome 17p11.2-12 (a 7.9-kb EcoRI fragment localized to the proximal CMT1A duplication monomer region and a 6.1-kb EcoRI fragment mapping to the distal CMT1A region). The 7.9-kb proximal and 6.1-kb distal EcoRI fragments are contained entirely within the CMT1A-REP sequence.
Each CMT1A-REP was amplified in two portions, with the use of a non–repeat-specific internal primer and an external primer located in flanking single-copy sequence.
The distal repeat was amplified in two portions, with the use of the oligo pairs (1) CMT1AD2 CCACATTACTGCTTCCTCATGTGT and CMT1AINT5 GTTCATGGTTCATGCTGAGGGTTG and (2) CMT1AD1 GGGGGTAGAAAAGGGGTCTCATTTTCC and CMT1AINT3 ATTACAGCTACTGTTGCAGCAGTG, which amplified products of 12,777 and 11,327 bp, respectively. The proximal repeat was amplified in two portions, with the use of the oligo pairs (3) CMT1AP2 CTTAGCCATTGCCCATTGATGGAC and CMT1AINT5 GTTCATGGTTCATGCTGAGGGTTG and (4) CMT1AP1 CCATTAGAGAGCTTTCCTCATTGC and CMT1AINT3 ATTACAGCTACTGTTGCAGCAGTG, which amplified products of 12,600 and 11,344 bp, respectively. These PCR fragments do not overlap in the center of the repeat, so additional primers were designed to obtain the genotypic sequence for the middle portion of the repeats. The gap between PCR products runs from 10,230 to 11,304 bp in our alignment, so the SNP information in this region between haplotypic sequences comes from genotypic sequences unless specified. To obtain gap sequence, long PCR was performed as described above, but with the use of primers CMT1AP1 CCATTAGAGAGCTTTCCTCATTGC and CMT1A_Join1 GCAGTGATGCTCAGTAGAAAG, at an annealing temperature of 60°C and an extension time of 13 min, and with CMT1AD1 GGGGGTAGAAAAGGGGTCTCATTTTCC and CMT1A_Join3 GGGCTGATGTTTAGTAAACAA, at an annealing temperature of 57°C and an extension time of 13 min. All PCR reactions were performed in a 50-μl volume with the use of the Expand 20Kb Plus PCR kit (Roche Applied Science) and 200 ng of genomic DNA as template. Unless otherwise stated, the reactions were performed following the manufacturer's protocol, with an extension time of 11 min and an annealing temperature of 57°C. All oligos were synthesized by Sigma Genosys.
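For readers who want to work with the long-PCR primers programmatically, the sketch below stores the six sequences from the four numbered oligo pairs in a dictionary and reports length, GC fraction, and reverse complement for each. The sequences are copied from the text; the dictionary layout and the helper names `gc_fraction` and `reverse_complement` are assumptions made for illustration, and no melting-temperature model is attempted.

```python
# Long-PCR primers from the four numbered oligo pairs, written 5'->3'.
PRIMERS = {
    "CMT1AD2": "CCACATTACTGCTTCCTCATGTGT",
    "CMT1AINT5": "GTTCATGGTTCATGCTGAGGGTTG",
    "CMT1AD1": "GGGGGTAGAAAAGGGGTCTCATTTTCC",
    "CMT1AINT3": "ATTACAGCTACTGTTGCAGCAGTG",
    "CMT1AP2": "CTTAGCCATTGCCCATTGATGGAC",
    "CMT1AP1": "CCATTAGAGAGCTTTCCTCATTGC",
}

# Translation table for the four unambiguous bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:10s} len={len(seq):2d} GC={gc_fraction(seq):.2f} "
              f"revcomp={reverse_complement(seq)}")
```

The translation table covers only A, C, G, and T, so primers containing IUPAC ambiguity codes would need an extended table.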
The long-PCR products were fragmented, cloned, and shotgun sequenced to a high depth (>20× coverage) with the use of the shotgun-haplotyping method [34], which recovers haplotypic sequence across the length of the PCR product by assembling read pairs from the two alleles into separate assemblies. To obtain sequence data from the middle of the repeat, we directly sequenced additional PCR products, using the PCR primer CMT1A_Join3 and an additional internal sequencing primer, CMT1A_Join2 CATAGAAATGTGTGGACCAAT. The sequence data were then assembled using the Gap4 assembly software; SNPs were automatically called, and then the haplotypic sequences were exported. In two individuals, one Native American male (AMA) and one U.K. male (C07220), the alleles from one of the long PCRs (3 and 4, respectively) were monomorphic. Targeted resequencing of individuals with unusual Southern banding patterns was performed using shotgun haplotyping or with staged primers (see table 1) after amplification with the PCR primers listed above.
The GenBank accession numbers for the sequences generated in this study are DQ480370–DQ480419.
Table 1. Primers Used in Targeted Resequencing
Primer          Sequence
Forward:
CMT1ASEQF_1     AGAATCGCTTGAACCCAG
CMT1ASEQF_2     CTTGGTGCCAGGTTTGAG
CMT1ASEQF_3     CTCATCCACTGCAAACCTC
CMT1ASEQF_4     CTCTGGTTTTAGGTTTATCAC
CMT1ASEQF_5     GACAGTATGAACGATTTAAGC
CMT1ASEQF_6     AGTGCTACAGCTCAGGGAG
CMT1ASEQF_7     TATGTTTGTGGGAGCTCTG
CMT1ASEQF_8     GTAATGAATTACAGGCTCAGC
CMT1ASEQF_9     CAGCCAACTCCCTAGAGAG
CMT1ASEQF_10    CTGGATCTGCTCAATAACCTA
CMT1ASEQF_11    TTGCTTCAACACTCTTCAAAG
CMT1ASEQF_12    TAAGAACAGATGATGTTGAAG
CMT1ASEQF_13    GGAGTGGTCCAGTCAAAAG
CMT1ASEQF_14    TGGCTGGTATCACTGCTTAC
CMT1ASEQF_15    GTTTACTCCTTCTTCAAGTTC
CMT1ASEQF_16    CTGTGTGAAATTTATTTTCCTG
CMT1ASEQF_17    CTAGATGTGAAACTGCTAAGTC
CMT1ASEQF_18    GAGTCTTGAGTTGGGATG
CMT1ASEQF_19    GAATGGCAAGTTTATGTTCC
CMT1ASEQF_20    GATCATGGTTTGCACTTTAAG
CMT1ASEQF_21    CATCTCAGTTAAACACAACTATG
CMT1ASEQF_22    GTCTCTGGGCTCAGTTACC
CMT1ASEQF_23    CATTAAAAGTACTTAAAGGGC
CMT1ASEQF_24    GAAGCCTATGAAATGTTAAGTC
CMT1ASEQF_25    GGCATAATTACATTCAGAACG
CMT1ASEQF_26    TTCAGCCTTCTTATTCAGAGG
CMT1ASEQF_27    CTATCCATCTCTGCTGCTT
CMT1ASEQF_28    GCTCTCACAGCACTTCTTAG
CMT1ASEQF_29    GAATACTGTCACAGCTCACAAC
CMT1ASEQF_30    CTGCTAGACTAGTGTGCCTG
CMT1ASEQF_31    CATAATCAAAATACAGATGTCTC
CMT1ASEQF_32    GTGTCTACATCTCTCTGTCTTC
CMT1ASEQF_33    GAACTCAGCTGGAGGACCTA
CMT1ASEQF_34    GTGTTAGACTCTGGCAGCTA
CMT1ASEQF_35    TGACACAGCCTAAGGAGAAC
CMT1ASEQF_36    TGGATGGTCAAAATAGCTAAT
CMT1ASEQF_37    GACAATGTCATTGCATTGATT
CMT1ASEQF_38    GTGTTCACTAGAGTTTGAGAATC
CMT1ASEQF_39    CCTGGTCCAAGCTATCATC
CMT1ASEQF_40    CTGTTCACTGGCTAAAGTTC
CMT1ASEQF_41    CAATACACAAATTTACCCATTG
CMT1ASEQF_42    GACATGTGTCCATACATACA
CMT1ASEQF_43    TCAGCCTCCCAAGTAGTTG
CMT1ASEQF_44    CATGGTTTGTGGCTTTTTG
CMT1ASEQF_45    CATCCTTCCCTCTACAGCTG
CMT1ASEQF_46    CCCAGAATGTACAAGGAATTC
CMT1ASEQF_47    CAGCCTAAGTAACAGAGCAAG
CMT1ASEQF_48    TTTCATCACAGCAAAGATATG
CMT1ASEQF_49    GTACAGACAGTGTGGAGTGTAG
CMT1ASEQF_50    GACGTGATTCCAAGAGAATG
CMT1ASEQF_51    TGTTATGAGATGAACAGGTTAG
Reverse:
CMT1ASEQR_1     GTGGGTTCAAGCGATTCTC
CMT1ASEQR_2     AACTTCCTCAAACCTGGC
CMT1ASEQR_3     GTGAGGTTTGCAGTGGATG
CMT1ASEQR_4     CTTGTGTGTTCTGTGATAAAC
CMT1ASEQR_5     GCTTAAATCGTTCATACTGTC
CMT1ASEQR_6     GTAGCACTTGGTTCGATTC
CMT1ASEQR_7     TCAAGAAGAGCAGCCAGAG
CMT1ASEQR_8     GTGTCCATAAGTCACGTTAC
CMT1ASEQR_9     CATAGGGTAAAGTCTGCCTCTC
CMT1ASEQR_10    GACAGCCTCTTCTGAAATG
CMT1ASEQR_11    CTGTTTTAGCAGAATTCATG
CMT1ASEQR_12    CAGCCAATCTGAGTACAATG
CMT1ASEQR_13    CAGGACTGTGAAGTGGATG
CMT1ASEQR_14    GAAGAACAACATTATAATATGTG
CMT1ASEQR_15    GATGCTGAGAGTGATCATGG
CMT1ASEQR_16    CAAAGAATAAGCCACTGTGTG
CMT1ASEQR_17    TTACTTGCTTTGAGTCTCAG
CMT1ASEQR_18    TAGGCTATTTGCATAATGGAG
CMT1ASEQR_19    AAAATCTGGAAGACTTCACTG
CMT1ASEQR_20    GCCTTATTTCAAACAAGAGTTC
CMT1ASEQR_21    GAAAGAGTATATAGACTAATTGAC
CMT1ASEQR_22    GGTAACTGAGCCCAGAGAC
CMT1ASEQR_23    GATAACAGAGCTAGTTCACAG
CMT1ASEQR_24    CATTTCATAGGCTTCTCTGAG
CMT1ASEQR_25    CGTTCTGAATGTAATTATGCC
CMT1ASEQR_26    CATAATTCAGGGGTCACCAC
CMT1ASEQR_27    CAAAGCAGCAGAGATGGATAG
CMT1ASEQR_28    GTCTAAGAAGTGCTGTGAGAG
CMT1ASEQR_29    GAGCTGTGACAGTATTCCTA
CMT1ASEQR_30    CATCACCCAGGCACACTAG
CMT1ASEQR_31    CTTGTTATTTCRGAGACATCTG
CMT1ASEQR_32    CYATCTYTTATCAAACCTGAG
CMT1ASEQR_33    GGACTCATATCAAAATGGC
CMT1ASEQR_34    AACGAAAGCTGCATATCTGC
CMT1ASEQR_35    CTGCATCTGTTCTCCTTAGG
CMT1ASEQR_36    GAAATTATTAGCTATTTTGACC
CMT1ASEQR_37    CAAGGAGAATACAGAGGACAG
CMT1ASEQR_38    CTCAAACTCTAGTGAACACAGTC
CMT1ASEQR_39    GTGAAGTTCTKCTGTTACTCAGC
CMT1ASEQR_40    TTAGCCAGTGAACAGTTGAG
CMT1ASEQR_41    CAGAACTAACTGTACACTTGC
CMT1ASEQR_42    CTAAAGACAACCTAAACATCC
CMT1ASEQR_43    CAGGAAGCAGAYGTTGCAG
CMT1ASEQR_44    AAACCATGCAAGAAAAGTAG
CMT1ASEQR_45    CAGCTGTAGAGGGAAGGATG
CMT1ASEQR_46    GAATTCCTTGTACATTCTGG
CMT1ASEQR_47    AGTGGCAGGACTATGGCTC
CMT1ASEQR_48    CTTTGCTGTGATGAAAAGTAC
CMT1ASEQR_49    CACTCCACACTGTCTGTACC
CMT1ASEQR_50    CACACGTGCATTAAAAGGAC
CMT1ASEQR_51    TTCTAACCTGTTCATCTCATAAC
CMT1ASEQR_52    CCACATTACTGCTTCCTC
For genotypic analyses of the entire CMT1A-REP, the component haplotypic sequences were arbitrarily spliced together to form 24-kb allelic sequences for each individual, with each half containing true haplotypic sequence. Sequences were aligned with the CMT1A-REP GenBank reference sequences, with the use of BioEdit and Se-Al.
All analyses not dependent on having true haplotypes were derived from these alignments of spliced haplotypes. Repetitive elements were detected by RepeatMasker. Jukes-Cantor distances and nucleotide diversity (π) were calculated from full 24-kb primate and human reference sequences for each repeat, with the use of PHYLIP and DnaSP [35: Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003; 19: 2496-2497], respectively. Phylogenetic networks and trees were constructed, using SplitsTree [36: Huson DH. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics. 1998; 14: 68-73], from full 24-kb sequences from the human and primate panels for each repeat. Neighbor-joining trees were constructed using 24-kb sequences from the human and primate panels, with the use of PHYLIP under Felsenstein 84 (F84) and Jukes-Cantor models of evolution with 1,000 bootstrap replicates. TREE-PUZZLE [37: Schmidt HA, Strimmer K, Vingron M, von Haeseler A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002; 18: 502-504] was used to construct maximum-likelihood trees, to compute branch lengths, and to perform the likelihood-ratio test of the molecular clock hypothesis. Single alleles from each CMT1A-REP from all the primate species and from a single human were used and were modeled using F84 distances, with the gibbon sequence specified as an outgroup.
The sliding window plots (with window size 700 bp) of the two indices for identifying NAHR hotspots (the concerted index and the hotspot index) were generated in Excel, from lists of variant sites (i.e., SNPs, PSVs, and MSVs) output from alignments by code written in Interactive Data Language 6.0 (Research Systems).
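The sliding-window step can be sketched in Python as follows. The sketch only tallies SNPs, PSVs, and MSVs per 700-bp window; the formulas for the concerted index and the hotspot index are not given in this section, so they are deliberately left out. The step size of 100 bp, the function name `sliding_window_counts`, and the toy site list are assumptions for illustration, and the original analysis was written in Interactive Data Language, not Python.

```python
from collections import Counter
from typing import List, Tuple


def sliding_window_counts(
    sites: List[Tuple[int, str]],
    length: int,
    window: int = 700,   # window size stated in the text
    step: int = 100,     # assumed; the step size is not stated in the text
) -> List[Tuple[int, Counter]]:
    """Count variant sites of each kind in windows [start, start + window).

    `sites` holds (position, kind) pairs, with kind in {"SNP", "PSV", "MSV"}
    and positions given as 0-based alignment coordinates.
    """
    results = []
    for start in range(0, length - window + 1, step):
        end = start + window
        counts = Counter(kind for pos, kind in sites if start <= pos < end)
        results.append((start, counts))
    return results


# Toy input: four variant sites on a 2,000-bp alignment.
toy_sites = [(100, "SNP"), (650, "PSV"), (750, "MSV"), (1500, "SNP")]
for start, counts in sliding_window_counts(toy_sites, length=2000):
    print(start, counts["SNP"], counts["PSV"], counts["MSV"])
```

Per-window counts of this kind are the raw material from which window-level indices could then be computed and plotted.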
A permutation test written in Interactive Data Language 6.0 was performed to test the significance of the hotspot index; 10,000 replicates were performed, in which the positions of the observed numbers of MSVs, SNPs, and PSVs were randomized along a 24-kb stretch of DNA. The haplotype-reconstruction program PHASE 2.1 [38: Crawford DC, Bhangale T, Li N, Hellenthal G, Rieder MJ, Nickerson DA, Stephens M. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet. 2004; 36: 700-706] [39: Li N, Stephens M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics. 2003; 165: 2213-2233] was used to predict AHR hotspots in the proximal and distal CMT1A-REPs. All human CMT1A-REP sequences were entered in genotypic form for the analysis, to avoid any generation of false recombinants by errone