Article, open access, peer reviewed

Intronic Breakpoint Signatures Enhance Detection and Characterization of Clinically Relevant Germline Structural Variants

2021; Elsevier BV; Volume: 23; Issue: 5; Language: English

10.1016/j.jmoldx.2021.01.015

ISSN

1943-7811

Authors

Jeroen van den Akker, Lawrence Hon, Anjana Ondov, Ziga Mahkovec, Robert O’Connor, Raymond C. Chan, J. M. Lock, Anjali D. Zimmer, Asha Rostamianfar, Jeremy Ginsberg, Annette Leon, Scott Topper

Topic(s)

Cancer Genomics and Diagnostics

Abstract

The relevance of large copy number variants (CNVs) to hereditary disorders has long been recognized, and population sequencing efforts have chronicled many common structural variants (SVs). However, limited data are available on the clinical contribution of rare germline SVs. Here, a detailed characterization of SVs identified using targeted next-generation sequencing was performed. Across 50 genes associated with hereditary cancer and cardiovascular disorders, a minimum of 828 unique SVs were reported, including 584 fully characterized SVs. Almost 40% of CNVs were <5 kb, with one in three deletions impacting a single exon. Additionally, 36 mid-range deletions/duplications (50 to 250 bp), 21 mobile element insertions, 6 inversions, and 27 complex rearrangements were detected. This data set was used to model SV detection in a bioinformatics pipeline solely relying on read depth, which revealed that genome sequencing (30×) allows detection of 71%, a 500× panel only targeting coding regions 53%, and exome sequencing (100×) <20% of characterized SVs. SVs accounted for 14.1% of all unique pathogenic variants, supporting the importance of SVs in hereditary disorders. Robust SV detection requires an ensemble of variant-calling algorithms that utilize sequencing of intronic regions. These algorithms should use distinct data features representative of each class of mutational mechanism, including recombination between two sequences sharing high similarity, covariants inserted between CNV breakpoints, and complex rearrangements containing inverted sequences.

[...] The relative orientation of both alignments defines the variant type (deletion, tandem duplication, or inversion). Similar to LUMPY, high specificity is achieved using filters for the minimum number and fraction of supporting reads per breakpoint, as well as the underlying copy number ratio. This algorithm has detected both deletions and duplications as small as 35 bp, using a minimum of five supporting reads. Lastly, Scalpel software version 0.5.3 (ref. 22: Narzisi G, et al. Accurate de novo and transmitted indel detection in exome-capture data using microassembly. Nat Methods. 2014; 11: 1033-1036) was added to the pipeline to supplement the detection of indels >20 bp using a microassembly approach, which allowed detection of indels even beyond a read length of 150 bp.

Inversions can be detected by both LUMPY and an in-house developed split-read algorithm. To reduce the false discovery rate, inversions smaller than 1 kb required more stringent read support. Palindromic sequencing artifacts frequently resulted in inversion calls <250 bp, which were therefore discarded completely.
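The split-read logic and the length-dependent inversion filters described above can be illustrated with a minimal Python sketch. It is not the authors' in-house algorithm: the `Segment` structure, the function names, the 10% read-fraction cutoff, and the stricter 10-read requirement for inversions under 1 kb are assumptions made for illustration. Only the five-read minimum and the 250 bp and 1 kb inversion boundaries come from the text, and the copy number ratio filter is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (hypothetical structure)."""
    chrom: str
    ref_start: int    # 0-based, inclusive
    ref_end: int      # exclusive
    read_offset: int  # offset of the piece within the read sequence as it
                      # appears on the forward reference strand
    strand: str       # "+" or "-"


def classify_split_read(a: Segment, b: Segment) -> Optional[str]:
    """Infer the variant type from the relative orientation of two pieces."""
    if a.chrom != b.chrom:
        return None  # interchromosomal events are out of scope in this sketch
    if a.strand != b.strand:
        return "inversion"
    # Walk along the read: the junction runs from the end of the first
    # piece to the start of the second piece.
    first, second = sorted((a, b), key=lambda s: s.read_offset)
    if second.ref_start > first.ref_end:
        return "deletion"            # reference bases are skipped
    if second.ref_start < first.ref_end:
        return "tandem_duplication"  # the read jumps back upstream
    return None                      # contiguous: no rearrangement


def passes_filters(sv_type: str, length: int, supporting: int, depth: int,
                   min_reads: int = 5,           # from the text
                   min_fraction: float = 0.1,    # assumed value
                   strict_min_reads: int = 10    # assumed value
                   ) -> bool:
    """Apply read-support filters, with stricter rules for short inversions."""
    if sv_type == "inversion":
        if length < 250:
            return False  # treated as palindromic sequencing artifacts
        if length < 1000:
            min_reads = max(min_reads, strict_min_reads)
    if depth <= 0:
        return False
    return supporting >= min_reads and supporting / depth >= min_fraction
```

Comparing the junction coordinates (end of the first piece versus start of the second) rather than the start coordinates keeps small tandem duplications classifiable even when the two aligned pieces overlap on the reference.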
Complex rearrangements, which frequently contain both a copy number and an inversion component, were reconstructed manually by inspecting overlapping variant calls.

Mobile element insertions were identified by an in-house developed algorithm. Similar to the split-read algorithm, it relies on the identification of a group of reads with soft- and/or hard-clipping at the same start or end position, but it requires the absence of supplementary alignments. This generic approach allows the detection of any foreign sequence that deviates substantially from the reference sequence. High specificity is achieved using filters on the number of soft-clipped reads, the fraction of soft-clipped reads, the sequence agreement of the soft-clipped reads, the GC content of the locus, the sequence uniformity of the locus, the number of reads supporting a nearby large indel, and the number of inverted reads. The insertion sequence is reconstructed using a local assembly of the supporting reads. If this sequence yields a strong match against a library of mobile element sequences (Dfam version 3.1 database; ref. 23: Hubley R, et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 2016; 44: D81-D89; https://dfam.org, last accessed January 27, 2021), the call quality score is boosted.

Small variants (<50 bp) are identified using GATK software version 3.4 (ref. 24: DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011; 43: 491-498) and DeepVariant software version 0.6.0 (ref. 25: Poplin R, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018; 36: 983-987). An internally developed algorithm, which incorporates a local assembly step, is used to merge multiple phased variants into a multiple nucleotide variant, which is effectively no different from a single large indel. SAMtools software version 0.1.19 (ref. 26: Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25: 2078-2079) was used to build an algorithm for the detection of variants in the vicinity of long homopolymers.

Only SVs that were confirmed by an orthogonal technology and classified as pathogenic, likely pathogenic (hereafter collectively labeled as pathogenic), or variant of uncertain significance (VUS) according to American College of Medical Genetics and Genomics/Association for Molecular Pathology guidelines for variant classification of single-gene CNVs (ref. 27: Brandt T, et al. Adapting ACMG/AMP sequence variant classification guidelines for single-gene copy number variants. Genet Med. 2020; 22: 336-344; ref. 28: Riggs ER, et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet Med. 2020; 22: 245-257) were included in this study. CNVs that were detected based on read depth alone, without identifying the breakpoints, were confirmed by array comparative genomic hybridization or multiplex ligation-dependent probe amplification. If the breakpoints of a CNV were identified, PCR primers were designed specific to the deletion or tandem duplication, and breakpoints were confirmed by Sanger sequencing. For a small number of CNVs, additional analysis was performed for research purposes using GS. All inversions, mobile element insertions (MEIs), and complex rearrangements were confirmed by PCR followed by Sanger sequencing of the relevant genomic locus or loci.

One of the goals of this study was to characterize the variation in length among CNVs, which is largely lost when looking only at the coding exons they impact. To compose a reliable assessment of the spectrum of structural variants, CNVs for which the breakpoints had not been identified, but which impacted the same exon(s), were considered to be the same variant. Read-depth calls were mapped to variants with characterized breakpoints if their reciprocal overlap was at least 90%. Among candidate matches, the variant with the smallest difference between its breakpoints and the call boundaries was prioritized, and this distance could not exceed 500 bp. For the remaining CNV calls, boundaries were reset to the outer boundaries of the encompassed coding exons, padded by 100 bp.

For the subset of CNVs that were completely characterized, breakpoints were annotated with data tracks for RepeatMasker and segmental duplications, which were downloaded from the UCSC genome browser (ref. 29: Kent WJ, et al. The human genome browser at UCSC. Genome Res. 2002; 12: 996-1006). Elements annotated as FLAM or FRAM (free left/right Alu monomer) were processed as being an Alu subfamily member.
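The matching rules for read-depth calls (reciprocal overlap of at least 90%, boundary distance of at most 500 bp, and 100 bp padding around exons) can be sketched as follows. This is an illustrative reading of the text, not the study's code: intervals are assumed to be half-open `(start, end)` tuples on one chromosome, the boundary distance is assumed to be the larger of the two per-boundary offsets, and an exon is assumed to count as encompassed when it overlaps the call. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (overlap/len(a), overlap/len(b))."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def boundary_distance(a: Interval, b: Interval) -> int:
    """Largest offset between matching boundaries (assumed definition)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def match_call(call: Interval, characterized: List[Interval],
               min_overlap: float = 0.9,
               max_distance: int = 500) -> Optional[Interval]:
    """Return the characterized variant closest to a read-depth call, if any."""
    candidates = [
        v for v in characterized
        if reciprocal_overlap(call, v) >= min_overlap
        and boundary_distance(call, v) <= max_distance
    ]
    if not candidates:
        return None
    # Prioritize the smallest difference between breakpoints and boundaries.
    return min(candidates, key=lambda v: boundary_distance(call, v))


def pad_to_exons(call: Interval, exons: List[Interval],
                 pad: int = 100) -> Optional[Interval]:
    """Reset an unmatched call to the outer exon boundaries, padded by `pad`."""
    hit = [e for e in exons if e[0] < call[1] and e[1] > call[0]]
    if not hit:
        return None
    return (min(e[0] for e in hit) - pad, max(e[1] for e in hit) + pad)
```

For example, a call spanning `(1000, 2000)` would map to a characterized variant at `(1020, 1990)` in preference to one at `(1050, 2010)`, because both pass the overlap threshold but the former lies closer to the call boundaries.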
Sequence signatures present at CNV breakpoints were analyzed to determine the mutational mechanism that likely drove these rearrangements. CNVs with high sequence similarity between both breakpoints can be attributed to nonallelic homologous recombination (NAHR). This typically occurs between low copy repeats, which share identical sequences well beyond 100 bp (ref. 30: Hastings PJ, et al. Mechanisms of change in gene copy number. Nat Rev Genet. 2009; 10: 551-564; ref. 31: Conrad DF, et al. Mutation spectrum revealed by breakpoint sequencing of human germline CNVs. Nat Genet. 2010; 42: 385-391; ref. 32: Song X, et al. Predicting human genes susceptible to genomic instability associated with Alu/Alu-mediated rearrangements. Genome Res. 2018; 28: 1228-1242).

Alternative end-joining includes several pathways relying on different degrees of sequence similarity (ref. 33: Sallmyr A, Tomkinson AE. Repair of DNA double-strand breaks by mammalian alternative end-joining pathways. J Biol Chem. 2018; 293: 10536-10546). Single-strand annealing repairs double-strand breaks by joining two similar repeat sequences up- and downstream of the break (ref. 34: Chang HHY, et al. Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol. 2017; 18: 495-506). Although this has typically been associated with deletions (ref. 30; ref. 35: Sen SK, et al. Human genomic deletions mediated by recombination between Alu elements. Am J Hum Genet. 2006; 79: 41-53), this mechanism can also create duplications (ref. 32; ref. 36: Reams AB, Roth JR. Mechanisms of gene duplication and amplification. Cold Spring Harb Perspect Biol. 2015; 7: a016592). Microhomology-mediated end joining (MMEJ) uses short stretches of shared sequence, typically 4 to 6 bp and up to 10 bp (ref. 34; ref. 37: Ottaviani D, LeCain M, Sheer D. The role of microhomology in genomic structural variation. Trends Genet. 2014; 30: 85-94).

Serial replication slippage introduces one or more sequence repeats of variable length, which typically match the sequence of one of the breakpoint regions, in either normal or inverted orientation (ref. 38: Chen J-M, et al. Complex gene rearrangements caused by serial replication slippage. Hum Mutat. 2005; 26: 125-134; ref. 39: Ohye T, et al. Signature of backward replication slippage at the copy number variation junction. J Hum Genet. 2014; 59: 247-250). The distance from the slippage event to the breakpoint junction of the gross rearrangement has been shown to vary from 0 to 41 bp (ref. 40: Carvalho CMB, et al. Replicative mechanisms for CNV formation are error prone. Nat Genet. 2013; 45: 1319-1326).

The absence of sequence similarity, typically defined as a shared sequence of 0 to 1 bp and at most 4 bp, is a hallmark of nonhomologous end joining (NHEJ), especially when combined with a molecular scar resulting from small deletions or insertions at the CNV breakpoints (refs. 31, 34). Complex rearrangements are believed to result from mechanistic models such as fork stalling and template switching or microhomology-mediated break-induced replication (ref. 7: Brand H, et al. Paired-duplication signatures mark cryptic inversions and other complex structural variation. Am J Hum Genet. 2015; 97: 170-176; refs. 30, 31).
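The homology ranges quoted above can be turned into a rough triage of breakpoint junctions. The Python sketch below measures the sequence shared at both breakpoints of a deletion and maps that length to a candidate mechanism. It is a simplification under stated assumptions: it handles exact matches only (NAHR involves high similarity, not necessarily identity), the function names are hypothetical, and the handling of the overlapping 4 bp case and of the 11 to 100 bp range is a choice made here, not a rule stated in the text.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Exact sequence shared at both breakpoints of the deletion ref[start:end].

    Counts how far the deletion can be shifted right and left without
    changing the resulting sequence, capped at the deletion length.
    """
    n = len(ref)
    fwd = 0
    while (end + fwd < n and start + fwd < end
           and ref[start + fwd] == ref[end + fwd]):
        fwd += 1
    bwd = 0
    while (start - 1 - bwd >= 0 and end - 1 - bwd >= start
           and ref[start - 1 - bwd] == ref[end - 1 - bwd]):
        bwd += 1
    return min(fwd + bwd, end - start)


def suggest_mechanism(shared_bp: int) -> str:
    """Map shared-sequence length to a candidate mechanism (illustrative)."""
    if shared_bp > 100:
        return "NAHR"            # identical sequence well beyond 100 bp
    if shared_bp > 10:
        return "undetermined"    # range not assigned in the text
    if shared_bp >= 5:
        return "MMEJ"            # typically 4 to 6 bp, up to 10 bp
    if shared_bp == 4:
        return "NHEJ or MMEJ"    # the quoted ranges overlap at 4 bp
    return "NHEJ"                # 0 to 1 bp, at most 4 bp


# Toy reference: flank, 5 bp microhomology, spacer, same 5 bp, flank.
toy_ref = "AAGG" + "TACGT" + "CCATTA" + "TACGT" + "GGCA"
shared = microhomology_length(toy_ref, 4, 15)
print(shared, suggest_mechanism(shared))
```

A junction would normally also be inspected for inserted bases or small deletions (the molecular scar mentioned above), which this sketch does not model.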

Reference(s)