Article Open access Peer-reviewed

Clinical Validation of Tagmentation-Based Genome Sequencing for Germline Disorders

2023; Elsevier BV; Volume: 25; Issue: 7 Language: English

10.1016/j.jmoldx.2023.04.001

ISSN

1943-7811

Authors

Wei Shen, Heidi L. Sellers, Lauren A. Choate, Mariam I. Stein, Pratyush P. Tandale, Jiayu Tan, Rohit Setlem, Yuta Sakai, Numrah Fadra, Carlos Sosa, Shawn McClelland, Sarah Barnett, Kristen Rasmussen, Cassandra Runke, Stephanie A. Smoley, Lori S. Tillmans, Cherisse A. Marcou, Ross Rowsey, Erik C. Thorland, Nicole J. Boczek, Hutton M. Kearney

Topic(s)

Chromosomal and Genetic Variations

Abstract

Genome sequencing (GS) is a powerful clinical tool used for the comprehensive diagnosis of germline disorders. GS library preparation typically involves mechanical DNA fragmentation, end repair, and bead-based library size selection followed by adapter ligation, which can require a large amount of input genomic DNA. Tagmentation using bead-linked transposomes can simplify the library preparation process and reduce the DNA input requirement. Here we describe the clinical validation of tagmentation-based PCR-free GS as a clinical test for rare germline disorders. Compared with the Genome-in-a-Bottle Consortium benchmark variant sets, GS had a recall >99.7% and a precision of 99.8% for single nucleotide variants and small insertion-deletions. GS also exhibited 100% sensitivity for clinically reported sequence variants and the copy number variants examined. Furthermore, GS detected mitochondrial sequence variants above 5% heteroplasmy and showed reliable detection of disease-relevant repeat expansions and SMN1 homozygous loss. Our results indicate that while lowering DNA input requirements and reducing library preparation time, GS enables uniform coverage across the genome as well as robust detection of various types of genetic alterations. With the advantage of comprehensive profiling of multiple types of genetic alterations, GS is positioned as an ideal first-tier diagnostic test for germline disorders.

Rare diseases collectively affect 5% to 8% of the population, and it is estimated that between 40% and 70% of the approximately 10,000 known rare diseases have a conclusive genetic etiology [1: Ferreira C.R. The burden of rare diseases. Am J Med Genet A. 2019; 179: 885-892; 2: Nguengang Wakap S. Lambert D.M. Olry A. Rodwell C. Gueydan C. Lanneau V. Murphy D. Le Cam Y. Rath A. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020; 28: 165-173]. The application of next-generation sequencing (NGS), including exome sequencing (ES) and genome sequencing (GS), has accelerated the discovery of genes associated with Mendelian conditions [3: Bamshad M.J. Nickerson D.A. Chong J.X. Mendelian gene discovery: fast and furious with no end in sight. Am J Hum Genet. 2019; 105: 448-455] and transformed clinical diagnostic potential.
With the ability to detect all types of genetic alterations, including sequence variants, structural variants, and mitochondrial variants, among others, GS enables the most comprehensive genetic testing for germline disorders. Multiple studies have shown the improved diagnostic power of GS over targeted NGS assays, ES, or chromosomal microarrays (CMAs) [4: Belkadi A. Bolze A. Itan Y. Cobat A. Vincent Q.B. Antipenko A. Shang L. Boisson B. Casanova J.-L. Abel L. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci U S A. 2015; 112: 5473-5478; 5: Bertoli-Avella A.M. Beetz C. Ameziane N. Rocha M.E. Guatibonza P. Pereira C. et al. Successful application of genome sequencing in a diagnostic setting: 1007 index cases from a clinically heterogeneous cohort. Eur J Hum Genet. 2021; 29: 141-153; 6: Lionel A.C. Costain G. Monfared N. Walker S. Reuter M.S. Hosseini S.M. et al. Improved diagnostic yield compared with targeted gene sequencing panels suggests a role for whole-genome sequencing as a first-tier genetic test. Genet Med. 2018; 20: 435-443; 7: Rajagopalan R. Gilbert M.A. McEldrew D.A. Nassur J.A. Loomes K.M. Piccoli D.A. Krantz I.D. Conlin L.K. Spinner N.B. Genome sequencing increases diagnostic yield in clinically diagnosed Alagille syndrome patients with previously negative test results. Genet Med. 2021; 23: 323-330] and the utility of GS to streamline diagnosis and discover novel etiologies in patients with rare diseases [8: Smedley D. Smith K.R. Martin A. Thomas E.A. McDonagh E.M. et al. 100,000 Genomes Project Pilot Investigators. 100,000 Genomes Pilot on rare-disease diagnosis in health care: preliminary report. N Engl J Med. 2021; 385: 1868-1880; 9: Turro E. Astle W.J. Megy K. Gräf S. Greene D. Shamardina O. et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020; 583: 96-102; 10: Krantz I.D. Medne L. Weatherly J.M. Wild K.T. Biswas S. et al. NICUSeq Study Group. Effect of whole-genome sequencing on the clinical management of acutely ill infants with suspected genetic disease: a randomized clinical trial. JAMA Pediatr. 2021; 175: 1218-1226]. Accordingly, GS is now supported by the American College of Medical Genetics and Genomics as a first-tier diagnostic test in the pediatric setting [11: Manickam K. McClain M.R. Demmer L.A. Biswas S. Kearney H.M. Malinowski J. Massingham L.J. Miller D. Yu T.W. Hisama F.M. ACMG Board of Directors. Exome and genome sequencing for pediatric patients with congenital anomalies or intellectual disability: an evidence-based clinical guideline of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2021; 23: 2029-2037].

Sequencing technologies have evolved rapidly in the past few years. Standard PCR-free library preparation for GS typically uses a mechanical force to fragment DNA, which is followed by end repair, bead-based library size selection, and adapter ligation. This process typically takes several hours and requires a significant amount of input genomic DNA (>1 μg). Tagmentation using a bead-linked transposome complex combines DNA fragmentation, normalization, and sequencing adapter ligation into a single reaction step, thus simplifying library preparation and shortening processing time compared with traditional methods [12: Bruinsma S. Burgess J. Schlingman D. Czyz A. Morrell N. Ballenger C. Meinholz H. Brady L. Khanna A. Freeberg L. Jackson R.G. Mathonet P. Verity S.C. Slatter A.F. Golshani R. Grunenwald H. Schroth G.P. Gormley N.A. Bead-linked transposomes enable a normalization-free workflow for NGS library preparation. BMC Genomics. 2018; 19: 722]. The distance between the on-bead transposomes also generates more consistent insert sizes compared with fragmentation. Saturated with input DNA, the tagmentation approach also allows flexibility to handle a wide range of DNA input while normalizing libraries.

Given these advantages, our laboratory has implemented tagmentation-based, PCR-free GS as a clinical test for rare germline disorders. The analytical performance for various types of genetic alterations was evaluated, including single nucleotide variants (SNVs) and small insertion-deletions (indels) (1 to 50 bp), copy number variants (CNVs), repeat expansions, mitochondrial sequence variants, and SMN1 loss. The current article summarizes our experience with clinical validation of GS.

Genomic DNA was extracted from whole blood samples by using PureGene (Qiagen, Germantown, MD) or Chemagic chemistry (PerkinElmer, Waltham, MA). A set of 117 DNA samples were used in the validation: four National Institute of Standards and Technology reference genome samples and 113 patient samples that underwent previous clinical testing, including 14 ES samples, 45 CMA samples, 31 repeat expansion testing samples, 15 mitochondrial genome testing samples, and 8 SMN1 deletion testing samples. In addition, three pairs of DNA samples were extracted from saliva and paired peripheral blood and sequenced to compare the two sample types.

Tagmentation-based library preparation was performed by using 300 to 500 ng genomic DNA and the Illumina DNA PCR-Free Prep kit, and then sequenced using NovaSeq 6000 instruments (Illumina, San Diego, CA).

Sequencing data were analyzed with Illumina's DRAGEN pipeline (version 3.8.4) using a graph GRCh38 reference genome. The DRAGEN small variant caller was used to detect SNVs, small indels, and mitochondrial genome sequence variants.
CNVs were identified by using the DRAGEN CNV caller. The DRAGEN structural variant (SV) caller integrates Manta to detect large indels (>50 bp) [13: Chen X. Schulz-Trieglaff O. Shaw R. Barnes B. Schlesinger F. Källberg M. Cox A.J. Kruglyak S. Saunders C.T. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016; 32: 1220-1222]. DRAGEN also includes ExpansionHunter to detect repeat expansions [14: Dolzhenko E. Deshpande V. Schlesinger F. Krusche P. Petrovski R. Chen S. Emig-Agius D. Gross A. Narzisi G. Bowman B. Scheffler K. van Vugt J.J.F.A. French C. Sanchis-Juan A. Ibáñez K. Tucci A. Lajoie B.R. Veldink J.H. Raymond F.L. Taft R.J. Bentley D.R. Eberle M.A. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics. 2019; 35: 4754-4756; 15: Dolzhenko E. van Vugt J.J.F.A. Shaw R.J. Bekritsky M.A. van Blitterswijk M. Narzisi G. et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 2017; 27: 1895-1903]. Results of the DRAGEN SV caller were used as supporting CNV calls to define the breakpoints. Homozygous loss of SMN1 exon 7 was detected based on absence of the C allele at chr5:70951946 (GRCh38). (Balanced SVs were out of scope for the current study.)

The genome was stratified using version 3.0 stratification BED files developed by the Global Alliance for Genomics and Health Benchmarking Team and the Genome-in-a-Bottle Consortium (GIAB) [16: Krusche P. Trigg L. Boutros P.C. Mason C.E. De La Vega F.M. Moore B.L. Gonzalez-Porta M. Eberle M.A. Tezak Z. Lababidi S. Truty R. Asimenos G. Funke B. Fleharty M. Chapman B.A. Salit M. Zook J.M. Global Alliance for Genomics and Health Benchmarking Team. Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol. 2019; 37: 555-560; 17: Zook J. Genome In A Bottle - v3.0 Genome Stratifications. National Institute of Standards and Technology. 2021]. GIAB benchmark variant sets version 4.2.1 [18: Wagner J. Olson N.D. Harris L. Khan Z. Farek J. Mahmoud M. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2022; 2: 100128] (n = 4) and findings from previous genetic testing of 113 patient samples, including ES, NGS panels, CMA, and others, were compared versus the GS results.

In the comparison of GS versus ES, ES data were assumed to be the "ground truth" for the comparison analysis. To reduce confounding false calls in the ES data, the comparison was restricted to "PASS" variants in high-confidence regions, which included only regions with at least 30× coverage among the ES samples that also overlap the not-difficult genomic regions defined by the GIAB version 3.0 stratification notinalldifficultregions BED file. Difficult exome regions were also investigated for the purpose of performance evaluation. The available ES variant data used the GRCh37 genome build, and thus they were lifted over to GRCh38 using the LiftoverVcf tool (Picard tools version 2.22.3; Broad Institute, Cambridge, MA) before comparing them versus GS data. To reduce discrepancy due to differences in the genome builds, ES indel variants due to GRCh37 errors (eg, missing base pairs) were excluded from comparison.

CNVs were detected by using the DRAGEN CNV caller. The genome was separated into target interval levels (bins of approximately 1000 bp). Read coverage of each bin was counted, normalized, and used in segmentation to call CNVs. The reproducibility of CNVs was assessed at the bin level among replicates. Bins with consistent copy number status called among all three replicates were considered concordant.
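As a rough illustration of the bin-level reproducibility check described above, the following Python sketch treats each replicate as a list of per-bin copy number states and counts a bin as concordant only when every replicate agrees. The function name, the state labels, and the toy data are hypothetical; they are not taken from the DRAGEN pipeline or from the study data.

```python
from typing import Sequence


def bin_reproducibility(replicates: Sequence[Sequence[str]]) -> float:
    """Return the fraction of bins whose copy number state is identical
    in every replicate (assumed definition of bin-level concordance)."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    n_bins = len(replicates[0])
    if n_bins == 0 or any(len(rep) != n_bins for rep in replicates):
        raise ValueError("replicates must cover the same non-empty set of bins")
    # zip(*replicates) yields, for each bin, the states seen across replicates.
    concordant = sum(1 for states in zip(*replicates) if len(set(states)) == 1)
    return concordant / n_bins


# Hypothetical toy data: five bins, three replicates, one discordant bin.
rep1 = ["normal", "loss", "loss", "normal", "gain"]
rep2 = ["normal", "loss", "loss", "normal", "gain"]
rep3 = ["normal", "loss", "normal", "normal", "gain"]

print(bin_reproducibility([rep1, rep2, rep3]))  # 4 of 5 bins agree -> 0.8
```

In practice the per-bin states would come from the CNV caller output for each replicate; how those states are derived is outside the scope of this sketch.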
To evaluate the concordance between GS and CMA, the fractional bin concordance was calculated for each preselected high-confidence CNV identified by CMA (CytoScan HD, Thermo Fisher Scientific, Waltham, MA). It was defined as the ratio between the number of bins with concordant copy number status (loss, gain, or normal) called by GS and the total number of bins within the genomic region defined by the CMA CNV. Values >75% were considered concordant. CNVs were also intersected with the DRAGEN SV caller results to identify CNVs with supporting CNV calls.

A total of 31,759 indel variants of various size ranges were selected from ClinVar and HGMD databases as the ground truth set, using a 5 kb distance buffer among the variants. In silico FASTQ files were generated for the five samples with NEAT version 3.0 [19: Stephens Z.D. Hudson M.E. Mainzer L.S. Taschuk M. Weber M.R. Iyer R.K. Simulating next-generation sequencing datasets from empirical mutation and sequencing models. PLoS One. 2016; 11: e0167047], inserting the selected variants for each sample into the reads and using the following options to mimic GS metrics:

• Coverage = 30×
• Read length = 151 bp
• Fragment length = 409; SD = 90
• Error rate = 0.0023
• Mutation rate = 0

Each FASTQ file was processed through the DRAGEN pipeline. Indel variants were binned into the following size ranges: 1 to 15, 16 to 50, 51 to 100, 101 to 200, 201 to 1000, and 1001 to 10,000 bp. Recall and precision were calculated for each size range against the ground truth set.

The study protocol was approved by the Mayo Clinic Institutional Review Board (protocol number 15-007359).

Four DNA input levels (100 ng, 200 ng, 300 ng, and 500 ng) were examined to determine the minimum amount of DNA required for the standard GS library preparation workflow (Figure 1).
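The fractional bin concordance defined in the methods above (bins with matching copy number status divided by all bins within the CMA-defined region, with values >75% considered concordant) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tuple layout, the function names, and the rule that any bin overlapping the CMA interval counts as "within" the region are illustrative choices, not details given in the article.

```python
from typing import List, Tuple

# (start, end, state) with half-open coordinates; state is "loss", "gain", or "normal".
Bin = Tuple[int, int, str]


def fractional_bin_concordance(cma_start: int, cma_end: int,
                               cma_state: str, gs_bins: List[Bin]) -> float:
    """Fraction of GS bins inside the CMA CNV region whose state matches the CMA call.

    Assumption: a bin is counted when it overlaps the CMA interval at all.
    """
    in_region = [b for b in gs_bins if b[0] < cma_end and b[1] > cma_start]
    if not in_region:
        raise ValueError("no GS bins overlap the CMA region")
    matching = sum(1 for b in in_region if b[2] == cma_state)
    return matching / len(in_region)


def is_concordant(fraction: float, threshold: float = 0.75) -> bool:
    """Apply the >75% criterion stated in the text."""
    return fraction > threshold


# Hypothetical example: a CMA loss at 10,000-20,000 and 1000 bp GS bins,
# nine of which are called as loss and one as normal.
gs_bins = [
    (start, start + 1000, "loss" if 10_000 <= start < 19_000 else "normal")
    for start in range(8_000, 22_000, 1_000)
]
fraction = fractional_bin_concordance(10_000, 20_000, "loss", gs_bins)
print(fraction, is_concordant(fraction))  # 0.9 True
```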
The analysis showed that 300 ng input DNA is the saturation point of library preparation, at which the average autosomal coverage, final library yield, median insert size, and total number of variants plateau (Figure 2). This is consistent with the manufacturer's recommendation for this chemistry. Using data from 500 ng input as the comparison baseline, the recall and precision of SNVs and small indels were comparable among 100 ng to 300 ng inputs, with a subtle reduction of variant detection performance observed for lower DNA inputs ( […] 20× (Table 1). On average, GS detected 4,111,405 SNVs, 986,793 small indels, and 692 CNVs per sample. When stratified for difficult versus not-difficult genome regions defined by GIAB version 3.0 Genome Stratifications, the overall interassay and intra-assay reproducibilities for SNVs were much higher in the not-difficult genomic regions compared with the difficult regions (99.81% vs 80.69%), as expected. The overall reproducibility of indels decreases as the size increases (Supplemental Tables S2 and S3). In addition, for the well-characterized GIAB sample HG002, for which ground truth data are available, the reproducibility for variants in the high-confidence regions was examined. The reproducibilities for SNVs and indels were above 99.7% and 95.2%, respectively (Supplemental Tables S4 and S5). Reproducibility of CNVs was analyzed at a target interval level (approximately 1000 bp bins) and exhibited >99.9% reproducibility (Supplemental Table S6).

Table 1. Sequencing Quality Control Metrics

Quality control metric | Average | %CV | Recommended acceptable threshold
Total no. of reads | 1,391,553,108 | 19.5 | >694,000,000
Average autosomal coverage over GENOME | 62.7 | 19.1 | >32×
Average autosomal coverage over EXOME | 62.9 | 25.0 | >32×
% of GENOME with >20× coverage | 95.5 | 0.8 | >90
% of EXOME with >20× coverage | 98.0 | 10.6 | >90
Normalized coverage at GCs 60–79 (GENOME) | 1.0 | 6.3 | >0.9
Uniformity of coverage (GENOME), % | 96.3 | 0.3 | >95
Uniformity of coverage (EXOME), % | 99.8 | 0.1 | >99
Estimated sample contamination | 0.001 | 7.4 | […]
[…] | […] | […] | […] 85
Median insert size, bp | 388.0 | 4.3 | >300
% Mapped reads | 99.7 | 0.1 | >98
% Duplicated mapped reads | 7.0 | 19.1 | […]
[…] | […] | […] | […] 95
% Reads with MAPQ ≥20 | 92.2 | 0.5 | >90
% Autosome callability | 97.6 | 0.2 | >95
Chimeric % | 2.5 | 2.2 | <3

To examine whether the standard GS workflow can be applied to saliva specimens, DNA extracted from paired saliva and peripheral blood specimens obtained from three individuals was sequenced. Saliva specimens exhibited quality control metrics comparable overall to those of blood specimens (Supplemental Table S7). However, saliva specimens had a significantly higher average mitochondrial genome coverage, which suggests a higher mitochondria abundance in saliva. Saliva specimens also had a slightly higher percentage of unmapped reads, which likely reflects the presence of microbes (Supplemental Table S7). Nonetheless, GS generated equivalent variant data from saliva and blood samples, showing that saliva specimens are an acceptable specimen type for the clinical GS test (Supplemental Table S8).

To compare variant detection by GS to the GIAB benchmark variant sets, four GIAB samples (HG001, HG002, HG003, and HG004) were sequenced. The overall recall and precision in the not-difficult regions were above 99.5% for SNVs and small indels. The performance in the difficult regions decreased slightly (Table 2). Results for each individual GIAB sample are shown in Supplemental Table S9. Compared with ES, GS also achieved high concordance for SNVs with a 97.0% precision and a 99.9% recall.
The concordance for small indels (<20 bp) between GS and ES was also high, with a 98.5% recall and 94.8% precision (Table 3). Differences in bioinformatics pipelines, genome reference builds, and sequencing chemistries likely contributed to the discordant variants. All clinically reported variants (n = 34) in the ES samples were detected by GS (Supplemental Table S10).

Table 2. Summarized Performance of Sequence Variant Detection for GIAB Benchmark Variants

Stratification | Metric | SNV | Deletion 1–20 bp | Deletion 21–40 bp | Deletion 41–50 bp | Deletion ≥51 bp | Insertion 1–20 bp | Insertion 21–40 bp | Insertion 41–50 bp | Insertion ≥51 bp
Difficult | FN | 37,126 | 2622 | 187 | 22 | 1 | 2230 | 164 | 25 | 10
Difficult | FP | 18,784 | 1977 | 37 | 3 | 4 | 1532 | 34 | 9 | 12
Difficult | TP | 2,394,890 | 779,467 | 6466 | 454 | 3 | 718,026 | 4510 | 320 | 1
Difficult | Recall | 0.9847 | 0.9966 | 0.9719 | 0.9538 | 0.75 | 0.9969 | 0.9649 | 0.9275 | 0.0909
Difficult | Precision | 0.9922 | 0.9975 | 0.9943 | 0.9934 | 0.4286 | 0.9979 | 0.9925 | 0.9726 | 0.0769
Not-difficult | FN | 3817 | 98 | 4 | 0 | 0 | 111 | 3 | 1 | 2
Not-difficult | FP | 4080 | 128 | 1 | 0 | 2 | 81 | 0 | 1 | 5
Not-difficult | TP | 11,000,000 | 308,748 | 3214 | 239 | 1 | 307,417 | 3844 | 302 | 1
Not-difficult | Recall | 0.9996 | 0.9997 | 0.9988 | 1 | 1 | 0.9996 | 0.9992 | 0.9967 | 0.3333
Not-difficult | Precision | 0.9996 | 0.9996 | 0.9997 | 1 | 0.3333 | 0.9997 | 1 | 0.9967 | 0.1667
Overall | FN | 40,943 | 2720 | 191 | 22 | 1 | 2341 | 167 | 26 | 12
Overall | FP | 22,864 | 2105 | 38 | 3 | 6 | 1613 | 34 | 10 | 17
Overall | TP | 13,000,000 | 1,088,215 | 9680 | 693 | 4 | 1,025,443 | 8354 | 622 | 2
Overall | Recall | 0.9969 | 0.9975 | 0.9807 | 0.9692 | 0.8 | 0.9977 | 0.9804 | 0.9599 | 0.1429
Overall | Precision | 0.9983 | 0.9981 | 0.9961 | 0.9957 | 0.4 | 0.9984 | 0.9959 | 0.9842 | 0.1053

GIAB, Genome-in-a-Bottle; SNV, single nucleotide variant.

Table 3. Concordance between GS and ES

Variant type | ES-only variant | GS-only variant | Concordant | Recall | Precision
SNV | 200 | 4991 | 161,291 | 0.9988 | 0.9700
Deletion 1–20 bp | 6 | 96 | 2022 | 0.9970 | 0.9547
Deletion 21–40 bp | 0 | 9 | 22 | 1 | 0.7097
Deletion 41–50 bp | 0 | 0 | 1 | 1 | 1
Deletion ≥51 bp | 1 | 0 | 0 | 0 | NA
Insertion 1–20 bp | 54 | 114 | 1817 | 0.9711 | 0.9410
Insertion 21–40 bp | 0 | 34 | 16 | 1 | 0.3200
Insertion 41–50 bp | 0 | 8 | 11 | 1 | 0.5789
Insertion ≥51 bp | 2 | 30 | 1 | 0.3333 | 0.0323

Only "PASS" variants were included in the comparison. ES, exome sequencing; GS, genomic sequencing; NA, not applicable; SNV, single nucleotide variant.
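The recall and precision values in Tables 2 and 3 follow from the reported counts using the usual definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP). The short Python sketch below recomputes a few of them as a worked check. Treating ES-only variants as false negatives and GS-only variants as false positives in Table 3 is an interpretation based on the statement that ES was taken as the ground truth, and the helper names are illustrative.

```python
from typing import Optional


def recall(tp: int, fn: int) -> Optional[float]:
    """TP / (TP + FN); None when there are no truth variants (reported as NA)."""
    return tp / (tp + fn) if (tp + fn) > 0 else None


def precision(tp: int, fp: int) -> Optional[float]:
    """TP / (TP + FP); None when there are no calls (reported as NA)."""
    return tp / (tp + fp) if (tp + fp) > 0 else None


# Table 2, overall, deletions of 1 to 20 bp: TP = 1,088,215; FN = 2720; FP = 2105.
print(round(recall(1_088_215, 2_720), 4))     # 0.9975
print(round(precision(1_088_215, 2_105), 4))  # 0.9981

# Table 3, SNVs: concordant = 161,291; ES-only = 200; GS-only = 4991.
print(round(recall(161_291, 200), 4))         # 0.9988
print(round(precision(161_291, 4_991), 4))    # 0.97 (reported as 0.9700)

# Table 3, deletions >=51 bp: concordant = 0; ES-only = 1; GS-only = 0.
print(recall(0, 1), precision(0, 0))          # 0.0 None
```

The last case shows why precision is listed as NA for deletions of 51 bp or larger in Table 3: with no GS calls in that size range, the denominator is zero.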
To evaluate the reliability and performance of CNV detection using GS, 45 patient samples previously analyzed by CMA were sequenced. The performance of GS for 142 high-confidence nonmosaic CNVs (70 gains, 72 losses) identified by CMA, ranging from 4.8 kb to 31.1 Mb in size, was evaluated. All of these high-confidence CNVs were identified by GS with at least 75% overlap of the genomic regions between CMA and GS calls (Supplemental Table S11). Among those, all 53 clinically reported CNVs were detected with at least 85% reciprocal overlap (average 98.9%). Supporting CNV calls were made by the DRAGEN SV caller for 57% (81 of 142) of the CNVs, which helped refine the breakpoints or further characterize the CNVs. Split reads and discordant reads across breakpoint junctions were identified, allowing all 47 deletions to be fine mapped to the base pair level. All 34 duplications were confirmed as direct tandem duplications. Of the 62 CNVs without supporting CNV calls, 26% (n = 16) had segmental duplications at both event boundaries (compared with none of the CNVs with supporting CNV calls) and 21% (n = 13) had segmental duplications at one boundary [compared with eight (10%) of the CNVs with supporting CNV calls]. These findings suggest that it is challenging to call breakpoints in the presence of a segmental duplication using split or discordant reads, which likely contributed to the missed supporting CNV calls.

Fourteen mosaic CNVs identified by CMA were also included to evaluate the GS performance. The DRAGEN germline CNV caller detected five of the 14 mosaic CNVs. Eight of the mosaic CNVs not detected by DRAGEN were readily detected manually upon visual inspection of the log2 ratio plot (which is our standard analysis procedure for GS). The remaining one mosaic CNV not detected by DRAGEN or manual inspection was part of a complex chromosome 18 rearrangement ( […] 50 bp were detected with reduced sensitivities (Supplemental Table S15).
The current study shows that the bead-linked transposome-based tagmentation technology enables streamlined PCR-free library preparation for clinical GS. DNA input and processing time were significantly reduced compared with mechanical fragmentation–based approaches, and GS achieved uniform coverage across the genome with robust detection of small variants and CNVs. Consistent with previous studies and the manufacturer's recommendation [12: Bruinsma S. Burgess J. Schlingman D. Czyz A. Morrell N. Ballenger C. Meinholz H. Brady L. Khanna A. Freeberg L. Jackson R.G. Mathonet P. Verity S.C. Slatter A.F. Golshani R. Grunenwald H. Schroth G.P. Gormley N.A. Bead-linked transposomes enable a normalization-free workflow for NGS library preparation. BMC Genomics. 2018; 19: 722], it was found that the bead-linked transposome–based tagmentation was saturated with genomic DNA inputs above 300 ng, effectively normalizing the sequencing library yield and allowing equal volume pooling of samples before sequencing, thus simplifying the preparation workflow.

To assess the accuracy of variant calls made by the GS pipeline, the results were compared to calls made by ES, CMA, repeat-primed PCR, mitochondrial sequencing, and droplet digital PCR. High concordance was observed for small sequence variants, mitochondrial sequence variants, and CNVs, showing the comprehensiveness of GS as a clinical test.

GS reliably calls CNVs that were clinically reported in orthogonal assays such as CMA. In the subset of CNVs with supporting CNV calls made by DRAGEN SV caller, CNV breakpoints were fine mapped to the base pair level, providing additional information not available with CMA. Although detectable, balanced structural rearrangements are still challenging to identify as part of a routine variant review and interpretation process due to the large numbers of calls made by DRAGEN SV caller.
Mosaic CNV events of varying sizes and mosaic percentages can occasionally be called using DRAGEN, but additional manual analysis (visual inspection) is necessary to maximize the sensitivity of detection for large mosaic events. Although GS can accurately estimate repeat expansions up to 150 bp, large expansions (>150 bp) remain challenging to accurately size by short-read sequencing. The ExpansionHunter algorithm also had difficulty with mosaic findings. For example, in two male patients mosaic for FMR1 repeat sizes in both the full mutation and normal ranges, a single premutation allele was called. Overall, GS can be used as a sensitive flag for a potentially abnormal repeat expansion result, but we recommend that clinically reportable expanded alleles be orthogonally confirmed and accurately sized by validated clinical assays before returning the results to patients. GS is an overall robust assay, but there are limitations of the bioinformatics analysis that require future improvement. First, large indel detection has a much lower rate of reproducibility across three replicates compared with SNVs and small indels (Supplemental Table S16). The assessment of accuracy of large indel calls was significantly hindered by the lack of comprehensive benchmark variant sets using GRCh38 build. Thus, a set of in silico FASTQ files were simulated to test the robustness of large indel calling. Although the DRAGEN SV caller detected most of the simulated deletions and smaller insertions and duplications accurately, insertions/duplications between 50 and 1000 bp were more challenging to detect. Further investigation is needed to improve the performance of large indel detection using GS data and to prioritize which indels should be reported clinically. In addition, although mitochondrial genome sequence variants were accurately called, the DRAGEN pipeline used in this study does not detect large deletions/duplications in the mitochondrial genome. 
Furthermore, the DRAGEN version in use does not determine the copy numbers for either SMN1 or SMN2. The newest DRAGEN pipeline (version 4.0.3) has enhanced features, including reporting SMN1 and SMN2 copy numbers. Other methods to determine SMN1 status based on NGS data have been developed [22: Feng Y. Ge X. Meng L. Scull J. Li J. Tian X. Zhang T. Jin W. Cheng H. Wang X. Tokita M. Liu P. Mei H. Wang Y. Li F. Schmitt E.S. Zhang W.V. Muzny D. Wen S. Chen Z. Yang Y. Beaudet A.L. Liu X. Eng C.M. Xia F. Wong L.-J. Zhang J. The next generation of population-based spinal muscular atrophy carrier screening: comprehensive pan-ethnic SMN1 copy-number and sequence variant analysis by massively parallel sequencing. Genet Med. 2017; 19: 936-944]. The performance of these methods using GS data has yet to be investigated. Improvements to the bioinformatics pipeline to address these limitations will be implemented in an updated version of the GS clinical test.

In summary, this study establishes at least equivalent, if not superior, performance compared with standard ES and CMA approaches. As validated in our laboratory, we consider GS to be a superior replacement technology for ES and CMA. In addition, we documented promising performance for additional variant classes (eg, mitochondrial sequence variants, repeat expansions, SMN1 loss). These additional variant classes should be reviewed and orthogonally confirmed before reporting by GS, because the GS performance has not yet been shown to be equivalent to gold standard assays. With the advantage of comprehensive profiling of multiple types of genetic alterations, and the documented robust analytical performance, GS is positioned as an ideal first-tier diagnostic test for germline disorders.
Supplemental data: Supplemental Tables S1 to S16 (.xlsx files).

Reference(s)