Article Open access Peer-reviewed

Assessment of HaloPlex Amplification for Sequence Capture and Massively Parallel Sequencing of Arrhythmogenic Right Ventricular Cardiomyopathy–Associated Genes

2014; Elsevier BV; Volume: 17; Issue: 1; Language: English

10.1016/j.jmoldx.2014.09.006

ISSN

1943-7811

Authors

Anna Gréen, Henrik Gréen, Malin Rehnberg, Anneli Svensson, Cecilia Gunnarsson, Jon Jonasson

Topic(s)

Sports injuries and prevention

Abstract

The genetic basis of arrhythmogenic right ventricular cardiomyopathy (ARVC) is complex. Mutations in genes encoding components of the cardiac desmosomes have been implicated as being causally related to ARVC. Next-generation sequencing allows parallel sequencing and duplication/deletion analysis of many genes simultaneously, which is appropriate for screening of mutations in disorders with heterogeneous genetic backgrounds. We designed and validated a next-generation sequencing test panel for ARVC using HaloPlex. We used SureDesign to prepare a HaloPlex enrichment system for sequencing of DES, DSC2, DSG2, DSP, JUP, PKP2, RYR2, TGFB3, TMEM43, and TTN from patients with ARVC using a MiSeq instrument. Performance characteristics were determined by comparison with Sanger, as the gold standard, and TruSeq Custom Amplicon sequencing of DSC2, DSG2, DSP, JUP, and PKP2. All the samples were successfully sequenced after HaloPlex capture, with >99% of targeted nucleotides covered by >20×. The sequences were of high quality, although one problematic area due to a presumptive context-specific sequencing error–causing motif located in exon 1 of the DSP gene was detected. The mutations found by Sanger sequencing were also found using the HaloPlex technique. Depending on the bioinformatics pipeline, sensitivity varied from 99.3% to 100%, and specificity varied from 99.9% to 100%. Three variant positions found by Sanger and HaloPlex sequencing were missed by TruSeq Custom Amplicon owing to loss of coverage.

Arrhythmogenic right ventricular cardiomyopathy (ARVC), a heart disease characterized by the replacement of myocytes by adipose and fibrous tissue, may lead to heart failure, arrhythmias, and, in some cases, sudden cardiac death. The prevalence of ARVC is estimated to range from 1:2000 to 1:5000 in the general population [1: Norman M.W., McKenna W.J. Arrhythmogenic right ventricular cardiomyopathy: perspectives on disease. Z Kardiol. 1999; 88: 550-554]. ARVC is usually inherited as an autosomal dominant disease with reduced penetrance and variable expression, although autosomal recessive inheritance [2: Laurent M., Descaves C., Biron Y., Deplace C., Almange C., Daubert J.C. Familial form of arrhythmogenic right ventricular dysplasia. Am Heart J. 1987; 113: 827-829; 3: Nava A., Thiene G., Canciani B., Scognamiglio R., Daliento L., Buja G., Martini B., Stritoni P., Fasoli G. Familial occurrence of right ventricular dysplasia: a study involving nine families. J Am Coll Cardiol. 1988; 12: 1222-1228; 4: Protonotarios N., Tsatsopoulou A., Patsourakos P., Alexopoulos D., Gezerlis P., Simitsis S., Scampardonis G. Cardiac abnormalities in familial palmoplantar keratosis. Br Heart J. 1986; 56: 321-326], including compound heterozygosity and digenic mutations, has also been described [5: Xu T., Yang Z., Vatta M., Rampazzo A., Beffagna G., Pilichou K., Scherer S.E., Saffitz J., Kravitz J., Zareba W., Danieli G.A., Lorenzon A., Nava A., Bauce B., Thiene G., Basso C., Calkins H., Gear K., Marcus F., Towbin J.A. Compound and digenic heterozygosity contributes to arrhythmogenic right ventricular cardiomyopathy. J Am Coll Cardiol. 2010; 55: 587-597].

Mutations in genes encoding components of the cardiac desmosome have been implicated as being causally related to ARVC. Defects in the desmosomal cell adhesion proteins junction plakoglobin (JUP), desmoplakin (DSP), plakophilin-2 (PKP2), and desmoglein-2 (DSG2) may cause impaired function of the desmosomes owing to loss of electrical coupling between cardiac myocytes, leading to myocyte cell death and arrhythmias [6: Sen-Chowdhry S., Syrris P., McKenna W.J. Genetics of right ventricular cardiomyopathy. J Cardiovasc Electrophysiol. 2005; 16: 927-935]. At least 10 genes (DES, DSC2, DSG2, DSP, JUP, PKP2, RYR2, TGFB3, TMEM43, and TTN) involved in the function of desmosomes have been found to be important for ARVC development (Table 1). Additional genes associated with autosomal dominant ARVC have been mapped but not identified [7: McNally E, MacLeod H, Dellefave-Castillo L: Arrhythmogenic Right Ventricular Dysplasia/Cardiomyopathy. Edited by RA Pagon, MP Adam, HH Ardinger, et al. In GeneReviews [Internet]. Copyright University of Washington, Seattle. 1993-2014. Available at http://www.ncbi.nlm.nih.gov/books/NBK1131, last revised January 9, 2014].

Point mutations located in protein-coding sequences of PKP2, DSG2, DSP, DSC2, and JUP dominate the spectrum of pathogenic mutations and are found by sequencing in approximately 40% of those with a clinical diagnosis of ARVC [8: Bauce B., Nava A., Beffagna G., Basso C., Lorenzon A., Smaniotto G., De Bortoli M., Rigato I., Mazzotti E., Steriotis A., Marra M.P., Towbin J.A., Thiene G., Danieli G.A., Rampazzo A. Multiple mutations in desmosomal proteins encoding genes in arrhythmogenic right ventricular cardiomyopathy/dysplasia. Heart Rhythm. 2010; 7: 22-29]. Mutations in the extraordinarily large genes RYR2 [9: Tiso N., Stephan D.A., Nava A., Bagattin A., Devaney J.M., Stanchi F., Larderet G., Brahmbhatt B., Brown K., Bauce B., Muriago M., Basso C., Thiene G., Danieli G.A., Rampazzo A. Identification of mutations in the cardiac ryanodine receptor gene in families affected with arrhythmogenic right ventricular cardiomyopathy type 2 (ARVD2). Hum Mol Genet. 2001; 10: 189-194] and TTN [10: Taylor M., Graw S., Sinagra G., Barnes C., Slavov D., Brun F., Pinamonti B., Salcedo E.E., Sauer W., Pyxaras S., Anderson B., Simon B., Bogomolovas J., Labeit S., Granzier H., Mestroni L. Genetic variation in titin in arrhythmogenic right ventricular cardiomyopathy-overlap syndromes. Circulation. 2011; 124: 876-885], encompassing hundreds of exons, have also been implicated. Sanger sequencing of such large genes is not feasible in clinical diagnostics.
Hence, next-generation sequencing (NGS) has a tremendous capacity for mutation screening in this disease.

Table 1. Characteristics of ARVC Genes

Gene | Location | Cytoband | Name and function∗ | Reference sequence† | No. of exons
DES | 2:220,283,098–220,291,460 | 2q35 | Desmin is involved in connecting myofibrils to each other and to the plasma membrane. | CCDS33383.1, NM_001927.3 | 9
DSC2 | 18:28,645,937–28,682,387 | 18q12.1 | Desmocollin 2 is a component of intercellular desmosome junctions and is involved in the interaction of plaque proteins and intermediate filaments mediating cell-cell adhesion. | CCDS11892.1, NM_024422.3 | 16
DSG2 | 18:29,078,026–29,128,813 | 18q12.1 | Desmoglein 2 is a component of intercellular desmosome junctions and is involved in the interaction of plaque proteins and intermediate filaments mediating cell-cell adhesion. | CCDS42423.1, NM_001943.3 | 15
DSP | 6:7,541,807–7,586,945 | 6p24.3 | Desmoplakin is a major high-molecular-weight protein of desmosomes. It is involved in organization of the desmosomal cadherin-plakoglobin complexes into discrete plasma membrane domains and in anchoring of intermediate filaments to the desmosomes. | CCDS4501.1, NM_004415.2 | 24
JUP | 17:39,910,858–39,942,963 | 17q21.2 | Junction plakoglobin is present in desmosomes and in the intermediate junctions. It is suggested to play a central role in the structure and function of submembranous plaques. | CCDS11407.1, NM_021991.2 | 15
PKP2 | 12:32,943,679–33,049,779 | 12p11.21 | Plakophilin 2 is located in desmosomes and may play a role in junctional plaques. | CCDS8731.1, NM_004572.3 | 14
RYR2 | 1:237,205,509–237,997,287 | 1q43 | Ryanodine receptor 2 is a calcium channel that mediates the release of Ca²⁺ from the sarcoplasmic reticulum into the cytoplasm and triggers cardiac muscle contraction. | CCDS55691.1, NM_001035.2 | 105
TGFB3 | 14:76,424,439–76,449,333 | 14q24.3 | Transforming growth factor beta 3 is involved in embryogenesis and cell differentiation. | CCDS9846.1, NM_003239.2 | 7
TMEM43 | 3:14,166,439–14,185,179 | 3p25.1 | Transmembrane protein 43 may have an important role in maintaining nuclear envelope structure by organizing protein complexes at the inner nuclear membrane. | CCDS2618.1, NM_024334.2 | 12
TTN | 2:179,390,715–179,672,149 | 2q31.2 | Titin is a key component in the assembly. It contributes to the fine balance of forces between the two halves of the sarcomere. | CCDS54424.1, NM_133378.4 | 311

∗ Obtained from http://www.genecards.org.
† The transcripts selected as references agree with the references in the disease-targeted ARVC database, except for RYR2, which is not included there. However, both HaloPlex panels were designed to cover the coding sequences of all known transcripts of the genes included.

In the small clinical laboratory, a benchtop NGS sequencer can be of great value owing to fast sequencing turnover times and ease of use. Herein we used an Illumina MiSeq benchtop instrument (Illumina Inc., San Diego, CA) designed for paired-end sequencing by synthesis chemistry. With this instrument, PCR amplicons generated by conventional PCR can be sequenced after adapter and index incorporation.
Sequencing libraries for disease-targeted gene panels can also be obtained using, for example, TruSeq Custom Amplicon (TSCA; Illumina Inc.), HaloPlex (Agilent Technologies Inc., Santa Clara, CA), or SureSelect (Agilent Technologies Inc.). In the present work, SureSelect was not selected for library preparations because it has a more complex and expensive workflow compared with TSCA and HaloPlex. Comparing the other two design tools, HaloPlex was concluded to give better in silico coverage and was, therefore, selected for library preparations. This choice was supported by preliminary experiments presented below (Table 2; see also Results).

The HaloPlex technique is based on restriction enzyme digestion of genomic DNA followed by hybridization of a multitude of HaloPlex probes to the digested DNA to extract fragments from the targeted gene regions. The probe/fragment circular hybrids formed are then ligated, purified, and amplified. TSCA, on the other hand, is based on multiplex PCR, which can more easily lead to allelic dropout if sequence variants are present in primer sites. A HaloPlex design, by contrast, usually yields a multitude of nonduplicate amplicons covering the same area of interest, which is an advantage over the TSCA protocol.

In this study, based on these considerations, we aimed to design and validate an NGS test panel for ARVC gene mutation screening using the HaloPlex target enrichment system. The results show sensitivity and specificity equal to those of Sanger sequencing. However, some of the alternative pipelines used for the bioinformatics analysis tended to conceal insertions and deletions and to generate the occasional false single nucleotide variant.

Twenty-seven unrelated patients with ARVC who had given written informed consent to participate in genetic screening were identified at the Department of Cardiology, County Council of Östergötland (Linköping, Sweden).
Clinical history and medical records were obtained at enrollment, and the patients fulfilled the task force criteria from 2010 [11: Marcus F.I., McKenna W.J., Sherrill D., Basso C., Bauce B., Bluemke D.A., Calkins H., Corrado D., Cox M.G., Daubert J.P., Fontaine G., Gear K., Hauer R., Nava A., Picard M.H., Protonotarios N., Saffitz J.E., Sanborn D.M., Steinberg J.S., Tandri H., Thiene G., Towbin J.A., Tsatsopoulou A., Wichter T., Zareba W. Diagnosis of arrhythmogenic right ventricular cardiomyopathy/dysplasia: proposed modification of the task force criteria. Circulation. 2010; 121: 1533-1541]. DNA from 12 patients was analyzed using ARVC panel 1, and DNA from 15 patients was analyzed using ARVC panel 2. Before this, the exonic regions of DSG2, DSC2, DSP, JUP, and PKP2 of the 27 patients had been analyzed using Sanger sequencing. The study was approved by the Regional Ethics Board in Linköping (Linköping, Sweden) and was performed in accordance with the Helsinki Declaration.

DNA was extracted from whole blood using either EZ1 (Qiagen GmbH, Hilden, Germany) or Prepito (Techtum Lab AB, Umeå, Sweden) kits. DNA concentration and quality were assessed using NanoDrop (Wilmington, DE) and Qubit (Life Technologies Inc., Gaithersburg, MD) fluorometers and DS BR DNA assay (Invitrogen, Carlsbad, CA). A260/A280 ratios of 1.8 to 2.0 and A260/A230 ratios >1.5 were accepted. DNA fragmentation was assessed using agarose gel (2%) electrophoresis.

Coding exons and flanking intron sequences of the genes PKP2, DSP, DSG2, DSC2, and JUP were amplified from genomic DNA by PCR using primers tailed with universal M13 forward and reverse sequencing primers (Supplemental Table S1). Bidirectional Sanger sequencing was performed using the BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems, Carlsbad, CA) on a 3130xl genetic analyzer (Applied Biosystems).
The sequencing results were analyzed against the National Center for Biotechnology Information (NCBI) Nucleotide database accession numbers NM_004572.3, NM_004415.2, NM_001943.3, NM_024422.3, and NM_021991.2 (http://www.ncbi.nlm.nih.gov/nucleotide) using SeqScape software version 2.6 (Applied Biosystems).

Library preparation for NGS was accomplished using the HaloPlex PCR target enrichment system (Agilent Technologies Inc.). Using SureDesign (Agilent Technologies Inc.), probes were generated to cover the exons of the genes DES (CCDS33383.1), DSC2 (CCDS11892.1, CCDS11893.1), DSG2 (CCDS42423.1), DSP (CCDS47368.1, CCDS4501.1), JUP (CCDS11407.1), PKP2 (CCDS31771.1, CCDS8731.1), TGFB3 (CCDS9846.1), TMEM43 (CCDS2618.1), and TTN (NCBI Nucleotide database, http://www.ncbi.nlm.nih.gov/nucleotide; Accession numbers NM_133432, NM_133437, NM_003319, NM_133378, NM_133379). This design was called ARVC panel 1 and was used for library preparation from 12 samples.

A second design, ARVC panel 2, was made where probes were generated to cover the exons and 25 bp of the surrounding intronic sequences of PKP2, DSP, DSG2, DSC2, JUP, DES, TGFB3, TMEM43, TTN, and RYR2, and the design was made toward all transcripts in the databases RefSeq (http://www.ncbi.nlm.nih.gov/refseq), Ensembl (http://www.ensembl.org), CCDS (http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi), Gencode (http://www.gencodegenes.org), and VEGA (http://vega.sanger.ac.uk/index.html). Using this design, libraries from 15 samples were generated. Both designs were made for Illumina 150-bp paired-end sequencing.

The main reason for developing two different designs was to evaluate whether the addition of flanking bases was important for coverage. Also, in the second design, RYR2 was added because this gene may also be relevant to ARVC. After upgrading of the MiSeq instrument, this could be done without any major loss of overall coverage.
Also note that the HaloPlex design tool used for panel 1 was based on CCDS data only, whereas a later version used for panel 2 allowed design toward all available transcripts.

Amplicon libraries were prepared from genomic DNA of all the patients using the HaloPlex PCR target enrichment system according to the manufacturer's recommendations. In brief, 225 ng of DNA was used for restriction reactions, and hybridization was performed for 3 hours at 54°C. All the DNA samples were individually indexed. Amplification of the libraries was performed on an ABI 2720 thermocycler (Applied Biosystems). Restriction digestion and amplicon libraries were quality controlled using a high-sensitivity DNA kit and bioanalyzer (Agilent Technologies Inc.). All the sample libraries included fragments of the expected length. Libraries were quantified using a Qubit fluorometer, and the molar DNA concentration was calculated using the formula 1 ng/μL = 3 nmol/L / (average library size in bp / 500), where the average fragment length was obtained from bioanalyzer data. Amplicon libraries were diluted to 2 nmol/L, and six to eight indexed samples were pooled at a final concentration of 6 pmol/L.

Sequencing was performed using either MiSeq reagent kit version 1 or version 2, 300 cycles, on the MiSeq instrument (Illumina Inc.). The instrument was set to generate FASTQ files only, without adapter trimming. HaloPlex indices were added by editing the sample sheet for each run. The obtained cluster densities were 800,000 to 1,000,000 clusters/mm². The raw data quality was checked using MiSeq Sequencing Analysis Viewer software version 1.8 (Illumina Inc.). The sequences obtained were quality controlled using FastQC software version 0.10.1 (Babraham Bioinformatics, Cambridge, UK). SAMtools flagstat was used to obtain metrics of the analyzed reads.
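
The library quantification rule quoted above (1 ng/μL corresponds to 3 nmol/L for a 500-bp library) can be illustrated with a short calculation. The following is a minimal sketch, not the authors' code: the function names and the example values are hypothetical, and only the formula and the 2 nmol/L dilution target are taken from the text.

```python
def library_molarity_nmol_per_l(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a fluorometric reading (ng/uL) to nmol/L.

    Uses the rule from the text: 1 ng/uL = 3 nmol/L / (mean size in bp / 500).
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 3.0 / (mean_fragment_bp / 500.0)


def dilution_factor(stock_nmol_per_l: float, target_nmol_per_l: float = 2.0) -> float:
    """Fold dilution needed to reach the target molarity (2 nmol/L in the text)."""
    if target_nmol_per_l <= 0:
        raise ValueError("target molarity must be > 0")
    if stock_nmol_per_l < target_nmol_per_l:
        raise ValueError("library is already below the target molarity")
    return stock_nmol_per_l / target_nmol_per_l


if __name__ == "__main__":
    # Hypothetical library: 4.0 ng/uL with a mean fragment length of 300 bp.
    molarity = library_molarity_nmol_per_l(4.0, 300.0)
    print(f"{molarity:.1f} nmol/L, dilute {dilution_factor(molarity):.1f}-fold")
```

The factor of 3 follows from an average mass of roughly 660 g/mol per base pair of double-stranded DNA, so the rule is an approximation rather than an exact conversion.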
To ensure accurate variant calling, the bioinformatics analysis was performed in three alternative ways (Figure 1): i) the HaloPlex-recommended pipeline (with some modifications), ii) SureCall software version 1.0.4.9 (Agilent Technologies Inc.), and iii) a custom script pipeline.

In pipeline 1, data were analyzed using a bioinformatics pipeline suggested by the HaloPlex development team (Agilent Technologies Inc.). This included processing of FASTQ files using the cutadapt tool version 1.1 for adapter removal. We had to remove an additional five bases at the 3′ end of each read to avoid false mutations caused by adapter remnants. Alignment to GRCh37/hg19 was performed using bwa-0.6.1-sampe (pipeline 1A) or bwa-0.7.4-mem (pipeline 1B) [12: Li H., Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010; 26: 589-595]. SAMtools-0.1.18 was used for conversion of .sam files to .bam files and for generation of .mpileup files [13: Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25: 2078-2079]. From the .mpileup file, coverage was calculated using a combination of awk, custom scripts in Ruby-1.9.3, and perl scripts from the HaloPlex development team. Genome Analysis Toolkit version 2.2 (Broad Institute, Cambridge, MA) [14: DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A., del Angel G., Rivas M.A., Hanna M., McKenna A., Fennell T.J., Kernytsky A.M., Sivachenko A.Y., Cibulskis K., Gabriel S.B., Altshuler D., Daly M.J. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011; 43: 491-498] was used for raw variant calling (UnifiedGenotyper), realignment (RealignerTargetCreator and IndelRealigner), recalibration (BaseRecalibrator and PrintReads), and variant calling (UnifiedGenotyper). For comparison of NGS and Sanger data, variants were called in exons and 10 bp flanking each exon because we also wanted to be able to call variants in the splice sites. Although flanking regions had not been targeted in ARVC panel 1, this was possible because HaloPlex amplification, as a rule, includes a substantially larger region than the targeted exons, resulting in satisfactory coverage in the flanking regions as well. Integrative Genomics Viewer was used for visualization of .bam and .vcf files.

In pipeline 2, FASTQ files were analyzed in SureCall version 1.0.4.9, starting from unaligned reads and using default settings for the two different designs. The adapter removal algorithm in SureCall is not known to us. This version of SureCall used Burrows-Wheeler Aligner (BWA) bwa-0.6.1 as the aligner and SAMtools for variant calling.

With the dual purpose of achieving a faster analysis procedure and removing some low-quality reads observed in exon 1 of the DSP gene (Figure 3; see also Results), we designed an entirely different algorithm (pipeline 3, for which source code is also provided in Supplemental Code S1). The naive solution seemed to be to take advantage of the generally very high mapping quality of the reads and to use only perfect sequences for matching against the human genome. Sequences of poor quality could be expected to vary between themselves at any particular position, and nonsystematic erroneous reads would, thus, be naturally sorted out if only recurring reads (ie, 100% identical sequences) were taken into account. This pipeline was built from commands in Ruby scripts, also using other freely available software and databases.
First, the cleaned index-sorted paired-end reads were scanned for flanking HaloPlex adapter sequences, ie, 5′-AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3′ and 5′-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT-3′. However, the adapter 5′-recognition motif was restricted to 6 to 13 bp depending on the position of the adapter in the read. A perfect match was required in each case, which is simpler and faster than the procedure recommended by the HaloPlex development team. The minimal sequence for identification of an adapter at the 3′ end of a read was set to AGATCG. The adapter sequences were removed in the following way: i) five bases were removed from the 3′ end of all reads lacking identified adapter sequence (resulting in approximately 146-bp reads), ii) reads with adapter sequence within 50 bp of the 5′ end were discarded, and iii) reads with flanking adapter sequence in the 3′ end were trimmed by removal of the corresponding number of nucleotides. Next, after removal of reads with low-quality bases ≤Q13, the forward and reverse trimmed reads were pooled, sorted, and counted. Reads with few occurrences (ie, <3) were discarded. One read representative of identical sequences among the remaining reads was aligned to GRCh37/hg19 using the bwa-0.7.4-mem single-end strategy. Thus, instead of aligning thousands of identical reads, only one perfect representative of each amplicon is aligned, which saves huge amounts of computer time. The resulting .sam file was expanded to restore copy numbers, and coverage was estimated by SAMtools-0.1.18 mpileup. The per-base Phred quality score was calculated as a mean from the .mpileup file for each base position. Positions with coverage <20 and/or a mean Phred quality score <20 were regarded as “not covered” and were excluded from further analysis.
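
The trimming, collapsing, and coverage rules described above can be sketched in a few lines. This is an illustrative Python reading of the text, not the authors' Ruby implementation (Supplemental Code S1). All function names are hypothetical, and two points are assumptions: the 6 to 13 bp motif is implemented as an exact match to as much of the shared 13-base adapter prefix as fits in the read, and the Q13 filter is applied to the bases kept after trimming.

```python
from collections import Counter

# The two adapter sequences quoted in the text share their first 13 bases.
ADAPTER_PREFIX = "AGATCGGAAGAGC"
MIN_MOTIF = 6     # minimal 3'-end motif (AGATCG)
MIN_INSERT = 50   # adapter within 50 bp of the 5' end: discard the read
TAIL_TRIM = 5     # bases removed from reads without an identified adapter
MAX_BAD_QUAL = 13 # reads containing a base at or below Q13 are removed
MIN_COPIES = 3    # reads occurring fewer than 3 times are discarded


def find_adapter(seq):
    """Return the index of the first exact adapter match, or -1 if none."""
    for pos in range(len(seq) - MIN_MOTIF + 1):
        k = min(len(ADAPTER_PREFIX), len(seq) - pos)
        if seq[pos:pos + k] == ADAPTER_PREFIX[:k]:
            return pos
    return -1


def trim_read(seq):
    """Apply rules i) to iii); return the trimmed read or None if discarded."""
    pos = find_adapter(seq)
    if pos == -1:
        return seq[:-TAIL_TRIM]   # i) no adapter: drop five 3' bases
    if pos < MIN_INSERT:
        return None               # ii) adapter too close to the 5' end
    return seq[:pos]              # iii) cut at the adapter start


def collapse_reads(reads):
    """Count identical trimmed reads; `reads` holds (sequence, phred list) pairs."""
    kept = []
    for seq, quals in reads:
        trimmed = trim_read(seq)
        if not trimmed:
            continue
        # Assumption: quality is checked on the retained bases only.
        if min(quals[:len(trimmed)]) <= MAX_BAD_QUAL:
            continue
        kept.append(trimmed)
    counts = Counter(kept)
    return {seq: n for seq, n in counts.items() if n >= MIN_COPIES}


def is_covered(depth, mean_phred):
    """A position counts as covered only at depth >= 20 and mean Phred >= 20."""
    return depth >= 20 and mean_phred >= 20
```

Each key of the dictionary returned by `collapse_reads` would be aligned once, with its count used afterward to restore copy numbers, which is the source of the time savings described in the text.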
A variant was flagged if the fraction of reads with a deviating base at a particular position (number of reads with the deviating base divided by the total number of reads) was ≥0.25 for single nucleotide polymorphisms/mutations and ≥0.12 for insertions and deletions. The analysis described herein takes approximately 9 minutes in total for eight samples using a Mac laptop with a Quad-core i7 processor compared with hours in our setting using pipeline 1 or 2. This includes coverage and copy number analyses and also variant calling using Exome Variant Server on a local MySQL database. Thus, the total run time per sample will be approximately 2 minutes.

A design was made to cover the exons and 10 bp flanking each side of the exons of the genes PKP2, DSP, DSG2, DSC2, JUP, DES, TGFB3, TMEM43, TTN, and RYR2 using the DesignStudio tool (Illumina Inc.). We included additional flanking bases on the exons where coverage was not achieved or was Q30.

FastQC-generated graphs of the quality per base from all reads in a typical run are shown in Figure 2, demonstrating the quality scores along first and second reads. Supplemental Table S2 shows the number of reads generated for each sample, the number of QC-passed reads, and the fraction of aligned reads for bwa-0.6.1-aln plus sampe and bwa-0.7.4-mem. The mapping quality of the reads was visualized using SAMstat. The mean ± SD fraction of aligned reads with MAPQ values >30 was 95.7% ± 1.9% for all the samples. Very few sequencing errors were detected when examining .bam files using Integrative Genomics Viewer (Supplemental Figure S1).

However, one especially problematic region was detected, located in exon 1 of the DSP gene (Figure 3). In this area, false sequence variation was detected in all the bioinformatics analysis pipelines. Nearly all reads in this region contained one or several bases deviating from the hg19 reference. These are shown as colored lines in the Integrative Genomics Viewer representation of reads in Figure 3.
Also, many deletions (−) and insertions (I) were present. As a rule, the deviating bases and their neighbors were of low quality. The reads were also of variable length in this region, which is generally not observed in amplicon-based assays. All the erroneous reads were reverse reads. Misalignment of the reads could be excluded because neither the entire reads nor parts of the reads could be aligned to any location other than exon 1 of the DSP gene using GRCh37/hg19 as reference.

The quality scores obtained after TSCA amplification were similar to the HaloPlex data. However, the background noise arising from apparently random erroneous base calls in the individual reads was higher, and these base calls could not be filtered away by stringent Phred score criteria (Supplemental Figure S1). Presumably, most of the erroneous base calls represent PCR nucleotide incorporation errors.

In the first HaloPlex design, the overall coverage was estimated to be 99.7% by design (DES, 100.0%; DSC2, 100.0%; DSG2, 99.8%; DSP, 100.0%; JUP, 100.0%; PKP2, 100.0%; TGFB3, 100.0%; TMEM43, 100.0%; and TTN, 99.7%). The targeted region was 133,913 bp. Regions with coverage <20× were considered to be uncovered in this study. The ideal coverage threshold to use for NGS data is a matter of debate, but 20× has been adopted by others as a good cutoff point [15: Besaratinia A., Li H., Yoon J.I., Zheng A., Gao H., Tommasi S. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens. Nucleic Acids Res. 2012; 40: e116; 16: Chin E.L., da Silva C., Hegde M. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations. BMC Genet. 2013; 14: 6; 17: De Leeneer K., De Schrijver J., Clement L., Baetens M., Lefever S., De Keulenaer S., Van Criekinge W., Deforce D., Van Nieuwerburgh F., Bekaert S., Pattyn F., De Wilde B., Coucke P., Vandesompele J., Claes K., Hellemans J. Practical tools to implement massive parallel pyrosequencing of PCR products in next generation molecular diagnostics. PLoS One. 2011; 6: e25531; 18: Dohm J.C. Lottaz C. Borodina

Reference(s)