Article; Open access; Peer reviewed

Comparison of Clinical Targeted Next-Generation Sequence Data from Formalin-Fixed and Fresh-Frozen Tissue Specimens

2013; Elsevier BV; Volume: 15; Issue: 5; Language: English

10.1016/j.jmoldx.2013.05.004

ISSN

1943-7811

Authors

David H. Spencer, Jennifer K. Sehn, Haley Abel, Mark A. Watson, John D. Pfeifer, Eric J. Duncavage

Topic(s)

Molecular Biology Techniques and Applications

Abstract

Next-generation sequencing (NGS) has emerged as a powerful technique for the detection of genetic variants in the clinical laboratory. NGS can be performed using DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, but it is unknown whether such specimens are truly equivalent to unfixed tissue for NGS applications. To address this question, we performed hybridization-capture enrichment and multiplexed Illumina NGS for 27 cancer-related genes using DNA from 16 paired fresh-frozen and routine FFPE lung adenocarcinoma specimens and conducted extensive comparisons between the sequence data from each sample type. This analysis revealed small but detectable differences between FFPE and frozen samples. Compared with frozen samples, NGS data from FFPE samples had smaller library insert sizes, greater coverage variability, and an increase in C to T transitions that was most pronounced at CpG dinucleotides, suggesting interplay between DNA methylation and formalin-induced changes; however, the error rate, library complexity, enrichment performance, and coverage statistics were not significantly different. Comparison of base calls between paired samples demonstrated concordances of >99.99%, with 96.8% agreement in the single-nucleotide variants detected and >98% accuracy of NGS data when compared with genotypes from an orthogonal single-nucleotide polymorphism array platform. This study demonstrates that routine processing of FFPE samples has a detectable but negligible effect on NGS data and that these samples can be a reliable substrate for clinical NGS testing.

Next-generation sequencing (NGS) has recently emerged as a cost-effective method for identifying clinically actionable genetic variants across many genes in a single test. This approach has now been successfully applied to detect somatic mutations in hematopoietic malignant tumors and solid tumors, as well as constitutional mutations in genes associated with inherited cancer predisposition syndromes, among other clinical testing applications [1-5]. Further, numerous studies in the research setting have demonstrated that NGS techniques are able to detect the full range of DNA variation, including single-nucleotide variants (SNVs), insertions/deletions, translocations, and copy number changes [4, 6-11]. Thus, clinical NGS-based diagnostics offer an improvement over current molecular methods, such as PCR and Sanger sequencing, by which only a limited spectrum of mutations can be identified at a single genomic locus.
Implementing NGS methods for routine clinical testing requires validation across the range of variables encountered in a clinical diagnostics laboratory, including the various specimen types that may be submitted for testing. The preferred specimen for most molecular tests is fresh tissue (eg, blood anticoagulated with EDTA or fresh tissue from a surgical biopsy or excision in saline) because this minimizes processing and exposures that could compromise the integrity of the DNA in the sample. However, in most clinical molecular pathology laboratories, fresh or fresh-frozen tissue specimens are rare owing to the logistical complexities of collecting and storing such samples. Instead, most specimens are formalin-fixed, paraffin-embedded (FFPE) tissue blocks from the surgical pathology laboratory. These FFPE specimens have some advantages, such as easier storage, but it is well known that formalin fixation results in DNA damage. Formaldehyde reacts with DNA and proteins to form labile hydroxymethyl intermediates, resulting in DNA-DNA, DNA-RNA, and DNA-protein molecules that are covalently linked by methylene bridges. Formaldehyde also engenders oxidation and deamination reactions and the formation of cyclic base derivatives [12-17]. These chemical modifications have the potential to confound molecular testing through inhibition of enzymatic manipulation of DNA or direct causation of single-base changes and other sequence aberrations. In addition, the methylene crosslinks lead to DNA fragmentation that can make analysis of sequences longer than 100 to 200 bp problematic.

Although several studies have established that NGS can be performed using DNA from FFPE tissue, it remains unknown whether these specimens are equivalent to fresh tissue for testing in a clinical environment [6, 18-21]. Prior work has suggested that deamination and other damage caused by fixation could account for as many as 1% of the SNV calls in low-coverage sequence data (eg, 20-fold coverage) and may skew the transition to transversion ratio in sequence data from FFPE tissue samples [19, 21]. Although these studies have limitations, such as the use of cell lines and unrelated specimens as controls rather than paired fresh tissue, they raise the possibility that damage caused by fixation could result in erroneous clinical test results.

In this study we conducted an in-depth comparison of NGS sequence data generated from paired FFPE and fresh-frozen tissue to examine the potential effects of routine tissue processing and storage on NGS-based testing. We used targeted, solution-phase capture enrichment and paired-end sequencing (Illumina Inc, San Diego, CA) to sequence 27 genes from 16 routine FFPE tissue blocks and 16 paired fresh-frozen tissue samples derived from clinical lung adenocarcinoma specimens. The paired study design was used to minimize intrinsic tumor-specific discrepancies in mutation type and frequency that would confound comparisons of sequence data derived from disparate tissue or tumor samples. In addition, we sequenced specimens that had been subjected to an extended incubation in saline (increased ischemic time) or fixed for a range of intervals to explore the effects that these preanalytic processing variables may have on NGS data. For each set of paired samples, our analysis covered all aspects of the analytical process, including quality measures of the DNA prepared from the specimens, the raw sequencing results, sequence data quality, read alignments, library complexity, raw error rate, and consensus base calls. For a subset of cases we also evaluated sequence agreement with orthogonal array-based genotypes.
Our results reveal that DNA damage due to tissue fixation is evident in NGS data, but differences between fixed and frozen samples are minor and do not affect clinical diagnostic calls. Further, we demonstrate that low-quality DNA samples generated through increased time to fixation (ischemic time) or excessive formalin fixation can still yield reliable NGS results.

Fresh snap-frozen samples of 16 surgically resected lung adenocarcinomas were randomly selected from the Siteman Cancer Center Tissue Repository (St. Louis, MO), and a routinely processed (10% w/w buffered formalin and overnight fixation) FFPE block from each case was obtained from the Lauren V. Ackerman Laboratory of Surgical Pathology at Barnes-Jewish Hospital, St. Louis, Missouri (Figure 1A) [22]. All slides (frozen sections and permanent sections) were reviewed for adequacy and tumor cellularity by an anatomical pathologist (E.D.); tumor cellularity is reported in Supplemental Table S1 and Supplemental Figure S1. A paired t-test revealed no significant difference between FFPE and frozen tissue cellularity.

A second set of experiments was used to examine the effects of ischemic time (ie, time to fixation) and increased fixation time to simulate the full range of preanalytic variability encountered in clinical samples (Figure 1B). Remnant tissue from a randomly selected uterine adenomyosis specimen was used in these experiments. Specifically, portions were immediately snap frozen, subjected to 24 or 48 hours of ischemia (ie, held in saline at room temperature for 24 or 48 hours) before routine processing (which included overnight formalin fixation), or subjected to 24-, 48-, or 72-hour formalin fixation before routine processing.
For frozen tissue, genomic DNA was extracted from approximately 10 to 20 10-μm cryostat sections of OCT-embedded frozen tissue using the QIAamp DNA Micro Kit (Qiagen, Valencia, CA) per the manufacturer's instructions. For FFPE tissue, two to three 1-mm-diameter punches were taken from the paraffin block and deparaffinized by two 10-minute incubations with xylene at room temperature. The deparaffinized tissue was washed with 96% to 100% ethanol and heated to 37°C for 15 minutes to remove excess ethanol. DNA was then extracted from the tissue cores after overnight incubation in buffer ATL with proteinase K at 56°C, using the QIAamp DNA Micro Kit per the manufacturer's instructions. For the frozen samples, extraction of high-molecular-weight (>1000 bp in length), nondegraded DNA was confirmed by electrophoresis using a 0.8% agarose gel. For all FFPE samples, DNA fragment length was assessed using a multiplex PCR ladder assay for the GAPDH gene, with amplicon lengths of 105, 239, 299, and 411 bp [23]; samples with amplicons of at least 299 bp were deemed high quality, and samples with only 105-bp amplicons were classified as poor quality.

All samples were capture enriched for a comprehensive cancer set, WU-CaMP27 (Genomics and Pathology Services at Washington University, St. Louis, MO), that includes 27 genes commonly mutated in cancer (Supplemental Table S2).
One microgram of extracted DNA was fragmented to 200 to 250 bp using a Covaris E210 instrument (Covaris Inc, Woburn, MA). Fragmentation was verified on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA), and the fragmented DNA was purified with Agencourt AMPure XP beads (Beckman Coulter, Danvers, MA), end-repaired and A-tailed with Klenow DNA polymerase, and finally ligated to universal Illumina adapters. Library fragments were then bead purified and analyzed for adequate ligation on the Agilent 2100 Bioanalyzer. Limited-cycle PCR with sample-specific, index-tagged primers was then performed to enrich for ligation products with the appropriate configuration (ie, ligation of one of each of the adapters on either end). Whole-genome libraries were enriched for exons, 200 bp of flanking intronic sequence, and 1 kbp flanking the first and last exon of the genes targeted by the WU-CaMP27 set using a custom Agilent SureSelect biotinylated cRNA probe set. SureSelect reagents were prepared according to the manufacturer's instructions, and 500 ng of each indexed library was hybridized at 65°C for 24 hours. Captured library fragments were washed and purified from unbound material using MyOne T1 streptavidin beads (Life Technologies Corp., Grand Island, NY) and then resuspended and bead purified before a final limited-cycle PCR amplification. Library size and quantity were verified by electrophoresis using an Agilent Bioanalyzer.

Enriched libraries were pooled (30 indexed libraries per pool) and sequenced in multiplex on an Illumina HiSeq 2000 instrument (Illumina Inc) using version 3 chemistry following established protocols for paired-end 101-bp reads. Base calls and quality scores were produced by the included Illumina (San Diego, CA) analytical software version 1.7 (Casava).
The resulting FASTQ files were aligned to National Center for Biotechnology Information build 37.2 of the human reference genome (hg19) using Novoalign (Novocraft, Selangor, Malaysia) with default paired-end parameters. Mapped reads were marked for duplicates with Picard tools (http://picard.sourceforge.net, last accessed December 17, 2012) before subsequent analysis. Quality metrics, including error rate, unaligned bases, mapping results, and coverage statistics, were obtained using BEDTools and SAMtools [24, 25], followed by custom scripts in Perl and R to parse the output of these programs (scripts available from corresponding author on request). All analysis involved only high-quality reads with mapping quality >20, except where explicitly stated.

Variant identification was performed on data from paired samples jointly using the UnifiedGenotyper in the Genome Analysis Toolkit (GATK; version 2.1) [26] and reported as variant calling format files with parameters described in the best practices methods for the GATK package, including removal of duplicate reads, quality score recalibration, and insertion/deletion realignment.
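The read-level filter described above (mapping quality >20, after duplicate marking) can be illustrated with a short Python sketch. The authors' Perl and R scripts are not reproduced in the text, so this is not their implementation: the function names are hypothetical, the column positions and flag bits follow the SAM format, and whether duplicate-flagged reads were dropped in every analysis is an assumption exposed here as a parameter.

```python
# Minimal sketch, not the authors' code. Function names are hypothetical.
# SAM columns: 2nd field = bitwise FLAG, 5th field = MAPQ.
UNMAPPED_FLAG = 0x4     # segment unmapped
DUPLICATE_FLAG = 0x400  # PCR or optical duplicate (set by duplicate-marking tools)
MIN_MAPQ = 20           # the text keeps reads with mapping quality > 20


def is_high_quality(sam_line, min_mapq=MIN_MAPQ, drop_duplicates=True):
    """Return True for a mapped read whose MAPQ exceeds the cutoff.

    drop_duplicates=True additionally rejects duplicate-flagged reads;
    applying this to every analysis is an assumption, not stated in the text.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & UNMAPPED_FLAG:
        return False
    if drop_duplicates and flag & DUPLICATE_FLAG:
        return False
    return mapq > min_mapq


def filter_sam(lines, **kwargs):
    """Yield header lines unchanged and only the alignment records that pass."""
    for line in lines:
        if line.startswith("@") or is_high_quality(line, **kwargs):
            yield line
```

In practice the same selection is usually made with existing tools rather than hand-written parsing; the sketch only makes the two criteria explicit.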
Discrepancies used for the calculation of transition and transversion base changes were determined using a custom script that parsed the output of the SAMtools mpileup utility, which outputs the number, quality, and identity of all bases corresponding to a particular position. All plots and statistical analyses were performed using the R statistics package version 2.15.1 (R Project for Statistical Computing, http://www.r-project.org).

DNA from the corresponding high-quality frozen tissue in each paired sample was analyzed by Affymetrix single-nucleotide polymorphism (SNP) 6 array according to the manufacturer's instructions (Affymetrix, Santa Clara, CA). Data were analyzed in large batches using the Affymetrix Expression Console software version 1.3 and exported as common birdseed format files containing SNP and SNV calls. The SNVs called by array were then compared with the NGS-generated variant calling format files using custom scripts (available on request).

This study was approved by the Human Studies Committee of Washington University School of Medicine (Institutional Review Board approval No. 201101733).

To compare NGS data from FFPE and frozen samples, we selected 16 clinical lung adenocarcinoma specimens from the Siteman Cancer Center Tissue Repository for NGS analysis via a 27-gene oncology set. All these samples originated from routine tissue processing of surgical specimens for diagnostic surgical pathology with paired snap-frozen tissue obtained from the surgical specimen, and the selection of specific cases for this study was arbitrary. The mean age of blocks at the time of sequencing was 8.1 years (range, 7 to 12 years). After extraction, all DNA samples passed standard purity and fragmentation tests before library generation, enrichment, and sequencing; FFPE-derived DNA was amplifiable to at least 200 bp using a multiplexed size control ladder PCR assay.
The total amount of extracted DNA was >1 μg for both frozen and FFPE cases as determined by Qubit quantification, and the same amount of DNA was used as input for library construction for each of the paired samples. All 32 indexed libraries were then subjected to equimolar pooling and multiplex sequencing across three lanes of an Illumina HiSeq 2000 instrument.

To determine whether identical amounts of DNA from FFPE and frozen samples yield similar sequence data, we compared a variety of metrics, including the number of reads generated, mapping results, and coverage depth between the two sample types (Table 1). Multiplex sequencing resulted in a mean of 17.9 million reads (range, 5.3 million to 26 million) and 26.8 million reads (range, 4.4 million to 57 million) for frozen and FFPE samples, respectively. Mapping of reads to the human reference genome revealed that sequence data from FFPE samples produced slightly lower proportions of mapped reads (frozen: 99.3%, FFPE: 98.9%; P = 0.001) and of reads that mapped in the proper genomic configuration (ie, forward and reverse reads to opposite strands) (frozen: 98.6%, FFPE: 97.8%; P = 0.02). The proportions of unique reads (ie, reads not flagged as duplicates, where duplicates are read pairs with identical start and end mapping coordinates) (frozen: 58.2%, FFPE: 49.1%; P = 0.08) and of reads mapping to the capture target (frozen: 56.0%, FFPE: 58.0%; P = 0.5) were similar between the sample types.
This finding suggests that the NGS libraries generated from FFPE and frozen samples were similar in complexity and resulted in comparable amounts of useable sequence data.

Table 1. Comparison of Sequencing Results and Quality Statistics for FFPE Samples

                              Frozen                          FFPE
Reads                         Mean (95% CI)      Range        Mean (95% CI)      Range        Unadjusted P value
Total reads (in millions)     17.9 (14.9–20.9)   5.4–26.3     26.8 (18.7–34.9)   4.4–57.7     0.04
Mapped reads (%)              99.3 (99.1–99.6)   98.2–99.6    98.9 (98.7–99.1)   97.9–99.2    0.001
Mapped on-target reads (%)    56.3 (52.5–60.1)   45.9–70.1    57.6 (52.9–62.3)   48.8–75.2    0.6
Properly mapped reads (%)     98.6 (98–99.2)     95.4–99.2    97.7 (97.2–98.3)   94.8–98.5    0.03
Unique reads (%)              58.2 (51.3–65.1)   40.9–76.2    49.1 (40.8–57.4)   15.1–62.1    0.08
Discrepancies* per read       0.27 (0.25–0.29)   0.17–0.31    0.29 (0.27–0.31)   0.17–0.3     0.1
Unaligned bases† per read     2.9 (2.2–3.5)      2.1–6.5      4.6 (3.3–5.9)      2.7–10.4     0.02
No. of bases <Q20‡ per read   3.2 (2.6–3.8)      2.3–6.5      3.7 (2.8–4.6)      2.4–7.3      0.4
Mean insert size (bp)         222 (210–234)      182–249      177 (173–181)      163–187      10^-7

* Discrepancies are defined as any substitution, insertion, or deletion in the aligned portion of a read compared with the reference sequence and are an estimation of the error rate because errors are substantially more frequent than true differences between individual genome sequences.
† Unaligned bases are positions at the ends of reads that do not match the reference sequence and so are trimmed from the read alignments.
‡ Phred-scaled quality of 20.
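The transition and transversion tallies mentioned in the methods were derived by parsing SAMtools mpileup output with a custom script that is not shown. As a hedged illustration only, the Python sketch below reads the first five tab-separated mpileup columns (chromosome, position, reference base, depth, read bases) and counts substitutions by type. All function names are hypothetical, base-quality filtering is omitted, and substitutions are reported relative to the forward reference strand, which is an assumption about how such a tally might be organized rather than a description of the authors' script.

```python
import re
from collections import Counter

# Minimal sketch, not the authors' script. Function names are hypothetical.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
INDEL = re.compile(r"[+-](\d+)")


def mismatches(read_bases):
    """Return mismatched bases (upper-cased) from an mpileup read-base string."""
    out = []
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read-start marker is followed by one mapping-quality character.
            i += 2
            continue
        if c in "+-":
            # Indel: sign, length, then that many inserted or deleted bases.
            m = INDEL.match(read_bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c.upper() in ("A", "C", "G", "T"):
            out.append(c.upper())
        # '.', ',' (reference matches), '$', '*', and N are skipped.
        i += 1
    return out


def count_substitutions(mpileup_lines):
    """Count (reference base, observed base) pairs over all positions."""
    counts = Counter()
    for line in mpileup_lines:
        fields = line.rstrip("\n").split("\t")
        ref, bases = fields[2].upper(), fields[4]
        if ref not in ("A", "C", "G", "T"):
            continue
        for alt in mismatches(bases):
            counts[(ref, alt)] += 1
    return counts


def ts_tv(counts):
    """Return (number of transitions, number of transversions)."""
    ts = sum(n for pair, n in counts.items() if pair in TRANSITIONS)
    tv = sum(n for pair, n in counts.items() if pair not in TRANSITIONS)
    return ts, tv
```

Keeping the per-substitution counts, rather than only the two totals, allows a specific class such as C to T changes to be examined separately.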
Because formalin fixation is known to result in DNA damage and fragmentation, we next examined the extent of these effects by comparing sequence and read quality metrics between paired frozen and FFPE samples. We compared the distribution of library insert sizes determined from the distance between properly mapped forward and reverse read pairs and found a significant difference between the sample types consistent with formalin-induced fragmentation. FFPE samples generated shorter sequencing library inserts than frozen samples (frozen: 222 bp, FFPE: 177 bp; P < 10^-7, Student's t-test, Table 1), and the distribution of FFPE library fragment sizes was narrower compared with fresh-frozen samples; read lengths were the same between both sample types (Figure 2).

We next considered the frequency of discrepancies (defined as any substitution, insertion, or deletion in aligned portions of reads) and unaligned bases (ie, positions at the ends of reads that do not match the reference) in individual reads that were successfully mapped to the reference sequence because these discrepancies may be increased in FFPE samples in the context of formalin-induced DNA damage. This analysis revealed that data from FFPE samples had a modest increase in the number of unaligned bases, but the frequency of discrepancies was not significantly different (Table 1). These findings imply that DNA damage caused by formalin fixation did not result in a substantial increase in sequence discrepancies that was detectable above the background sequencing error rate of our sequencing platform.

To determine whether the subtle differences we observed in sequencing results and read metrics affected the ability to obtain adequate sequence for target regions of interest, we examined the coverage dep

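References

1. Duncavage E.J., Abel H.J., Szankasi P., Kelley T.W., Pfeifer J.D. Targeted next generation sequencing of clinically significant gene mutations and translocations in leukemia. Mod Pathol. 2012; 25: 795-804.
2. Ellis M.J., Ding L., Shen D., Luo J., Suman V.J., Wallis J.W., et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature. 2012; 486: 353-360.
3. Ley T.J., Ding L., Walter M.J., McLellan M.D., Lamprecht T., Larson D.E., et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med. 2010; 363: 2424-2433.
4. Pritchard C.C., Smith C., Salipante S.J., Lee M.K., Thornton A.M., Nord A.S., Gulden C., Kupfer S.S., Swisher E.M., Bennett R.L., Novetsky A.P., Jarvik G.P., Olopade O.I., Goodfellow P.J., King M.C., Tait J.F., Walsh T. ColoSeq provides comprehensive lynch and polyposis syndrome mutational analysis using massively parallel sequencing. J Mol Diagn. 2012; 14: 357-366.
5. Walter M.J., Shen D., Ding L., Shao J., Koboldt D.C., Chen K., Larson D.E., McLellan M.D., Dooling D., Abbott R., Fulton R., Magrini V., Schmidt H., Kalicki-Veizer J., O'Laughlin M., Fan X., Grillot M., Witowski S., Heath S., Frater J.L., Eades W., Tomasson M., Westervelt P., DiPersio J.F., Link D.C., Mardis E.R., Ley T.J., Wilson R.K., Graubert T.A. Clonal architecture of secondary acute myeloid leukemia. N Engl J Med. 2012; 366: 1090-1098.
6. Adams M.D., Veigl M.L., Wang Z., Molyneux N., Sun S., Guda K., Yu X., Markowitz S.D., Willis J. Global mutational profiling of formalin-fixed human colon cancers from a pathology archive. Mod Pathol. 2012; 25: 1599-1608.
7. Eberle F.C., Hanson J.C., Killian J.K., Wei L., Ylaya K., Hewitt S.M., Jaffe E.S., Emmert-Buck M.R., Rodriguez-Canales J. Immunoguided laser assisted microdissection techniques for DNA methylation analysis of archival tissue specimens. J Mol Diagn. 2010; 12: 394-401.
8. Thirlwell C., Eymard M., Feber A., Teschendorff A., Pearce K., Lechner M., Widschwendter M., Beck S. Genome-wide DNA methylation analysis of archival formalin-fixed paraffin-embedded tissue using the Illumina Infinium HumanMethylation27 BeadChip. Methods. 2010; 52: 248-254.
9. Wang J., Mullighan C.G., Easton J., Roberts S., Heatley S.L., Ma J., Rusch M.C., Chen K., Harris C.C., Ding L., Holmfeldt L., Payne-Turner D., Fan X., Wei L., Zhao D., Obenauer J.C., Naeve C., Mardis E.R., Wilson R.K., Downing J.R., Zhang J. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods. 2011; 8: 652-654.
10. Nord A.S., Lee M., King M.C., Walsh T. Accurate and exact CNV identification from targeted high-throughput sequence data. BMC Genomics. 2011; 12: 184.
11. Carter S.L., Cibulskis K., Helman E., McKenna A., Shen H., Zack T., Laird P.W., Onofrio R.C., Winckler W., Weir B.A., Beroukhim R., Pellman D., Levine D.A., Lander E.S., Meyerson M., Getz G. Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnol. 2012; 30: 413-421.
12. Auerbach C., Moutschen-Dahmen M., Moutschen J. Genetic and cytogenetical effects of formaldehyde and related compounds. Mutat Res. 1977; 39: 317-361.
13. Bresters D., Schipper M.E., Reesink H.W., Boeser-Nunnink B.D., Cuypers H.T. The duration of fixation influences the yield of HCV cDNA-PCR products from formalin-fixed, paraffin-embedded liver tissue. J Virol Methods. 1994; 48: 267-272.
14. Farragher S.M., Tanney A., Kennedy R.D., Harkin D.P. RNA expression analysis from formalin fixed paraffin embedded tissues. Histochem Cell Biol. 2008; 130: 435-445.
15. Feldman M.Y. Reactions of nucleic acids and nucleoproteins with formaldehyde. Prog Nucleic Acid Res Mol Biol. 1973; 13: 1-49.
16. Inadome Y., Noguchi M. Selection of higher molecular weight genomic DNA for molecular diagnosis from formalin-fixed material. Diagn Mol Pathol. 2003; 12: 231-236.
17. Karlsen F., Kalantari M., Chitemerere M., Johansson B., Hagmar B. Modifications of human and viral deoxyribonucleic acid by formaldehyde fixation. Lab Invest. 1994; 71: 604-611.
18. Forshew T., Murtaza M., Parkinson C., Gale D., Tsui D.W., Kaper F., Dawson S.J., Piskorz A.M., Jimenez-Linan M., Bentley D., Hadfield J., May A.P., Caldas C., Brenton J.D., Rosenfeld N. Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Sci Transl Med. 2012; 4 (136ra68).
19. Loudig O., Brandwein-Gensler M., Kim R.S., Lin J., Isayeva T., Liu C., Segall J.E., Kenny P.A., Prystowsky M.B. Illumina whole-genome complementary DNA-mediated annealing, selection, extension and ligation platform: assessing its performance in formalin-fixed, paraffin-embedded samples and identifying invasion pattern-related genes in oral squamous cell carcinoma. Hum Pathol. 2011; 42: 1911-1922.
20. Duncavage E.J., Magrini V., Becker N., Armstrong J.R., Demeter R.T., Wylie T., Abel H.J., Pfeifer J.D. Hybrid capture and next-generation sequencing identify viral integration sites from formalin-fixed, paraffin-embedded tissue. J Mol Diagn. 2011; 13: 325-333.
21. Kerick M., Isau M., Timmermann B., Sultmann H., Herwig R., Krobitsch S., Schaefer G., Verdorfer I., Bartsch G., Klocker H., Lehrach H., Schweiger M.R. Targeted high throughput sequencing in clinical cancer settings: formaldehyde fixed-paraffin embedded (FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Med Genomics. 2011; 4: 68.
22. McDonald S.A., Mardis E.R., Ota D., Watson M.A., Pfeifer J.D., Green J.M. Comprehensive genomic studies: emerging regulatory, strategic, and quality assurance challenges for biorepositories. Am J Clin Pathol. 2012; 138: 31-41.
23. van Dongen J.J., Langerak A.W., Bruggemann M., Evans P.A., Hummel M., Lavender F.L., Delabesse E., Davi F., Schuuring E., Garcia-Sanz R., van Krieken J.H., Droese J., Gonzalez D., Bastard C., White H.E., Spaargaren M., Gonzalez M., Parreira A., Smith J.L., Morgan G.J., Kneba M., Macintyre E.A. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: report of the BIOMED-2 Concerted Action BMH4-CT98-3936. Leukemia. 2003; 17: 2257-2317.
24. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25: 2078-2079.
25. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26: 841-842.
26. DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A., del Angel G., Rivas M.A., Hanna M., McKenna A., Fennell T.J., Kernytsky A.M., Sivachenko A.Y., Cibulskis K., Gabriel S.B., Altshuler D., Daly M.J. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011; 43: 491-498.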