Article, open access, peer reviewed

A GC-Wave Correction Algorithm that Improves the Analytical Performance of aCGH

2012; Elsevier BV; Volume: 14; Issue: 6; Language: English

10.1016/j.jmoldx.2012.06.002

ISSN

1943-7811

Authors

Angela Leo, A. M. Walker, Matthew S. Lebo, Brant C. Hendrickson, Thomas Scholl, Viatcheslav R. Akmaev

Topic(s)

Prenatal Screening and Diagnostics

Abstract

Array-based comparative genome hybridization (aCGH) is a powerful, data-intensive technique used to identify genomic copy number variation throughout the human genome. The clinical use of aCGH to identify pathogenic copy number aberrations is becoming common, and the statistical and mathematical algorithms used in aCGH data analysis play an important role in determining the performance of these platforms. Interpretation of aCGH data can be complicated by a platform-independent technical artifact described as GC-waves: wave patterns in CGH data that correlate with the regional GC-content of the human genome and can reduce the clinical specificity and sensitivity of aCGH platforms. We describe an automated GC-wave correction algorithm and techniques to understand how the correction affects the analytical performance of aCGH. This GC-correction algorithm was effective at mitigating GC-wave effects. After correction, array data were measurably improved, with gains in specificity, sensitivity, and overall data quality.

Genomic copy number alterations are the causal lesions for many known genetic diseases and syndromes. The application of array-based comparative genome hybridization (aCGH) as a high-resolution method for detecting genomic copy number changes has significantly advanced the understanding of the human genome, disease, and genetic variation [1]. High-density CGH arrays, which can contain millions of probes selectively spaced throughout the genome, offer increased resolution and sensitivity over cytogenetic methods such as fluorescence in situ hybridization (FISH) and G-banded karyotyping. These established cytogenetic methods are known for their reliability in detecting clinically relevant genomic imbalances; however, they have limitations. Current G-banded karyotyping protocols are limited to a detection resolution of 3 to 5 Mb. FISH can assess DNA copy number only at specific targeted loci, and its resolution is limited to approximately 50 kb, depending on many factors, including genomic location [2, 3]. aCGH combines the genome-wide detection capabilities of G-banded karyotyping with higher resolution than is possible with FISH.

Large studies of patient and control populations using aCGH have led to the discovery of many disease-causing genes and genomic regions [4, 5]. In several studies, aCGH demonstrated increased clinical utility over cytogenetic technologies for patients with mental retardation or developmental delays by finding structural aberrations in samples where karyotyping and FISH had previously yielded negative results [1, 6-10]. The increased yield of copy number variants detected by aCGH has led to the clinical uptake of the technology; it is commonly used in postnatal diagnostics and increasingly in prenatal settings for certain clinical indications. It has been recommended as a first-line test in individuals with developmental delay, autism spectrum disorder, or multiple congenital anomalies [11].

Expanded use of aCGH has also led to the discovery of polymorphic copy number variable (CNV) regions throughout the human genome [12]. A variety of publicly available databases cataloguing CNVs have been developed (eg, the Database of Genomic Variants) [13]. Because aCGH testing identifies CNVs of all sizes throughout the genome, rather than only large aberrations (G-banded karyotyping) or aberrations at well-characterized pathogenic loci (FISH), databases of previously discovered CNV loci assist in the interpretation of aCGH results.

High-resolution CGH arrays differ from other CGH arrays in that short oligonucleotides, rather than longer bacterial artificial chromosomes (BACs), are used for hybridization. The use of shorter but more numerous probes enables the identification of CNVs at greater resolution than can be achieved on BAC arrays. However, each individual oligonucleotide probe yields lower-quality data than a BAC probe owing to the lower specificity and higher potential for noise inherent in probes of this length (25 to 75 bases). Although a single BAC probe is often sufficient to call an aberration, high-resolution oligonucleotide arrays need several adjacent probes to confidently identify CNV regions.
Thus, oligonucleotide arrays require statistical algorithms to extract results. The aCGH data analysis process uses a sequence of data transformation, data normalization, and data summarization steps. Each of these steps typically involves the definition of algorithm parameters that directly affect the analytical sensitivity and specificity of the aCGH assay.

The analysis of array-based copy number data can be complicated by the presence of a common, platform- and method-independent [14-19] artifact that has been described recently as GC-waves [14]. The GC-wave phenomenon is the correlation between genomic GC-content and the magnitude and direction of the deviation of log-ratio values from the expected two-copy baseline, seen in aCGH and single-nucleotide polymorphism (SNP) array-based copy number data. Because there are often several consecutive probes in GC- and AT-rich regions of the genome, the moving average through the probes appears as a wavelike pattern. GC-waves add large-scale variability to the probe signal [14] and interfere with data analysis algorithms because they skew probe data away from expected values. The GC-wave artifact increases the potential for false-positive aberration calls in specific genomic regions and can obscure true aberration calls.

Investigations into possible causes of the wave effect have focused on biochemical causes, such as DNA quality [17, 20], DNA isolation protocols, and labeling procedures [20], as well as on genomic features, such as regional GC and gene content [17]. The causes of GC-waves are not completely understood and are probably multifaceted.

A few computational methods for addressing GC-waves have been published. Marioni et al [14] concluded that the wave effect strongly correlates with the GC-content of the probe, and they developed a correction method based on lowess regression to improve CNV calling accuracy for small regions. However, the method is not applicable in the presence of larger aberrations, which are often seen in cases of developmental delay, autism spectrum disorder, and multiple congenital anomalies. Nannya et al [19] considered the GC-content and the size of the DNA fragments hybridizing to the array when developing a quadratic regression method for SNP arrays (Affymetrix Inc., Santa Clara, CA). Alternatively, van de Wiel et al [20] created a set of calibration profiles from a subset of previous aCGH results and used ridge regression to reduce the GC-waves in data from tumor samples. Each of these methods is effective in reducing GC-wave patterns in some capacity, but these approaches generally require some previous understanding of expected array results and can lead to a loss of aberration detection sensitivity [14, 17, 20]. In the clinical setting, it is preferable for a GC-correction algorithm to perform across a broad range of aberration sizes without previous information about the location of aberrations.
We developed a two-part GC-wave correction algorithm, CGH slope and anchored median (cghSAM), which is designed to work with all current array-based copy number platforms. The algorithm normalizes signal log-ratios according to individual probe GC-content and then adjusts a subset of chromosomes based on the observed median signal deviations. The two-step correction accounts for the effect of GC-content on each probe by using a chromosome-specific regression and then targets the chromosomes that are most susceptible to GC bias for further correction.

To evaluate the performance of cghSAM, microarray data generated from 215 human blood samples were analyzed using the correction. We quantified improvements in the overall array performance by analyzing probe distribution before and after correction with cghSAM. The reduction in wave amplitude increased the sensitivity of the assay in the detection of small deletions, as measured using common copy number polymorphic loci. Analytical specificity was simultaneously increased, improving overall CNV calling accuracy.

All the data were derived from patient specimens referred to Esoterix Genetic Laboratories LLC (Westborough, MA) for clinical aCGH analysis and were made fully anonymous before the study. This research study was limited to the use of existing data that were made fully anonymous and is exempt under 45 CFR 46.101(b)(4) as defined by the Office of Human Research Protections, US Department of Health and Human Services.

A custom CGH array (four arrays per 1 × 3-inch slide, each array containing 44,000 oligonucleotide probes) (Agilent Technologies Inc., Santa Clara, CA) was designed to target 140 known disease-causing regions of the human genome in addition to the subtelomeric and pericentromeric regions of each chromosome. Approximately 500,000 probes were selected from the Agilent Technologies Inc. high-definition CGH probe database or, in genomic regions where Agilent Technologies Inc. had not designed probes owing to base composition, were designed by Esoterix Genetic Laboratories LLC. All were tested for performance in euploid and aneuploid specimens. Empirical data were generated on a sufficient probe pool to enable 10-fold downsampling to the best-performing probes, which compose the final array. The oligonucleotide probes used were 60 bases in length and were spotted once on the array. The feature layout on the array was randomized to prevent positional effects, and the replicate probe group and normalization controls recommended by Agilent Technologies Inc. were included on the array. Probes were distributed to afford maximum aberration detection resolution in regions of known clinical significance (mean probe spacing, 7500 bases). The remainder of the genome was apportioned a lower probe density (mean probe spacing, 125,000 bases).

Genomic DNA was extracted from whole blood using the QIAamp 96 DNA blood kit (Qiagen Inc., Valencia, CA). All aCGH analyses were performed against reference DNA pooled from six phenotypically normal males or females (Promega Corp., Madison, WI). Samples were sex-matched with the reference pool, except in cases where a sex mismatch was experimentally appropriate. The procedures for digestion, labeling, purification, and hybridization were performed in accordance with the manufacturer's protocols. After hybridization, the array slides were washed using a Little Dipper wash station (SciGene Corp., Sunnyvale, CA) and were scanned in a microarray scanner (G2505B; Agilent Technologies Inc.).

The aCGH results were analyzed using the Feature Extraction (version 9.5.3) and DNA Analytics (version 4.0.76) software packages (Agilent Technologies Inc.) with the following settings: centralization threshold 6 and bin size 10; minimums of 7 probes and 0.40 log-ratio; fuzzy zero not applied [21]. Aberrations were called in DNA Analytics using the aberration detection method 2 (ADM2) algorithm and a score threshold of either 12.9 (before correction) or 10.4 (after correction by cghSAM).

We developed a two-step data correction algorithm (cghSAM) that uses normalized log-ratio signal intensities from two DNA samples involved in comparative hybridization. Step 1 is a data slope correction that is based on robust linear regression of log-ratio signal intensities on probe GC-content. Step 2 is a chromosome-based normalization of the residual log-ratio bias based on historical chromosomal signal ratio medians.

The algorithm fits a robust linear regression to the probe log-ratio values plotted against probe GC-content. It uses the slope of the linear regression (GC slope) to correct individual probe log-ratios. Some of the previously published algorithms used the slope of the genome-wide GC-content linear regression as part of GC correction [14, 17, 19]. We observed a similar trend in the data but also found that the GC slope varied widely depending on the chromosome to which the probe was mapped (Figure 1; see also Supplemental Figure S1 at http://jmd.amjpathol.org). To avoid overcorrection, cghSAM was designed to correct probe log-ratios on a per-chromosome basis. The algorithm sorts the probe data by chromosome and uses robust regression to derive the chromosome-specific linear regression slope and the y-intercept estimate for each of the 24 chromosomes. The algorithm then calculates the median GC-content percentage of all the probes on a chromosome and uses the regression slope of that chromosome and the y-intercept to derive the log-ratio baseline for that chromosome (Figure 2A). The correction factor for probe i on chromosome c is defined as the difference between the calculated value and the log-ratio baseline (equation 1):

$$\mathrm{CorrectionFactor}_i = \mathrm{LRBaseline}_c - m_c \times \mathrm{PercentGC}_i - b_c \qquad (1)$$

where $m_c$ is the regression slope of chromosome c and $b_c$ is the y-intercept for that chromosome.

Figure 2. cghSAM correction strategy. A: Step 1 correction: the cghSAM algorithm calculates the median GC percentage for all probes on each chromosome. The log-ratio baseline for a particular chromosome is determined using its slope and intercept. Individual probe log-ratios are adjusted by their correction factor (the difference between the solid and dashed lines). B: Step 2 correction: some chromosome medians may be skewed above or below the baseline owing to GC-content. Chromosomal adjustment is used to correct the bias.

Each probe is adjusted by its correction factor, and the median log-ratio of the corrected chromosome is stored for further calculations.
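The step 1 correction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the source says only "robust linear regression", so a Theil-Sen fit (median of pairwise slopes) is swapped in as one simple robust estimator; the correction factor from equation 1 is assumed to be added to the probe log-ratio; and the names `theil_sen` and `step1_correct` and the input tuple layout are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust line fit (assumed estimator): median pairwise slope, median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    m = median(slopes)
    b = median(yi - m * xi for xi, yi in zip(x, y))
    return m, b


def step1_correct(probes):
    """probes: list of (chromosome, percent_gc, log_ratio) tuples (hypothetical layout).

    Returns (corrected, medians): corrected log-ratios in input order and the
    median corrected log-ratio of each chromosome.
    """
    by_chrom = {}
    for idx, (chrom, gc, lr) in enumerate(probes):
        by_chrom.setdefault(chrom, []).append((idx, gc, lr))

    corrected = [None] * len(probes)
    medians = {}
    for chrom, rows in by_chrom.items():
        gcs = [gc for _, gc, _ in rows]
        lrs = [lr for _, _, lr in rows]
        m, b = theil_sen(gcs, lrs)          # chromosome-specific slope and intercept
        baseline = m * median(gcs) + b      # LRBaseline_c at the median GC percentage
        for idx, gc, lr in rows:
            factor = baseline - m * gc - b  # equation 1
            corrected[idx] = lr + factor    # assumption: the factor is added
        medians[chrom] = median(corrected[i] for i, _, _ in rows)
    return corrected, medians
```

Substituting the baseline into equation 1 shows that the intercept cancels, so each factor reduces to the slope times the distance of the probe's GC percentage from the chromosome median. The fit needs at least two distinct GC values per chromosome; the pairwise-slope approach is quadratic in probe count and is meant only for illustration.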
After this correction, a subset of nonaberrant chromosomes that consistently skew above or below the expected baseline throughout the data set may require further correction (see step 2) (Figure 2B).

Slope correction alone can be insufficient to fully normalize aCGH data. To target the genomic regions most affected by GC-waves, we designed a second step that adjusts the most consistently skewed chromosomes. Using a subset of the microarray data, we analyzed the medians of the 22 autosomes after step 1 correction. A subset of chromosomes had medians that were consistently skewed from the expected diploid baseline when two copies were present and that required further adjustment. This adjustment is designed to minimize the offset from the expected baseline seen in Figure 2B. However, automated adjustments of signal data for individual chromosomes can inadvertently mask large aberrations, which then go undetected [14, 17, 20]. In cghSAM, only chromosomes that require a second adjustment step would be susceptible to this problem, because the first step corrects by slope only, without taking the intercept into account.
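The screening for consistently skewed chromosomes described above might be sketched as follows. The source does not state the selection criteria at this point, so everything specific here is an assumption made for illustration: the function name `pick_anchor_chromosomes`, the two thresholds, and the use of the across-array median as the anchor value.

```python
from statistics import median


def pick_anchor_chromosomes(medians_by_array, min_offset=0.01, min_agreement=0.9):
    """medians_by_array: list of dicts {chromosome: step 1 median log-ratio}.

    Returns {chromosome: anchor value} for chromosomes whose medians fall on
    the same side of zero in at least `min_agreement` of the arrays and whose
    typical offset is at least `min_offset`. Both thresholds are hypothetical.
    """
    anchors = {}
    chroms = set().union(*medians_by_array)
    for chrom in sorted(chroms):
        values = [arr[chrom] for arr in medians_by_array if chrom in arr]
        typical = median(values)            # assumed anchor value
        if abs(typical) < min_offset:
            continue                        # sits on the baseline; no adjustment
        same_side = sum(1 for v in values if v * typical > 0) / len(values)
        if same_side >= min_agreement:
            anchors[chrom] = typical
    return anchors
```

Requiring agreement in direction across arrays, rather than a large offset in any single array, is one way to separate a systematic GC-related skew from a sample-specific copy number change.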
To prevent overcorrection, cghSAM uses mathematical safeguards to avoid overnormalization in truly aberrant regions by ensuring that any adjustment made falls within the expected range of adjustment for a nonaberrant sample. For the subset of chromosomes that consistently required adjustment after the step 1 correction, henceforth known as anchor chromosomes, we defined a typical pattern of median log-ratios between these chromosomes. The selected chromosomes formed anchor set A with anchor values $a_1, \ldots, a_N$. Because any of the anchor chromosomes could potentially contain aberrant regions in a given sample, we chose to eliminate a subset of anchor chromosomes that were most likely to harbor a copy number change based on sample-specific data (Figure 3).

For a given array, after step 1 correction, the algorithm defines a set of the step 1 corrected medians of anchor chromosomes, $m_1, \ldots, m_N$. To remove outliers or chromosomes potentially harboring large copy number alterations, such as trisomies or large segmental deletions, the algorithm recursively searches set A for the chromosomes whose signal medians are most dissimilar from the anchor pattern. Each chromosome in set A is omitted in turn, and the sum in equation 2 is minimized over e for the remaining chromosomes:

$$\min_e \sum_{j=1}^{N} (a_j - e \cdot m_j)^2 \qquad (2)$$

The smallest sum is recorded, and the chromosome that was omitted in the calculation of that minimal sum is designated as an outlier and removed from further anchor calculations, although all the anchor chromosomes are ultimately corrected. The process is repeated until a predetermined number of chromosomes in set A have been designated as outliers (see Results). After all the outliers are removed and M chromosomes remain in the anchor set, cghSAM calculates the coefficient e* such that the difference between the remaining anchor values, $a_1, \ldots, a_M$, and the rescaled medians, $m_1, \ldots, m_M$, is minimized (equation 3).
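The leave-one-out outlier search and the final fit of e* can be sketched as below. This is an illustrative reading of equations 2 and 3, not the authors' code. It relies on the standard closed form for a one-parameter least-squares fit, e = Σ a_j m_j / Σ m_j², and the function names, dictionary inputs, and `n_outliers` argument are hypothetical.

```python
def best_scale(anchors, medians):
    """Return (e, sum) where e minimises sum((a_j - e * m_j) ** 2)."""
    den = sum(m * m for m in medians)       # assumes not all medians are zero
    e = sum(a * m for a, m in zip(anchors, medians)) / den
    sse = sum((a - e * m) ** 2 for a, m in zip(anchors, medians))
    return e, sse


def fit_anchor_scale(anchor_values, sample_medians, n_outliers):
    """anchor_values, sample_medians: dicts keyed by anchor chromosome.

    Returns (e_star, outliers). n_outliers is the predetermined number of
    chromosomes to drop before the final fit.
    """
    kept = list(anchor_values)
    outliers = []
    for _ in range(n_outliers):
        def sse_without(skip):
            # Minimised sum of equation 2 with one chromosome left out.
            rest = [c for c in kept if c != skip]
            return best_scale([anchor_values[c] for c in rest],
                              [sample_medians[c] for c in rest])[1]
        # The omission giving the smallest sum marks the outlier.
        worst = min(kept, key=sse_without)
        kept.remove(worst)
        outliers.append(worst)
    e_star, _ = best_scale([anchor_values[c] for c in kept],
                           [sample_medians[c] for c in kept])
    return e_star, outliers
```

Because the outlier is chosen by how much the fit improves when it is left out, a chromosome carrying a trisomy-like shift is dropped from the fit of e* even when its absolute median is not the largest in the set.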
$$e^* = \arg\min_e \sum_{j=1}^{M} (a_j - e \cdot m_j)^2 \qquad (3)$$

The chromosomal adjustment factors for the array are then defined for all anchor chromosomes as $a_j / e^*$, including the previously removed anchor chromosomes. The log-ratio signal intensities for all the chromosomes in the original set A are corrected by subtracting their respective chromosomal adjustment factors.

After development, the cghSAM algorithm was implemented in Matlab (The MathWorks Inc., Natick, MA) and was tested on the custom microarray. To evaluate algorithm performance, we compared aberration calling sensitivity and specificity before and after algorithm correction on 215 arrays. We compared probe-by-probe log-r

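References

1. Baldwin E.L., Lee J.Y., Blake D.M., Bunke B.P., Alexander C.R., Kogan A.L., Ledbetter D.H., Martin C.L. Enhanced detection of clinically relevant genomic imbalances using a targeted plus whole genome oligonucleotide microarray. Genet Med. 2008; 10: 415-429.
2. Shevell M., Ashwal S., Donley D., Flint J., Gingold M., Hirtz D., Majnemer A., Noetzel M., Sheth R.D. Practice parameter: evaluation of the child with global developmental delay: report of the Quality Standards Subcommittee of the American Academy of Neurology and the Practice Committee of the Child Neurology Society. Neurology. 2003; 60: 367-380.
3. Shaffer L., Ledbetter D., Lupski J. Molecular cytogenetics of contiguous gene syndromes: mechanisms and consequences of gene dosage imbalance. In: Scriver C.R., Beaudet A.L., Sly W.S., Valle D., Childs B., Kinzler K.W., Vogelstein B. Metabolic and Molecular Basis of Inherited Disease. McGraw Hill, New York; 2001: 1291-1324.
4. Perry G.H., Ben-Dor A., Tsalenko A., Sampas N., Rodriguez-Revenga L., Tran C.W., Scheffer A., Steinfeld I., Tsang P., Yamada N.A., Park H.S., Kim J.I., Seo J.S., Yakhini Z., Laderman S., Bruhn L., Lee C. The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet. 2008; 82: 685-695.
5. Slavotinek A.M. Novel microdeletion syndromes detected by chromosome microarrays. Hum Genet. 2008; 124: 1-17.
6. Shaffer L.G., Kashork C.D., Saleki R., Rorem E., Sundin K., Ballif B.C., Bejjani B.A. Targeted genomic microarray analysis for identification of chromosome abnormalities in 1500 consecutive clinical cases. J Pediatr. 2006; 149: 98-102.
7. De Gregori M., Ciccone R., Magini P., Pramparo T., Gimelli S., Messa J., et al. Cryptic deletions are a common finding in "balanced" reciprocal and complex chromosome rearrangements: a study of 59 patients. J Med Genet. 2007; 44: 750-762.
8. Christian S.L., Brune C.W., Sudi J., Kumar R.A., Liu S., Karamohamed S., Badner J.A., Matsui S., Conroy J., McQuaid D., Gergel J., Hatchwell E., Gilliam T.C., Gershon E.S., Nowak N.J., Dobyns W.B., Cook Jr E.H. Novel submicroscopic abnormalities detected in autism spectrum disorder. Biol Psychiatry. 2008; 63: 1111-1117.
9. Baptista J., Mercer C., Prigmore E., Gribble S.M., Carter N.P., Maloney V., Thomas N.S., Jacobs P.A., Crolla J.A. Breakpoint mapping and array CGH in translocations: comparison of a phenotypically normal and an abnormal cohort. Am J Hum Genet. 2008; 82: 927-936.
10. Van den Veyver I.B., Patel A., Shaw C.A., Pursley A.N., Kang S.H., Simovich M.J., Ward P.A., Darilek S., Johnson A., Neill S.E., Bi W., White L.D., Eng C.M., Lupski J.R., Cheung S.W., Beaudet A.L. Clinical use of array comparative genome hybridization (aCGH) for prenatal diagnosis in 300 cases. Prenat Diagn. 2009; 29: 29-39.
11. Miller D.T., Adam M.P., Aradhya S., Biesecker L.G., Brothman A.R., Carter N.P., Church D.M., Crolla J.A., Eichler E.E., Epstein C.J., Faucett W.A., Feuk L., Friedman J.M., Hamosh A., Jackson L., Kaminsky E.B., Kok K., Krantz I.D., Kuhn R.M., Lee C., Ostell J.M., Rosenberg C., Scherer S.W., Spinner N.B., Stavropoulos D.J., Tepperberg J.H., Thorland E.C., Vermeesch J.R., Waggoner D.J., Watson M.S., Martin C.L., Ledbetter D.H. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010; 86: 749-764.
12. Redon R., Ishikawa S., Fitch K.R., Feuk L., Perry G.H., Andrews T.D., et al. Global variation in copy number in the human genome. Nature. 2006; 444: 444-454.
13. Iafrate A.J., Feuk L., Rivera M.N., Listewnik M.L., Donahoe P.K., Qi Y., Scherer S.W., Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004; 36: 949-951.
14. Marioni J.C., Thorne N.P., Valsesia A., Fitzgerald T., Redon R., Fiegler H., Andrews T.D., Stranger B.E., Lynch A.G., Dermitzakis E.T., Carter N.P., Tavare S., Hurles M.E. Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization. Genome Biol. 2007; 8: R228.
15. Carter N. Methods and strategies for analyzing copy number variation using DNA microarrays. Nat Genet. 2007; 39: S16-S21.
16. Lepretre F., Villenet C., Quief S., Nibourel O., Jacquemin C., Troussard X., Jardin F., Gibson F., Kerckaert J.P., Roumier C., Figeac M. Waved aCGH: to smooth or not to smooth. Nucleic Acids Res. 2010; 38: e94.
17. Diskin S.J., Li M., Hou C., Yang S., Glessner J., Hakonarson H., Bucan M., Maris J.M., Wang K. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008; 36: e126.
18. Song J.S., Johnson W.E., Zhu X., Zhang X., Li W., Manrai A.K., Liu J.S., Chen R., Liu X.S. Model-based analysis of two-color arrays (MA2C). Genome Biol. 2007; 8: R178.
19. Nannya Y., Sanada M., Nakazaki K., Hosoya N., Wang L., Hangaishi A., Kurokawa M., Chiba S., Bailey D.K., Kennedy G.C., Ogawa S. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 2005; 65(14): 6071-6079.
20. van de Wiel M.A., Brosens R., Eilers P.H., Kumps C., Meijer G.A., Menten B., Sistermans E., Speleman F., Timmerman M.E., Ylstra B. Smoothing waves in array CGH tumor profiles. Bioinformatics. 2009; 25(9): 1099-1104.
21. Agilent oligonucleotide array-based CGH for genomic DNA analysis. CGH protocol version 4.0. Agilent Technologies; 2006.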