Article, Open access, Peer reviewed

Design and Multiseries Validation of a Web-Based Gene Expression Assay for Predicting Breast Cancer Recurrence and Patient Survival

2011; Elsevier BV; Volume: 13; Issue: 3; Language: English

10.1016/j.jmoldx.2010.12.003

ISSN

1943-7811

Authors

Ryan K. van Laar

Topic(s)

Bioinformatics and Genomic Networks

Abstract

Gene expression analysis is a valuable tool for determining the risk of disease recurrence and overall survival of an individual patient with breast cancer. The purpose of this study was to create and validate a robust prognostic algorithm and implement it within an online analysis environment. Genomic and clinical data from 477 clinically diverse patients with breast cancer were analyzed with Cox regression models to identify genes associated with outcome, independent of standard prognostic factors. Percentile-ranked expression data were used to train a "metagene" algorithm to stratify patients as having a high or low risk of recurrence. The classifier was applied to 1016 patients from five independent series. The 200-gene algorithm stratifies patients into risk groups with statistically and clinically significant differences in recurrence-free and overall survival. Multivariate analysis revealed the classifier to be the strongest predictor of outcome in each validation series. In untreated node-negative patients, 88% sensitivity and 44% specificity for 10-year recurrence-free survival were observed, with positive and negative predictive values of 32% and 92%, respectively. High-risk patients appear to benefit significantly from systemic adjuvant therapy. A 200-gene prognosis signature has been developed and validated using genomic and clinical data representing a range of breast cancer clinicopathological subtypes. It is a strong independent predictor of patient outcome and is available for research use.

Genomic profiling is increasingly being incorporated into the clinical management of patients with breast cancer, specifically the use of multigene algorithms to predict an individual patient's risk of disease recurrence, overall survival (OS), and potential benefit from adjuvant therapy [1]. Since the advent of high-throughput genomics, multiple gene "signatures" associated with disease progression have been identified, several of which are sold as commercially available diagnostic tests (eg, MammaPrint, Agendia BV, Amsterdam, The Netherlands; OncoType DX, Genomic Health, Redwood City, CA; and MapQuant DX, Ipsogen, Marseille, France) [2-5]. Although there is minimal overlap between the underlying gene sets used [6, 7], the clinical significance of each is similar [8]. For a patient predicted to be at low risk of recurrence, the risks and adverse effects of chemotherapy negate the small potential increase in recurrence-free survival (RFS) probability [8]. Prospective clinical trials [9] are under way to further test this hypothesis.

In addition to the clinical significance of the available assays, another similarity is their mode of delivery: a centralized sample processing and gene expression analysis laboratory. This format requires the shipping of tumor tissue and a wait of up to 2 weeks for the generation of a result.
The hypothesis behind this study was that a prognostic breast cancer signature could be developed using publicly available data sets and ultimately made available to appropriately equipped and experienced diagnostic laboratories, via the Internet. Although this model would require the treating hospital to generate the gene expression profile, the potential cost and time savings would be substantial. In addition, this model would allow the clinician to remain in control of the biopsy material, all data generated, and the overall diagnostic process. A similar approach has been taken to develop a prognostic signature for patients with stage II or III colon cancer [10].

A training series of genomic data from patients with breast cancer with diverse clinicopathological variables was compiled from public data repositories. This information was analyzed using a statistical approach designed to identify individual genes associated with recurrence, independent of other prognostic factors. A predictive algorithm was formed on the gene set identified by this strategy and then applied to multiple independent breast cancer series, representing a range of clinicopathological variables. The resulting algorithm outputs a robust, easily interpretable prognostic index and risk group assignment. Prognostic indexes and risk group assignments for all patients were evaluated in the context of the available clinical and survival data to assess clinical utility. The algorithm is implemented in an online diagnostic environment (ChipDX), available for evaluation (http://www.ChipDX.com, last accessed March 3, 2011; free registration is required).
Gene expression and clinical data from two previously described cohorts were compiled to form a gene-selection and algorithm training series of 477 patients (National Center for Biotechnology Information [NCBI] Gene Expression Omnibus [GEO] GSE4922 [11] and GSE6532 [12]). Of the patients, 164 (34%) did not receive adjuvant treatment of any kind, whereas 205 (43%) received endocrine therapy only. Local or distant recurrence was defined as the clinical end point. The institutional ethics board of each hospital approved the use of the tissue material, and written informed consent was obtained. Relevant molecular and clinical variables for the training series are shown in Table 1.

Table 1. Details of the 477-Patient Training Series Used for Gene Selection and Algorithm Training

| Characteristic | No. of patients | % of series |
|---|---|---|
| Tumor size (cm) | | |
| 5 | 9 | 2 |
| NA | 4 | 1 |
| Age (years) | | |
| <50 | 97 | 20 |
| ≥50 | 331 | 69 |
| NA | 49 | 10 |
| ER status | | |
| Positive | 358 | 75 |
| Negative | 119 | 25 |
| Nodal involvement | | |
| Positive | 128 | 27 |
| Negative | 349 | 73 |
| Tumor grade | | |
| 1 (low) | 93 | 19 |
| 2 (moderate) | 234 | 49 |
| 3 (high) | 72 | 15 |
| NA | 78 | 16 |
| 10-year disease recurrence | | |
| No | 327 | 69 |
| Yes | 150 | 31 |
| Adjuvant therapy | | |
| None | 164 | 34 |
| Endocrine only | 205 | 43 |
| Systemic treatment | 108 | 23 |

The clinicopathological characteristics of the five independent validation series can be found in the referenced publications and in Supplemental Tables S1-S5 (available at http://jmd.amjpathol.org). NA, not available.

To test the performance of the classifier on patients who were not involved in gene selection and the algorithm training process, five additional series of genomic profiles [11, 13-16] were obtained, totaling 1016 patients. Selection criteria for each series are summarized in Table 2 and described in detail in the original publications associated with each series. Patients from validation series 1, 2, and 5 did not receive adjuvant therapy. Of 159 patients, 126 (79%) in validation series 3 received adjuvant systemic therapy, although patient-level treatment data for this series were not available. All patients in validation series 4 were estrogen receptor (ER) positive and received adjuvant hormonal therapy.
Clinicopathological tables for each of the five validation series are provided in Supplemental Tables S1-S5 (available at http://jmd.amjpathol.org).

Table 2. Training and Validation Series Details, Group Size, Demographic Description, and Multivariate Cox Proportional Hazards Analysis Output (DMFS and OS)

| Series | Description | Covariate | DMFS P value* | DMFS HR (95% CI) | OS P value† | OS HR (95% CI) |
|---|---|---|---|---|---|---|
| Training: GSE4922, Ivshina et al [12]; GSE6532, Loi et al [11] (N = 477) | ER+/ER-, N0/N1, systemic therapy, tamoxifen only, or no adjuvant therapy | Age | 0.42 | 1.01 (0.99–1.02) | | |
| | | ER+ | 0.58 | 1.18 (0.65–2.16) | | |
| | | Grade | 0.059 | 1.40 (0.99–1.97) | | |
| | | Size | 0.10 | 1.01 (1.00–1.02) | | |
| | | Node+ | 0.0001‡ | 2.79 (1.67–4.66)‡ | | |
| | | Endocrine therapy | 0.28 | 0.73 (0.42–1.28) | | |
| | | Chemotherapy | 0.0032‡ | 0.35 (0.18–0.70)‡ | | |
| | | 200-Gene signature | 0.0001‡ | 3.14 (1.80–5.49)‡ | | |
| Validation 1: GSE7390, Desmedt et al [13] (N = 198) | ER+/–, N0, <61 years, untreated, ≤5 cm | Age | 0.35 | 1.022 (0.98–1.07) | 0.46 | 1.02 (0.97–1.06) |
| | | ER+ | 0.54 | 0.81 (0.40–1.62) | 0.033‡ | 0.48 (0.25–0.94)‡ |
| | | Grade | 0.73 | 1.11 (0.63–1.95) | 0.23 | 0.74 (0.45–1.21) |
| | | Size | 0.092 | 1.35 (0.95–1.92) | 0.074 | 1.35 (0.97–1.87) |
| | | 200-Gene signature | 0.0046‡ | 4.37 (1.58–12.08)‡ | 0.0053‡ | 3.31 (1.43–7.64)‡ |
| Validation 2: GSE11121, Schmidt et al [14] (N = 200) | ER+/–, untreated, population based, N0 | Grade | 0.033‡ | 1.93 (1.057–3.51)‡ | | |
| | | Size | 0.79 | 1.044 (0.75–1.45) | | |
| | | 200-Gene signature | 0.056 | 2.63 (0.98–7.055) | | |
| Validation 3: GSE1456, Pawitan et al [15] (N = 159) | ER+/–, population based, 126 received adjuvant therapy | Grade | 0.19 | 1.47 (0.83–2.64) | 0.34 | 1.40 (0.70–2.80) |
| | | 200-Gene signature | 0.055 | 2.58 (0.98–6.67) | 0.025‡ | 4.67 (1.23–17.81)‡ |
| Validation 4: GSE9195 and GSE6532, Loi et al [11] (N = 128) | ER+, adjuvant tamoxifen treated, N0/N1, ≤5 cm | Age | 0.22 | 0.97 (0.93–1.019) | | |
| | | Grade | 0.74 | 0.89 (0.46–1.72) | | |
| | | Nodes | 0.94 | 0.96 (0.38–2.38) | | |
| | | Size | 0.0075‡ | 1.49 (1.11–1.98)‡ | | |
| | | 200-Gene signature | 0.019‡ | 6.51 (1.37–30.86)‡ | | |
| Validation 5: NKI 295, Van De Vijver et al [16] (N = 295)§ | ER+/–, untreated, stage I/II, <53 years; N0/N1 | ER+ | 0.18 | 0.74 (0.47–1.16) | 0.057 | 0.51 (0.32–0.82) |
| | | Node+ | 0.39 | 0.84 (0.56–1.25) | 0.63 | 0.90 (0.57–1.40) |
| | | 99-Gene signature | <0.0001‡ | 2.92 (1.77–4.80)‡ | <0.0001‡ | 3.91 (2.06–7.42)‡ |

GSE numbers are the National Center for Biotechnology Information Gene Expression Omnibus entry for each series. Data were obtained from Rosetta Inpharmatics, Seattle, WA (http://www.rii.com/publications/2002/nejm.html, last accessed March 3, 2011). GSE, GEO Series; N0, lymph node negative; N1, lymph node positive; NKI, Nederlands Kanker Instituut (Netherlands Cancer Institute).
* Value for RFS in the training series and for DMFS in the validation 1, 3, 4, and 5 series.
† Value for OS in the validation 1 and 5 series and DSS in the validation 3 series. OS details for other series not available.
‡ Statistically significant variables within each CPH model.
§ Validation series 5 was generated using a custom oligonucleotide microarray containing 99 of the 200 genes used by the classifier.

The training series and validation series 1 through 4 were generated on the Affymetrix GeneChip platform (U133A or U133 Plus 2.0 arrays). Sample processing and GeneChip hybridization were performed according to manufacturer recommendations, as reported by the respective original publications [11-14, 17]. Raw GeneChip output files (CEL files) were processed with the MAS5 method and median centered using the median value of the Affymetrix housekeeping/reference gene set. Validation series 5 was generated using a custom two-channel oligonucleotide microarray, described by van't Veer et al [3] and van de Vijver et al [16]. Data from this validation series were downloaded in a normalized log-ratio format.

A modified version of the method described by Bair and Tibshirani [18], implemented in BRB ArrayTools [19], was used to develop and train a predictive algorithm to stratify patients into categories corresponding to a high or low risk of disease recurrence. This method uses Cox proportional hazards (CPH) models to relate RFS to a specified number of "metagene" expression levels (ie, principal component linear combinations of expression data).

To identify a predictive set of genes for use with the classification algorithm, tenfold cross validation (CV) was performed on the training series. At each iteration of the CV process, those genes significantly associated with RFS (P < 0.001) in nine tenths of the training series, independent of age, tumor size, ER status, nodal involvement, and tumor grade, were identified. These genes were used to predict the RFS risk status of the "held-out" one tenth of the training series. An example of the CPH method of gene selection is provided in Supplemental Table S6 (available at http://jmd.amjpathol.org). Genes selected in two or more of the CV rounds were included in the final predictive gene set.

To minimize the impact of inter-laboratory "batch effects" on the performance of the classifier, a data standardization method was developed. This involved converting the log2 intensity values for the final gene set to percentile rank values (ie, 0.00 to 100.00) using the "percentrank" function in Microsoft Excel or the "ecdf" function in R. The metagene classifier was then retrained on the percentile rank values to generate the final classification algorithm.
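The percentile-rank standardization described above can be illustrated with a minimal Python sketch. It is not the ChipDX implementation: it assumes ranks are computed within a single sample across the signature genes, and it follows the empirical-CDF convention (share of values less than or equal to each value, scaled to 0-100), which is what R's `ecdf` returns; Excel's percent-rank convention differs slightly at the extremes. The function name `percentile_ranks`, the placeholder gene `GENE_D`, and all intensity values are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(log2_by_gene):
    """Convert one sample's log2 intensities to ECDF-style percentile ranks.

    Each gene receives 100 * (number of values <= its own value) / n,
    rounded to two decimals, so the largest value maps to 100.00.
    """
    ordered = sorted(log2_by_gene.values())
    n = len(ordered)
    if n == 0:
        raise ValueError("no expression values supplied")
    return {
        gene: round(100.0 * bisect_right(ordered, value) / n, 2)
        for gene, value in log2_by_gene.items()
    }


# Hypothetical log2 intensities for one sample (illustrative values only).
sample = {"CKS2": 8.1, "PRC1": 6.4, "TRIP13": 7.2, "GENE_D": 9.0}
ranks = percentile_ranks(sample)
print(ranks)

# A uniform intensity shift, a crude stand-in for a batch effect,
# leaves the ranks unchanged because only the ordering matters.
shifted = {gene: value + 2.0 for gene, value in sample.items()}
print(percentile_ranks(shifted) == ranks)
```

Because the ranks depend only on the ordering of genes within a sample, any monotonic per-sample distortion of intensity leaves them unchanged, which is one plausible reason rank-based inputs are less sensitive to laboratory-specific scaling.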
The prognostic index can be computed by the following formula: $\sum_i w_i x_i + C$, where $w_i$ and $x_i$ are the weight and logged gene expression, respectively, for the $i$th gene and $C$ is an adjustment factor calculated during algorithm training to center the distribution of indexes at approximately 0. A high value of the prognostic index corresponds to a high hazard of developing distant metastases. The classification threshold was set based on the 33rd percentile of training series prognostic indexes.

Because a subset of the training series (164 patients) did not receive any form of adjuvant therapy, the entire gene selection and algorithm training process was repeated using data corresponding to the untreated patients alone. Cross-validated risk-group predictions from the model developed were compared with those generated using the complete series; however, a reduction in cross-validated algorithm performance was observed (data not shown). Thus, the signature generated from analysis of the complete 477-patient series was retained as the final model.

Kaplan-Meier analysis and log-rank testing were used to evaluate the RFS, distant metastases–free survival (DMFS), OS, or disease-specific survival (DSS) of predicted risk groups identified within each series. All follow-up data were censored at 10 years. Multivariate CPH analysis of each series was performed, using the available clinical covariates for each (Table 2). In all CPH analyses, the low-risk group was used as the reference group. For all tests, P < 0.05 was considered statistically significant. Assay sensitivity, specificity, and positive predictive values were calculated on validation series 1 and 2 using 10-year censored data.

Gene expression analysis was performed using R (http://www.r-project.org, last accessed March 3, 2011), Bioconductor [20], and BRB ArrayTools [21]. Statistical analyses were performed using MedCalc (MedCalc Inc., Mariakerke, Belgium). An online gene expression analysis system (ChipDX) was developed with R, Bioconductor, Microsoft ASP.net, and SQL Server (Microsoft Corporation, Redmond, WA).

From the training series of 477 patients with breast cancer, Cox regression–based gene selection identified a set of 200 genes with univariate prognostic significance (P < 0.001). These genes were associated with RFS, independent of age, tumor grade, ER status, tumor size, and nodal involvement. Of the 200 genes, only three (CKS2, PRC1, and TRIP13; all involve
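The prognostic index formula and the 33rd-percentile threshold given in the methods can be sketched in Python as follows. This is an illustration under stated assumptions, not the published classifier: the weights, offset, and training indexes are invented, the gene inputs are taken to be the standardized expression values, the percentile uses a simple nearest-rank rule, and a patient is called high risk when the index exceeds the threshold (the source states only that higher indexes mean higher hazard). The function names are hypothetical.

```python
import math


def prognostic_index(weights, expression, offset):
    """Weighted sum of per-gene expression values plus a centering offset C."""
    missing = set(weights) - set(expression)
    if missing:
        raise KeyError(f"expression missing for genes: {sorted(missing)}")
    return sum(w * expression[gene] for gene, w in weights.items()) + offset


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed rule; other definitions exist)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    k = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[k - 1]


def risk_group(index, threshold):
    """Assumed decision rule: indexes above the threshold are high risk."""
    return "high" if index > threshold else "low"


# Hypothetical weights, offset, and standardized expression values.
weights = {"CKS2": 0.02, "PRC1": 0.01, "TRIP13": -0.01}
expression = {"CKS2": 75.0, "PRC1": 25.0, "TRIP13": 50.0}
offset = -1.0

# Hypothetical prognostic indexes from a training series.
training_indexes = [-1.2, -0.4, 0.1, 0.6, 0.9, 1.5]
threshold = nearest_rank_percentile(training_indexes, 33)

index = prognostic_index(weights, expression, offset)
print(round(index, 4), threshold, risk_group(index, threshold))
```

Placing the cut-off at the 33rd percentile of training indexes means roughly one third of training patients fall at or below it, so the low-risk group is deliberately the smaller one under this rule.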

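Reference(s)

1. Harris L. Fritsche H. Mennel R. Norton L. Ravdin P. Taube S. Somerfield M.R. Hayes D.F. Bast Jr, R.C. American Society of Clinical Oncology 2007 update of recommendations for the use of tumor markers in breast cancer. J Clin Oncol. 2007; 25: 5287-5312
2. Paik S. Shak S. Tang G. Kim C. Baker J. Cronin M. Baehner F.L. Walker M.G. Watson D. Park T. Hiller W. Fisher E.R. Wickerham D.L. Bryant J. Wolmark N. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004; 351: 2817-2826
3. van't Veer L.J. Dai H. van de Vijver M.J. He Y.D. Hart A.A. Mao M. Peterse H.L. van der Kooy K. Marton M.J. Witteveen A.T. Schreiber G.J. Kerkhoven R.M. Roberts C. Linsley P.S. Bernards R. Friend S.H. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002; 415: 530-536
4. Wang Y. Klijn J. Zhang Y. Sieuwerts A. Look M. Yang F. Talantov D. Timmermans M. van Gelder M. Yu J. Jatkoe T. Berns E. Atkins D. Foekens J. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005; 365: 671-679
5. Ma X.J. Wang Z. Ryan P.D. Isakoff S.J. Barmettler A. Fuller A. Muir B. Mohapatra G. Salunga R. Tuggle J.T. Tran Y. Tran D. Tassin A. Amon P. Wang W. Enright E. Stecker K. Estepa-Sabal E. Smith B. Younger J. Balis U. Michaelson J. Bhan A. Habin K. Baer T.M. Brugge J. Haber D.A. Erlander M.G. Sgroi D.C. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. 2004; 5: 607-616
6. Ein-Dor L. Kela I. Getz G. Givol D. Domany E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics. 2005; 21: 171-178
7. Abraham G. Kowalczyk A. Loi S. Haviv I. Zobel J. Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context. BMC Bioinformatics. 2010; 11: 277
8. Haibe-Kains B. Desmedt C. Piette F. Buyse M. Cardoso F. van't Veer L. Piccart M. Bontempi G. Sotiriou C. Comparison of prognostic gene expression signatures for breast cancer. BMC Genomics. 2008; 9: 394
9. Cardoso F. Piccart-Gebhart M. Van't Veer L. Rutgers E. The MINDACT trial: the first prospective clinical validation of a genomic tool. Mol Oncol. 2007; 1: 246-251
10. Van Laar R.K. An online gene expression assay for determining adjuvant therapy eligibility in patients with stage 2 or 3 colon cancer. Br J Cancer. 2010; 103: 1852-1857
11. Loi S. Haibe-Kains B. Desmedt C. Lallemand F. Tutt A.M. Gillet C. Ellis P. Harris A. Bergh J. Foekens J.A. Klijn J.G.M. Larsimont D. Buyse M. Bontempi G. Delorenzi M. Piccart M.J. Sotiriou C. Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J Clin Oncol. 2007; 25: 1239-1246
12. Ivshina A.V. George J. Senko O. Mow B. Putti T.C. Smeds J. Lindahl T. Pawitan Y. Hall P. Nordgren H. Wong J.E.L. Liu E.T. Bergh J. Kuznetsov V.A. Miller L.D. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006; 66: 10292-10301
13. Desmedt C. Piette F. Loi S. Wang Y. Lallemand F.O. Haibe-Kains B. Viale G. Delorenzi M. Zhang Y. d'Assignies M.S. Bergh J. Lidereau R. Ellis P. Harris A.L. Klijn J.G.M. Foekens J.A. Cardoso F. Piccart M.J. Buyse M. Sotiriou C. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007; 13: 3207-3214
14. Schmidt M. Bohm D. von Torne C. Steiner E. Puhl A. Pilch H. Lehr H.-A. Hengstler J.G. Kolbl H. Gehrmann M. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008; 68: 5405-5413
15. Pawitan Y. Bjohle J. Amler L. Borg A.L. Egyhazi S. Hall P. Han X. Holmberg L. Huang F. Klaar S. Liu E.T. Miller L. Nordgren H. Ploner A. Sandelin K. Shaw P.M. Smeds J. Skoog L. Wedren S. Bergh J. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005; 7: R953-R964
16. van de Vijver M.J. He Y.D. van'T Veer L.J. Dai H. Hart A.A. Voskuil D.W. Schreiber G.J. Peterse J.L. Roberts C. Marton M.J. Parrish M. Atsma D. Witteveen A. Glas A. Delahaye L. van der Velde T. Bartelink H. Rodenhuis S. Rutgers E.T. Friend S.H. Bernards R. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002; 347: 1999-2009
17. Chowdary D. Lathrop J. Skelton J. Curtin K. Briggs T. Zhang Y. Yu J. Wang Y. Mazumder A. Prognostic gene expression signatures can be measured in tissues collected in RNAlater preservative. J Mol Diagn. 2006; 8: 31-39
18. Bair E. Tibshirani R. Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2004; https://doi.org/10.1371/journal.pbio.0020108
19. Simon R. Lam A. Li M.C. Ngan M. Menenzes S. Zhao Y. Analysis of gene expression data using BRB-Array Tools. Cancer Inform. 2007; 3: 11-17
20. Gentleman R.C. Carey V.J. Bates D.M. Bolstad B. Dettling M. Dudoit S. Ellis B. Gautier L. Ge Y. Gentry J. Hornik K. Hothorn T. Huber W. Iacus S. Irizarry R. Leisch F. Li C. Maechler M. Rossini A.J. Sawitzki G. Smith C. Smyth G. Tierney L. Yang J.Y. Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004; 5: R80
21. Simon R. Lam A. Li M.-C. Ngan M. Menenzes S. Zhao Y. Analysis of Gene Expression Data Using BRB-Array Tools. Cancer Inform. 2007; 3: 11-17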