Applications of Immunogenomics to Cancer
2017; Cell Press; Volume: 168; Issue: 4; Language: English
DOI: 10.1016/j.cell.2017.01.014
ISSN: 1097-4172
Authors: X. Shirley Liu, Elaine R. Mardis
Topic(s): vaccines and immunoinformatics approaches
Abstract: Cancer immunogenomics originally was framed by research supporting the hypothesis that cancer mutations generated novel peptides seen as "non-self" by the immune system. The search for these "neoantigens" has been facilitated by the combination of new sequencing technologies, specialized computational analyses, and HLA binding predictions that evaluate somatic alterations in a cancer genome and interpret their ability to produce an immune-stimulatory peptide. The resulting information can characterize a tumor's neoantigen load, its cadre of infiltrating immune cell types, and the T or B cell receptor repertoire, and can direct the design of a personalized therapeutic.

The underpinnings of modern immunogenomics resulted from hypotheses generated and tested by visionaries in cancer immunology during the late 1980s through the 1990s. Their central hypothesis was that cancer cells presented novel, tumor-specific (i.e., mutated) peptides on the cancer cell surface bound by the patient's HLA molecules. By virtue of this cell surface presentation, specific T cell immunity might be elicited to these "neoantigens." Supporting evidence for this hypothesis was demonstrated in cancers of non-viral origin (Old and Boyse, 1964; Foley, 1953; Prehn and Main, 1957). This foundational work led to the identification and characterization of the role of MHC proteins in antigen presentation (Babbitt et al., 1985; Bjorkman et al., 1987). Concomitantly, methods to grow antigen-specific cytolytic T lymphocytes (CTLs) in culture were also developed (Cerottini et al., 1974; Gillis and Smith, 1977), as were the molecular biology procedures to clone and express gene products. Thierry Boon's laboratory combined these new methods to identify the first tumor-specific antigen (TSA), a point mutation in a protein called P91A (De Plaen et al., 1988). Subsequently, Hans Schreiber's laboratory demonstrated that TSAs also function as neoantigens using primary UV-induced mouse tumors (Monach et al., 1995). Similarly, groups studying human melanomas showed they could identify T cells in the peripheral circulation that bind melanoma cells preferentially over normal cells from the same patient (Dubey et al., 1997; Knuth et al., 1984; Robbins et al., 1996; Van den Eynde et al., 1989). Shortly thereafter, Boon's laboratory cloned the first human TSA, called MAGEA1 (van der Bruggen et al., 1991), and Sahin's group demonstrated an autologous antibody-based method to clone and identify different human TSAs (Sahin et al., 1995).

While these foundational studies established supporting evidence for the existence of tumor-specific peptide neoantigens, the lengthy and painstaking nature of these processes was unlikely to scale to clinical application for cancer patients. More recently, these limitations have been alleviated by the application of new sequencing technologies and associated computational data analysis approaches. These methods, collectively referred to as "immunogenomics," have improved the facility with which individual cancers can be studied to predict their neoantigens for prognostic purposes or to inform immunotherapeutic interventions. Complementary methods have been developed to study the changes in the T cell repertoire, to characterize the gene expression signatures of the immune cell types present in the tumor mass, and to design personalized vaccines or adoptive cell transfer (ACT) therapies. The now scalable nature of immunogenomic methods should permit their widespread clinical application, although there remain issues and challenges to be resolved. This primer will highlight the specific methods and describe the known strengths and weaknesses in modern immunogenomics.

It has long been known that cancer is caused by alterations to genomic DNA that impact protein functions, ultimately disrupting cellular control of pathways and resulting in the outgrowth of a tumor mass.
Methods using next-generation sequencing platforms generate data from tumor and normal DNA isolates that, once aligned to the Human Reference Genome sequence, can be interpreted to identify somatic alterations (Ley et al., 2008). In practice, such analyses aim to identify DNA alterations in known cancer genes, both oncogenes and tumor suppressors, that combine to transform the founder cell. For certain oncogenes, identified mutations indicate therapeutic interventions that may successfully halt the tumor cell growth. By contrast, immunogenomic approaches aim to identify tumor-specific DNA alterations that predict amino acid sequence changes in all encoded proteins, and then evaluate their potential as neoantigens. In practice, most TSAs identified to date are unique to each patient and generally do not involve known cancer genes. Hence, the widespread use of next-generation sequencing (NGS) instrumentation has enabled immunogenomics, providing a facile way to generate data to predict tumor-specific neoantigens in a rapid, inexpensive, and comprehensive manner (Gubin et al., 2015).

NGS technologies have rapidly evolved over the past 10 years, resulting in dramatically increased amounts of sequencing data produced per instrument run at ever-decreasing costs (Mardis, 2017).
In immunogenomics, since the focus is protein-coding genes, solution hybridization-based methods are used to select these sequences (the "exome") prior to sequencing (Bainbridge et al., 2010; Gnirke et al., 2009; Hodges et al., 2009). Importantly, the concomitant development of advanced variant detection algorithms that identify different classes of mutations from NGS data has enabled the identification of all classes of somatic variation. Accurate detection of variants in this setting is influenced by multiple factors, which are presented here in detail.

One important consideration for somatic variant detection is depth of coverage by NGS reads from the tumor. In principle, since tumor samples include variable percentages of normal cells, the depth of NGS data generated must be adjusted flexibly to ensure that a sufficient representation of tumor-derived sequence reads is obtained. Isolating DNA from selected, tumor-rich areas of a biopsy or resection sample is ideal, but not always possible, so average read depths of 300- to 500-fold exome coverage are typically attempted to compensate for the normal cell DNA-derived reads.
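The relationship between tumor purity and read depth can be made concrete with a small calculation. The sketch below assumes a heterozygous somatic mutation in a region that is diploid in both tumor and normal cells, so the expected fraction of variant-containing reads is half the tumor purity. The function names, the purity values, and the target of 30 variant-supporting reads are illustrative assumptions, not values taken from the text.

```python
import math

def expected_variant_fraction(tumor_purity, mutated_copies=1, ploidy=2):
    """Expected fraction of reads carrying the variant.

    Assumes the locus has the same copy number (ploidy) in tumor and
    normal cells, so normal-cell DNA simply dilutes the variant reads.
    """
    return tumor_purity * mutated_copies / ploidy

def depth_for_min_variant_reads(min_variant_reads, tumor_purity):
    """Average depth at which the expected variant read count reaches the target."""
    return math.ceil(min_variant_reads / expected_variant_fraction(tumor_purity))

# Hypothetical purities: the lower the tumor content, the more depth is needed
# to see the same number of variant-supporting reads.
for purity in (0.8, 0.4, 0.2):
    fraction = expected_variant_fraction(purity)
    print(
        f"purity={purity:.0%}  expected variant fraction={fraction:.0%}  "
        f"variant reads at 300x={300 * fraction:.0f}  "
        f"depth for 30 variant reads={depth_for_min_variant_reads(30, purity)}"
    )
```

Under these assumptions, halving the tumor purity doubles the depth required to observe the same expected number of variant reads, which is one way to read the motivation for deep exome coverage of impure samples.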
A second reason for high coverage of the tumor-derived DNA is to enable the evaluation of founder clone versus subclonal mutations in the resulting data. Here, we define founder mutations as the original set of mutations present in the cell that transformed from normal to neoplastic, whereas subclonal mutations occur as the daughter cells of this founder acquire additional mutations during growth of the tumor mass. Based on this definition, founder clone mutations in diploid regions of the exome have a proportional fraction of variant-containing sequencing reads (variant allele fraction or VAF) that is around 50% (adjusted for normal DNA contribution), since most somatic mutations are heterozygous. In theory, neoantigens that result from founder clone mutations should elicit a T cell response that targets all cancer cells, rather than the subset of tumor cells that would be targeted by a T cell response to subclonal neoantigens in the vaccine.

Equally important to appropriate coverage depth for accurate prediction of variants is the algorithm or set of algorithms used to identify variants from the NGS exome data. The factors to consider here include the types of variants one wishes to evaluate in neoantigen discovery. For example, single nucleotide variants (point mutations) are easiest to predict with high accuracy because reads containing a single variant are readily aligned to their reference genome "match," and because a variety of different algorithms also can detect low VAF variants. Variant detection from NGS reads has been an area of rapid development and there are many algorithms to choose from, with variable performance, as has been evaluated (Cornish and Guda, 2015; Ghoneim et al., 2014; Krøigård et al., 2016).

By contrast, variants resulting from insertion or deletion of one or a few nucleotides ("indels") are significantly more difficult to identify because standard alignment algorithms have difficulty aligning the variant-containing sequencing reads, which often leads to lower coverage in these regions (Jiang et al., 2012; Jiang et al., 2015; Ratan et al., 2015). However, indels may be important to immunogenomics efforts because they can introduce frameshift mutations that result in highly divergent amino acid sequences in the resulting protein and hence may produce strong predicted neoantigens. Increased read lengths on NGS platforms have improved indel detection, as has the use of gapped alignment or split-read algorithms that are computationally intensive but better able to align the indel-containing reads to the reference genome.
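To illustrate why a frameshift yields a far more divergent peptide than a point mutation, the following minimal sketch translates a short coding sequence before and after a single-base substitution and a single-base deletion. The 18-nucleotide sequence is invented for illustration and does not correspond to any gene discussed here; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a coding sequence, stopping at a stop codon or a partial codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

wild_type = "ATGGCCAAGGTTCTGGAT"                  # hypothetical coding sequence
missense = wild_type[:7] + "C" + wild_type[8:]    # single-base substitution in codon 3
frameshift = wild_type[:6] + wild_type[7:]        # single-base deletion in codon 3

print("wild type :", translate(wild_type))
print("missense  :", translate(missense))
print("frameshift:", translate(frameshift))
```

The substitution changes one residue, whereas the deletion alters every residue downstream of the event, which is why frameshift indels can contribute many candidate peptides that differ strongly from the wild-type sequence.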
Assembly-based realignment approaches also have been developed to improve the precision of indel variant detection (Mose et al., 2014; Narzisi et al., 2014).

Another type of somatic variation that can lead to highly altered amino acid sequences, and as a result create a neoantigenic peptide, is a structural variant that fuses two protein-coding sequences. These can result from inversion or deletion of a chromosomal segment or from chromosomal translocations. Detecting these alterations from exome sequencing data is quite challenging and error-prone, but RNA-based analysis can identify the resulting fusion transcript (Li et al., 2011; Scolnick et al., 2015; Zhang et al., 2016a; Kumar et al., 2016) and compare the predicted fusion sequence to NGS data from DNA (whole genome or exome sequencing) to identify supporting evidence of the genomic event causing the fusion. Recently, we adapted this approach for neoantigen prediction with a process called INTEGRATE-neo, using the TMPRSS2-ERG fusions common in prostate cancer to evaluate its ability to identify fusion peptide neoantigens (Zhang et al., 2016b). RNA-seq data bring added value to immunogenomics efforts beyond the detection of fusion peptides, as will be described later.

Once variant detection is completed, each variant is annotated to predict the amino acid change(s), if any, that result from the altered DNA sequence. Widely utilized computational tools such as Annovar and VEP are available to produce the translated peptides from the DNA data. The translated peptides constitute one type of input data for the neoantigen prediction software to calculate the class I or class II predicted binding affinities.

The second data input for neoantigen prediction is the set of HLA haplotypes of the patient, also derived from exome data, since the exome capture reagents also capture the HLA gene loci. Heretofore, HLA typing was performed using a PCR-based and Sanger sequencing-based clinical assay. The repetitive nature of the HLA genes requires a high-stringency assembly of these genes, which can be achieved using the >500 bp read lengths from Sanger data. Sequence analysis of these regions based on hybrid capture-derived NGS reads, which are relatively short (∼100 bp), requires a stringent alignment of the read data to the IMGT/HLA database (Robinson et al., 2001) using a haplotype-resolved algorithm to interpret the HLA class I and II haplotypes. There now exist multiple algorithms for accomplishing these data interpretations, including Polysolver (Shukla et al., 2015), HLAMiner (Warren et al., 2012), and OptiType (Szolek et al., 2014). Typically, one interprets the normal tissue-derived exome data to obtain the HLA haplotypes. Clinical analysis of these genes also should include repeating the alignment of the tumor-derived exome data and identification of mutations in order to identify HLA alleles that are impacted by nonsense mutations, deletions, or other similarly deleterious types of somatic alterations that may influence the presence of that allele (Shukla et al., 2015). Some algorithms also can use RNA-derived data to interpret the HLA haplotypes (Warren et al., 2012).

Another critical component of identifying neoantigens is the in silico prediction of HLA class I and II binding affinities for specific peptides. These predictions are quite computationally complex and require machine learning-based approaches to establish models for the different types of binding site interactions. In particular, each peptide interacts with the binding pocket residues of the many different HLA proteins through the amino acid side chains of specific residues. Therefore, the binding affinity of any peptide is sequence-specific relative to that patient's HLA proteins, some of which may be common and some rare. There also are differences in the binding of peptides by class I or class II HLA that impact the precision of neoantigen prediction, as described later. Finally, there is considerable debate about an appropriate cutoff value for binding affinity in terms of what does or does not constitute a strong neoantigen candidate (Duan et al., 2014; Bassani-Sternberg et al., 2016).
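How much the choice of cutoff matters can be seen by filtering one set of predictions at two thresholds. The sketch below is illustrative only: the peptides and IC50 values are invented, the `BindingPrediction` record and `select_candidates` helper are hypothetical names, and the 500 nM threshold is used as a commonly quoted convention rather than a value endorsed by the text.

```python
from typing import NamedTuple

class BindingPrediction(NamedTuple):
    peptide: str
    hla_allele: str
    ic50_nm: float  # predicted affinity in nM; lower means stronger binding

def select_candidates(predictions, cutoff_nm):
    """Keep predictions at or below the cutoff, strongest binders first."""
    kept = [p for p in predictions if p.ic50_nm <= cutoff_nm]
    return sorted(kept, key=lambda p: p.ic50_nm)

# Invented example values, not output from any real predictor.
predictions = [
    BindingPrediction("GTLNVPWRK", "HLA-A*11:01", 480.0),
    BindingPrediction("KLMDAPTWV", "HLA-A*02:01", 42.0),
    BindingPrediction("QPRDAVLEM", "HLA-B*07:02", 1250.0),
    BindingPrediction("AYQRLSGEF", "HLA-A*24:02", 310.0),
]

for cutoff in (500.0, 150.0):
    names = [p.peptide for p in select_candidates(predictions, cutoff)]
    print(f"cutoff {cutoff:.0f} nM -> {names}")
```

Tightening the threshold removes most of the list in this toy example, which is the practical reason the cutoff question affects how many candidates move forward for validation.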
The initial approach to computational HLA binding predictions utilized a neural network-based learning method developed from a training set of experimentally derived binding affinities for class I HLA proteins and different peptides. This effort resulted in an HLA class I binding prediction software known as netMHC, devised by researchers in the Center for Biological Sequence Analysis at the Technical University of Denmark (Lundegaard et al., 2008a; Lundegaard et al., 2008b; Nielsen et al., 2003). The predictor has improved over time with the availability of training datasets for HLA proteins that are rarer in the population, although calculated binding affinities for the rarest HLA alleles in humans remain less certain (Wang et al., 2010). An interim approach to address rare HLA class I binding calculations was PickPocket, which extrapolated from variants with known binding specificity to those without existing experimental data (Zhang et al., 2009). The most recent version is netMHCstabpan (Rasmussen et al., 2016), which uses a neural network approach based on a dataset of stability values calculated for different peptide-MHC class I complexes, rather than their binding affinity values, since the stability of their interaction has experimentally been shown to be more strongly correlated to T cell immunogenicity.

Another early method developed to generate class I binding predictions was based on a stabilized matrix method (SMM) algorithm developed by Peters and Sette (Peters and Sette, 2005). This approach models the sequence specificity of binding processes as a means of predicting outcomes for untested sequences. SMM not only predicts HLA binding but also evaluates peptide transport as a function of antigen presentation and proteasomal cleavage with the TAP algorithm. Subsequent efforts to develop new class I binding affinity prediction software have included the use of combined support vector machine-based (SVM) and random forest machine-learning approaches (Srivastava et al., 2013), or have combined the information obtained from amino acid pairwise contact potentials and quantum topology molecular similarity descriptors (Saethang et al., 2013) to better model HLA class I peptide interactions.

With the requisite information generated by NGS to call somatic variants and interpret their impact on protein sequences, and to identify the HLA haplotypes specific to the patient, neoantigen prediction software can be used to predict both the class I and class II HLA binding affinities for each tumor-unique set of peptides. Considerations and specifics for these prediction approaches are described in detail below. A number of binding prediction software packages and associated immunogenomics algorithms are available at the Immune Epitope DataBase (IEDB) analysis resource (http://tools.immuneepitope.org/main/) (Robinson et al., 2013). The IEDB web interface permits the input of peptide sequences for sequential evaluation by user-configured steps using the software of choice to predict neoantigens. Software pipelines also are publicly available for local download and computing of neoantigen predictions by end-users, including pVAC-seq (https://github.com/griffithlab/pVAC-Seq) and epidisco (https://github.com/hammerlab). An overall workflow for the processes described above is shown in Figure 1.
Approaches to predict HLA class I neoantigens typically begin by parsing the tumor-specific peptides predicted from variant calling as 21-mer peptides that encompass the variant amino acid(s) placed as near to the center of the 21-mer as possible. This is easiest to envisage for simple non-synonymous amino acid substitutions (Figure 2A). A window is then tiled across each variant-containing 21-mer to define a set of 8- to 11-mers to input for binding calculations, based on HLA class I binding characteristics (Figure 2B). These peptide sets are parsed along with their corresponding wild-type peptide sequences as input data for consideration by neoantigen prediction software, along with information about the HLA class I haplotypes determined for the patient. The resulting list of neoantigens can be quite extensive, depending upon the num
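The 21-mer windowing and 8- to 11-mer tiling described above can be sketched as follows. This is a minimal illustration for a single substituted residue, not the implementation used by any particular pipeline; the function names, the 0-based position convention, and the example protein sequence are assumptions made for the example.

```python
def flanking_window(protein, variant_pos, flank=10):
    """Return the window around a 0-based variant position and the variant's index in it.

    With flank=10 the window is a 21-mer unless the variant lies near a protein end.
    """
    start = max(0, variant_pos - flank)
    end = min(len(protein), variant_pos + flank + 1)
    return protein[start:end], variant_pos - start

def tile_peptides(window, variant_idx, lengths=(8, 9, 10, 11)):
    """List every peptide of the given lengths that overlaps the variant residue."""
    peptides = []
    for k in lengths:
        first = max(0, variant_idx - k + 1)       # leftmost start still covering the variant
        last = min(variant_idx, len(window) - k)  # rightmost start that fits in the window
        for start in range(first, last + 1):
            peptides.append(window[start:start + k])
    return peptides

# Hypothetical mutant protein in which residue 15 (0-based) is the substituted amino acid.
mutant_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
window, idx = flanking_window(mutant_protein, 15)
peptides = tile_peptides(window, idx)
print(len(window), window[idx], len(peptides))
```

Running the same two functions on the wild-type protein at the same position would give the matched wild-type peptides that are supplied alongside the mutant set.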