A machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types

Jared J Gartner, Maria R Parkhurst, Alena Gros, Eric Tran, Mohammad S Jafferji, Amy Copeland, Ken-Ichi Hanada, Nikolaos Zacharakis, Almin Lalani, Sri Krishna, Abraham Sachs, Todd D Prickett, Yong F Li, Maria Florentin, Scott Kivitz, Samuel C Chatmon, Steven A Rosenberg, Paul F Robbins, Jared J Gartner, Maria R Parkhurst, Alena Gros, Eric Tran, Mohammad S Jafferji, Amy Copeland, Ken-Ichi Hanada, Nikolaos Zacharakis, Almin Lalani, Sri Krishna, Abraham Sachs, Todd D Prickett, Yong F Li, Maria Florentin, Scott Kivitz, Samuel C Chatmon, Steven A Rosenberg, Paul F Robbins

Abstract

Tumor neoepitopes presented by major histocompatibility complex (MHC) class I are recognized by tumor-infiltrating lymphocytes (TIL) and are targeted by adoptive T-cell therapies. Identifying which mutant neoepitopes from tumor cells are capable of recognition by T cells can assist in the development of tumor-specific, cell-based therapies and can shed light on antitumor responses. Here, we generate a ranking algorithm for class I candidate neoepitopes by using next-generation sequencing data and a dataset of 185 neoepitopes that are recognized by HLA class I-restricted TIL from individuals with metastatic cancer. Random forest model analysis showed that the inclusion of multiple factors impacting epitope presentation and recognition increased output sensitivity and specificity compared to the use of predicted HLA binding alone. The ranking score output provides a set of class I candidate neoantigens that may serve as therapeutic targets and provides a tool to facilitate in vitro and in vivo studies aimed at the development of more effective immunotherapies.

Conflict of interest statement

Competing interests The authors declare no competing interests.

Figures

Extended Data Fig. 1 |. Percentile Rank…
Extended Data Fig. 1 |. Percentile Rank Comparisons between NetMHCpan4.0 EL and MHCFlurry1.6 Percentile Rank.
Percentile rank of positive mmps were mapped by their MHCflurry1.6 rank on the x-axis and the NetMHCpan4.0 EL model rank on the y-axis. Red Triangles correspond to mmps containing cysteine residues at positions 2,3 or C-terminus (n=12) while orange dots correspond to peptides containing cysteine residues at position 1 or between positions 3 and the C-terminus (n=107).
Extended Data Fig. 2 |. Nmer localization…
Extended Data Fig. 2 |. Nmer localization predictions.
WoLF Psort algorithm was used on all nmer proteins (n=9541) to predicted for localization. Blue bars are CD8 + Positive nmers, Orange bars are negative nmers. Y-axis represents frequency of each group predicted to localize. X axis are the WoLF Psort prediction abbreviations. chlo = chloroplast, cyto = cytosol, cysk = cytoskeleton, E.R. = endoplasmic reticulum, extr = extracellular, golg = Golgi apparatus, lyso = lysosome, mito = mitochondria, nucl = nuclear, pero = peroxisome, plas = plasma membrane, vacu = vacuolar membrane . Individual totals for each groups positive and negative can be found in Supplementary Table 12. Hyphenated values denote compound prediction. P-values comparing positive to negative nmers displayed over each prediction. P-values calculated using a two-sided Fisher’s exact test and corrected using Bonferroni correction for multiple comparisons.
Extended Data Fig. 3 |. Gene expression…
Extended Data Fig. 3 |. Gene expression Decile of mmps.
Gene expression deciles of positive (n=119) and negative mmps (n=2681162). Box indicates quartiles 2 & 3 and inter quartile range, median indicated by line in box plot, whiskers represent quartile 1 and 4 ± 1.5X IQR or minimum/maximum value if within the whisker values. Significance calculated with Mann-Whitney U test.
Extended Data Fig. 4 |. IEDB Immunogenicity…
Extended Data Fig. 4 |. IEDB Immunogenicity scores of mmps.
IEDB Immunogenicity scores were generated for each mmp using the IEDB immunogenicity tool. The panels are split into all mmps (positive n=119, negative n=2681162), comparison of just those with a mutation anchor in position 2,3 or C-terminus (positive n=55, negative n= 1167363) and those without mutations in position 2,3, or C-terminus (positive n= 64, negative n= 1513799). Box indicates quartiles 2 & 3 and inter quartile range, median indicated by line in box plot, whiskers represent quartile 1 and 4 ± 1.5X IQR or minimum/maximum value if within the whisker values. Significance was calculated using the Mann-Whitney U test.
Extended Data Fig. 5 |. Hydrophobicity scores…
Extended Data Fig. 5 |. Hydrophobicity scores of T-cell contact regions.
Hydrophobicity scores were calculated summing the Kyte-Doolittle hydrophobicity score of positions 4 through n-1. The panels are split into all mmps (positive n=119, negative n=2681162), comparison of just those with a anchor in position 2,3 or C-terminus (positive n=55, negative n= 1167363) and those without mutations in position 2,3, or C-terminus (positive n= 64, negative n= 1513799). Box indicates quartiles 2 & 3 and inter quartile range, median indicated by line in box plot, whiskers represent quartile 1 and 4 ± 1.5X IQR or minimum/maximum value if within the whisker values. Significance calculated with Mann-Whitney U test.
Extended Data Fig. 6 |. Top NMER…
Extended Data Fig. 6 |. Top NMER models using either MMP score of MHCflurry Score as input.
ROC curve showing the mean performance of the top models using either MMP model scores or MHCflurry scores as input. Solid line represents mean for each model across n=5 folds, shaded area is the standard deviation at each point along the x-axis.
Fig. 1 |. Evaluating criteria for their…
Fig. 1 |. Evaluating criteria for their effects on nmer discovery.
a, Known CD8+ minimal epitopes predicted MHCflurry1.6 percentile rank for each HLA; HLA-A, n = 60; HLA-B, n = 47 and HLA-C, n = 12. The red line indicates the rank two cutoff. b, Histogram plotting the percentage of positive nmers found within the screened nmers from each expression decile. N = 1,240, 1,342, 1,371, 1,317, 1,081, 870, 539, 449, 317 and 1,015 nmers for deciles 10, 9, 8, 7, 6, 5, 4, 3, 2 and 1, respectively. c, Analysis of variant presence in corresponding RNA-seq data. The frequencies of reactive class I variants seen in nmers encoded by mutations that were detected in RNA-seq data, comprising 5,686 tested nmers and 124 class I–reactive nmers were compared with those that were not seen in the RNA-seq data, comprising a total of 3,855 tested nmers and 15 class I–reactive nmers, are shown.. Significance was calculated by a two-sided Fisher’s exact test. d, Box plot comparing the VAFs of reactive (n = 139) and non-reactive (n = 9,402) nmers. The VAFs were normalized and expressed as deciles to facilitate comparisons between samples. For box plots, the box indicates quartiles two and three and interquartile range (IQR), and the median is indicated by the line in the box plot. Whiskers represent quartiles 1 and 4 ± 1.5× IQR or minimum/maximum value if within the whisker values. Significance was calculated using the Mann–Whitney U-test.
Fig. 2 |. Analysis of unique minimal…
Fig. 2 |. Analysis of unique minimal epitopes.
Comparison of reactive (positive mmps, n = 119) and non-reactive (negative mmps, n = 2,681,162) peptide scores for MHCflurry1.6 percentile rank binding predictions (a), MHCflurry1.6 processing score (b) and MHCflurry1.6 presentation score (c). d, Contour plot showing CD8+ frequency for minimal epitopes at different thresholds of MHCflurry1.6 percentile rank binding and expression decile. e, NetMHCstabpan-1.0 stability score for peptide–HLA class I binding complexes. Proteasomal processing score for C-terminal residues using either the NetChop-3.1 C-term (f) or 20S (g) models. h, TAP transport score. i, Box plot of MHCflurry1.6 percentile binding rank for the wild-type peptide in relation to the snv mmp (wild-type:mutant) for reactive (positive mmps, n = 54) and non-reactive (negative mmps, n = 1,083,564) mmps with mutant anchors defined as positions two, three or the C terminus and reactive (positive mmps, n = 64) and non-reactive (negative mmps, n = 1,513,799) snv mmps without mutant anchors. For box plots, the box indicates quartiles two and three and IQR, and the median is indicated by the line. Whiskers represent quartiles 1 and 4 ± 1.5× IQR or minimum/maximum value if within the whisker values. Significance was calculated by a Mann–Whitney U-test. j, Scatter plot of MHCflurry1.6 percentile binding rank for mutant mmps compared to matched wild-type mmps for snv changes. Negative mmps (n = 2,582,291) are shown in blue, positive mmps with mutations outside of anchor residue (n = 64) are shown in red and positive mmps with mutant anchors (n = 54) are shown in orange.
Fig. 3 |. Evaluation of mmp input…
Fig. 3 |. Evaluation of mmp input features.
a, Spearman correlation matrix representing the associations between input features tested for the MMP model. b, The relative importance of features in the final MMP model trained with n = 120 positive mmps and n = 1,412,494 negative mmps. The relative importance represents the overall ability of a feature to purify nodes within the model.
Fig. 4 |. Evaluation of models in…
Fig. 4 |. Evaluation of models in a test set.
a, MMP model evaluation. Receiver operating characteristic (ROC) curve of the MMP model in a test set of n = 8,771 nmers, with n = 46 class I–reactive nmers ranked by their best-scoring mmp from five different prediction binding models. All nmers from the 26 test samples were scored and included. b, Individual rank of positive nmers (n = 46) found within an individual’s repertoire when sorted by the NMER model score or by the top NetMHCpan4.0 EL percentile binding-ranked mmp derived from each nmer. c, Percentile rank of positive nmers (n = 46) found within an individual’s repertoire when sorted by the NMER model score or by the top NetMHCpan4.0 EL percentile binding-ranked mmp derived from each nmer; P values were calculated with a two-sided Wilcoxon signed-rank test.

References

    1. Huang J et al. T cells associated with tumor regression recognize frameshifted products of the CDKN2A tumor suppressor gene locus and a mutated HLA class I gene product. J. Immunol 172, 6057–6064 (2004).
    1. Zhou J, Dudley ME, Rosenberg SA & Robbins PF Persistence of multiple tumor-specific T-cell clones is associated with complete tumor regression in a melanoma patient receiving adoptive cell transfer therapy. J. Immunother 28, 53–62 (2005).
    1. Robbins PF et al. Mining exomic sequencing data to identify mutated antigens recognized by adoptively transferred tumor-reactive T cells. Nat. Med 19, 747–752 (2013).
    1. Lu YC et al. Mutated PPP1R3B is recognized by T cells used to treat a melanoma patient who experienced a durable complete tumor regression. J. Immunol 190, 6034–6042 (2013).
    1. Lu YC et al. Efficient identification of mutated cancer antigens recognized by T cells associated with durable tumor regressions. Clin. Cancer Res 20, 3401–3410 (2014).
    1. Prickett TD et al. Durable complete response from metastatic melanoma after transfer of autologous T cells recognizing 10 mutated tumor antigens. Cancer Immunol. Res 4, 669–678 (2016).
    1. Tran E et al. Cancer immunotherapy based on mutation-specific CD4+ T cells in a patient with epithelial cancer. Science 344, 641–645 (2014).
    1. Tran E et al. T-cell transfer therapy targeting mutant KRAS in cancer. N. Engl. J. Med 375, 2255–2262 (2016).
    1. Zacharakis N et al. Immune recognition of somatic mutations leading to complete durable regression in metastatic breast cancer. Nat. Med 24, 724–730 (2018).
    1. Rizvi NA et al. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).
    1. McGranahan N et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).
    1. Hellmann MD et al. Genomic features of response to combination immunotherapy in patients with advanced non-small-cell lung cancer. Cancer Cell 33, 843–852 (2018).
    1. Le DT et al. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science 357, 409–413 (2017).
    1. Le DT et al. PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med 372, 2509–2520 (2015).
    1. Peltomaki P DNA mismatch repair and cancer. Mutat. Res 488, 77–85 (2001).
    1. Peters B & Sette A Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinf. 6, 132 (2005).
    1. Alvarez B et al. NNAlign_MA; MHC peptidome deconvolution for accurate MHC binding motif characterization and improved T-cell epitope predictions. Mol. Cell. Proteomics 18, 2459–2477 (2019).
    1. O’Donnell TJ et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132 (2018).
    1. Duan F et al. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. J. Exp. Med 211, 2231–2248 (2014).
    1. Bulik-Sullivan B et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. Nat. Biotechnol 37, 55–63 (2019).
    1. Hundal J et al. pVACtools: a computational toolkit to identify and visualize cancer neoantigens. Cancer Immunol. Res 8, 409–420 (2020).
    1. Bjerregaard AM, Nielsen M, Hadrup SR, Szallasi Z & Eklund AC MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol. Immunother 66, 1123–1130 (2017).
    1. Kim S et al. Neopepsee: accurate genome-level prediction of neoantigens by harnessing sequence and amino acid immunogenicity information. Ann. Oncol 29, 1030–1036 (2018).
    1. Kosaloglu-Yalcin Z et al. Predicting T cell recognition of MHC class I restricted neoepitopes. Oncoimmunology 7, e1492508 (2018).
    1. Brown SD et al. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Genome Res. 24, 743–750 (2014).
    1. Balachandran VP et al. Identification of unique neoantigen qualities in long-term survivors of pancreatic cancer. Nature 551, 512–516 (2017).
    1. Parkhurst MR et al. Unique neoantigens arise from somatic mutations in patients with gastrointestinal cancers. Cancer Discov. 9, 1022–1035 (2019).
    1. Tran E et al. Immunogenicity of somatic mutations in human gastrointestinal cancers. Science 350, 1387–1390 (2015).
    1. Lo W et al. Immunologic recognition of a shared p53 mutated neoantigen in a patient with metastatic colorectal cancer. Cancer Immunol. Res 7, 534–543 (2019).
    1. Jurtz V et al. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol 199, 3360–3368 (2017).
    1. Gfeller D et al. The length distribution and multiple specificity of naturally presented HLA-I ligands. J. Immunol 201, 3705–3716 (2018).
    1. Sarkizova S et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat. Biotechnol 38, 199–209 (2020).
    1. Paul S et al. HLA class I alleles are associated with peptide-binding repertoires of different size, affinity, and immunogenicity. J. Immunol 191, 5831–5839 (2013).
    1. Chen W, Yewdell JW, Levine RL & Bennink JR Modification of cysteine residues in vitro and in vivo affects the immunogenicity and antigenicity of major histocompatibility complex class I-restricted viral determinants. J. Exp. Med 189, 1757–1764 (1999).
    1. Chen JL et al. Structural and kinetic basis for heightened immunogenicity of T cell vaccines. J. Exp. Med 201, 1243–1255 (2005).
    1. Sachs A, et al. Impact of cysteine residues on MHC binding predictions and recognition by tumor-reactive T cells. J. Immunol 205, 539–549 (2020).
    1. Sahin U et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
    1. Horton P et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 35, W585–W587 (2007).
    1. Abelin JG et al. Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction. Immunity 46, 315–326 (2017).
    1. Rasmussen M et al. Pan-specific prediction of peptide–MHC class I complex stability, a correlate of T cell immunogenicity. J. Immunol 197, 1517–1524 (2016).
    1. Jorgensen KW, Rasmussen M, Buus S & Nielsen M NetMHCstab—predicting stability of peptide–MHC-I complexes; impacts for cytotoxic T lymphocyte epitope discovery. Immunology 141, 18–26 (2014).
    1. Groettrup M, Kirk CJ & Basler M Proteasomes in immune cells: more than peptide producers? Nat. Rev. Immunol 10, 73–78 (2010).
    1. Larsen MV et al. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. Eur. J. Immunol 35, 2295–2303 (2005).
    1. Capietto AH et al. Mutation position is an important determinant for predicting cancer neoantigens. J. Exp. Med 217, e20190179 (2020).
    1. Calis JJ et al. Properties of MHC class I presented peptides that enhance immunogenicity. PLoS Comput. Biol 9, e1003266 (2013).
    1. Chowell D et al. TCR contact residue hydrophobicity is a hallmark of immunogenic CD8+ T cell epitopes. Proc. Natl Acad. Sci. USA 112, E1754–E1762 (2015).
    1. Cohen CJ et al. Isolation of neoantigen-specific T cells from tumor and peripheral lymphocytes. J. Clin. Invest 125, 3981–3991 (2015).
    1. Gros A et al. PD-1 identifies the patient-specific CD8+ tumor-reactive repertoire infiltrating human tumors. J. Clin. Invest 124, 2246–2259 (2014).
    1. Gros A et al. Prospective identification of neoantigen-specific lymphocytes in the peripheral blood of melanoma patients. Nat. Med 22, 433–438 (2016).
    1. Parkhurst M et al. Isolation of T-cell receptors specifically reactive with mutated tumor-associated antigens from tumor-infiltrating lymphocytes based on CD137 expression. Clin. Cancer Res 23, 2491–2505 (2017).
    1. Stevanovic S et al. Landscape of immunogenic tumor antigens in successful immunotherapy of virally induced epithelial cancer. Science 356, 200–205 (2017).
    1. Deniger DC et al. T-cell responses to TP53 “Hotspot” mutations and unique neoantigens expressed by human ovarian cancers. Clin. Cancer Res 24, 5562–5573 (2018).
    1. Yossef R et al. Enhanced detection of neoantigen-reactive T cells targeting unique and shared oncogenes for personalized cancer immunotherapy. JCI Insight 3, e122467 (2018).
    1. Gros A et al. Recognition of human gastrointestinal cancer neoantigens by circulating PD-1+ lymphocytes. J. Clin. Invest 129, 4992–5004 (2019).
    1. Larsen MV et al. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinf. 8, 424 (2007).
    1. Gartner J Datasets for ‘Development of a model for ranking candidate HLA class I neoantigens based upon datasets of known neoepitopes’. figshare 10.35092/yhjc.c.4792338.v2 (2020).

Source: PubMed

3
Abonnere