NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets

Morten Nielsen, Massimo Andreatta

Abstract

Background: Binding of peptides to MHC class I molecules (MHC-I) is essential for antigen presentation to cytotoxic T-cells.

Results: Here, we demonstrate how a simple alignment step allowing insertions and deletions in a pan-specific MHC-I binding machine-learning model enables combining information across both multiple MHC molecules and peptide lengths. This pan-allele/pan-length algorithm significantly outperforms state-of-the-art methods and captures differences in the length profile of binders to different MHC molecules, leading to increased accuracy for ligand identification. Using this model, we demonstrate that percentile ranks, in contrast to affinity-based thresholds, are optimal for ligand identification because they sample the MHC space uniformly.

Conclusions: We have developed a neural network-based machine-learning algorithm leveraging information across multiple receptor specificities and ligand length scales, and demonstrated how this approach significantly improves the accuracy for prediction of peptide binding and identification of MHC ligands. The method is available at www.cbs.dtu.dk/services/NetMHCpan-3.0.

Figures

Fig. 1
Predictive performance on different peptide lengths for the allmer and 9mer predictive methods. The two methods were trained as described in the text. The predictive performance was measured in terms of Pearson’s correlation coefficient (PCC) and area under the ROC curve (AUC), the latter using a binding threshold of 500 nM. The allmer method significantly outperforms the 9mer approach on peptides of all lengths from 8 to 10 (binomial test excluding ties). **: p 

Fig. 2
Predictive performance on different peptide lengths for the allmer and allmer-allele predictive methods. The two methods were trained as described in the text. The predictive performance was measured in terms of Pearson’s correlation coefficient (PCC) and area under the ROC curve (AUC), the latter using a binding threshold of 500 nM. The allmer method significantly outperforms the allmer-allele approach for peptides of length 9 and 10 (binomial test excluding ties). **: p <0.001, *: p <0.05
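
The caption above names three evaluation ingredients: Pearson's correlation coefficient, the area under the ROC curve with binders defined at 500 nM, and a binomial test excluding ties. The following is a minimal sketch of how such quantities could be computed. The function names, the strict `<` comparison at the threshold, the pairing of per-allele performance values, and the two-sided form of the test are assumptions made for illustration and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binder_labels(measured_nm, threshold_nm=500.0):
    """Label a peptide as a binder when its measured affinity is below the threshold."""
    return [aff < threshold_nm for aff in measured_nm]


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binomial_test_excluding_ties(perf_a, perf_b):
    """Two-sided sign test on paired performance values; tied pairs are dropped."""
    wins_a = sum(a > b for a, b in zip(perf_a, perf_b))
    wins_b = sum(b > a for a, b in zip(perf_a, perf_b))
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy data (hypothetical): measured and predicted affinities in nM.
measured = [50.0, 200.0, 800.0, 5000.0]
predicted = [60.0, 400.0, 300.0, 9000.0]
labels = binder_labels(measured)
# Lower nM means stronger binding, so negate to get "higher is better" scores.
toy_auc = auc([-p for p in predicted], labels)  # 0.75
toy_pcc = pearson([math.log(m) for m in measured], [math.log(p) for p in predicted])
```

The AUC is written as a pairwise comparison rather than an explicit ROC integration because the two are equivalent and the pairwise form is easier to check by hand on small inputs.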

Fig. 3
Length preference for the allmer and 9mer prediction methods compared to the length preference in the SYFPEITHI data. Length profiles for the allmer and 9mer methods were estimated as described in the text. The SYFPEITHI length preference was estimated as the average over the allele-specific length preference of 24 MHC molecules characterized by 20 or more ligand data points
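
The SYFPEITHI length preference described above is an average of per-allele length distributions, restricted to alleles with at least 20 ligands. A minimal sketch of that averaging, assuming the ligand data is available as `(allele, peptide)` pairs and that only lengths 8 to 11 are counted (both assumptions, as is the function name):

```python
from collections import Counter, defaultdict


def average_length_preference(ligands, min_ligands=20, lengths=range(8, 12)):
    """Average the per-allele length distributions over well-characterized alleles.

    ligands: iterable of (allele, peptide) pairs.
    Alleles with fewer than min_ligands ligands (within `lengths`) are skipped.
    """
    lengths_by_allele = defaultdict(list)
    for allele, peptide in ligands:
        if len(peptide) in lengths:
            lengths_by_allele[allele].append(len(peptide))

    profiles = []
    for allele_lengths in lengths_by_allele.values():
        if len(allele_lengths) < min_ligands:
            continue
        counts = Counter(allele_lengths)
        total = len(allele_lengths)
        profiles.append({n: counts.get(n, 0) / total for n in lengths})

    if not profiles:
        raise ValueError("no allele has enough ligands")
    # Each allele contributes equally, regardless of how many ligands it has.
    return {n: sum(p[n] for p in profiles) / len(profiles) for n in lengths}


# Toy data (hypothetical allele names and peptides).
toy_ligands = (
    [("X", "A" * 9)] * 20
    + [("Y", "A" * 9)] * 10
    + [("Y", "A" * 10)] * 10
    + [("Z", "A" * 8)] * 5  # too few ligands, excluded
)
toy_profile = average_length_preference(toy_ligands)
```

Averaging the per-allele distributions, rather than pooling all ligands, keeps alleles with many ligands from dominating the profile.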

Fig. 4
Comparison of the predicted length profile for alleles characterized by no or limited peptide data of length different from nine amino acids. The distribution of predicted binders is shown for three alleles, each characterized by a relatively large data set (>500 data points) with more than 99 % 9mers. Length profiles were estimated from the top 1 % of 1,000,000 random natural 8–11mer peptides using the allmer-allele (the method trained on allmer data in an allele-specific manner) and the allmer (the pan-specific method trained on allmer data) methods, respectively
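
The length-profile estimate described in this caption (take the top 1 % of scored random peptides and tabulate their lengths) can be sketched as follows. This is an illustration only: the function name, the `(peptide, score)` input format, and the convention that a higher score means stronger predicted binding are assumptions.

```python
from collections import Counter


def length_profile(scored_peptides, top_fraction=0.01, lengths=range(8, 12)):
    """Fraction of each peptide length among the top-scoring peptides.

    scored_peptides: list of (peptide, score); higher score = stronger predicted binding.
    """
    ranked = sorted(scored_peptides, key=lambda item: item[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    counts = Counter(len(peptide) for peptide, _ in ranked[:n_top])
    return {n: counts.get(n, 0) / n_top for n in lengths}


# Toy data (hypothetical): 400 peptides, so the top 1 % is 4 peptides.
toy_scored = [
    ("A" * 9, 0.9),
    ("A" * 9, 0.8),
    ("A" * 10, 0.7),
    ("A" * 9, 0.6),
] + [("A" * 8, 0.1)] * 396
toy_profile = length_profile(toy_scored)
```

With a real predictor, the same function would be applied to the scores of the random natural 8–11mer peptides for each allele and method being compared.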

Fig. 5
Rank analysis on the SYFPEITHI ligand benchmark. Binding to the restriction element was predicted for all 8–11mer peptides within the source proteins from the SYFPEITHI data set using the allmer and 9mer prediction methods, respectively. The percentage of identified ligands is plotted as a function of the percentage of top predicted binders from each source protein-ligand-MHC combination
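
The rank analysis in this caption can be read as: score every 8–11mer in the source protein, find where the known ligand falls in that ranking, and then ask what share of ligands fall within a given top percentage. A minimal sketch under those assumptions follows. `toy_predict` is a made-up scoring function standing in for a real predictor, and all function names, as well as the choice to rank over unique peptides, are hypothetical.

```python
def enumerate_peptides(protein, min_len=8, max_len=11):
    """All overlapping peptides of the given lengths within a protein sequence."""
    return [
        protein[start:start + length]
        for length in range(min_len, max_len + 1)
        for start in range(len(protein) - length + 1)
    ]


def ligand_rank_percent(protein, ligand, predict):
    """Rank of the ligand among unique source-protein peptides, as a percentage."""
    peptides = set(enumerate_peptides(protein))
    ligand_score = predict(ligand)
    better = sum(predict(p) > ligand_score for p in peptides if p != ligand)
    return 100.0 * (better + 1) / len(peptides)


def percent_identified(rank_percents, cutoff_percent):
    """Percentage of ligands ranked within the top cutoff_percent of their protein."""
    hits = sum(r <= cutoff_percent for r in rank_percents)
    return 100.0 * hits / len(rank_percents)


def toy_predict(peptide):
    """Hypothetical stand-in predictor: rewards L at position 2 and V at the C-terminus."""
    return float(peptide[1] == "L") + float(peptide[-1] == "V")


# Toy data (hypothetical): a 20-residue sequence contains 46 peptides of length 8-11.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
toy_rank = ligand_rank_percent(toy_protein, "KLMNPQRSTV", toy_predict)
```

Evaluating `percent_identified` over a grid of cutoffs gives one curve per prediction method, which is the kind of plot the caption describes.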

Fig. 6
ROC curve analyses for the SYFPEITHI benchmark dataset. Binding to the restriction element was predicted for all unique 8–11mer peptides within the source proteins from the SYFPEITHI benchmark using the allmer method. Binding values were reported as binding affinity and percentile rank values as described in the text. ROC curves were calculated for each prediction value taking ligands as positives and all other peptides as negatives. The inset plot shows the information divergence value (ID) as a function of the percentage of peptides selected. The ID was calculated from the proportion of peptides with predicted restriction to each of the MHC molecules in the benchmark compared to the proportion expected by sampling at random
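
Two quantities in this caption lend themselves to a short illustration: the percentile rank of a prediction and the information divergence between observed and expected MHC restriction proportions. The sketch below assumes that a percentile rank is the share of a background peptide set scoring at least as high as the query, and that the divergence is a Kullback-Leibler divergence in bits. Both conventions, and the function names, are assumptions; the exact definitions are given in the text of the paper, which is not reproduced here.

```python
import math


def percentile_rank(score, background_scores):
    """Percentage of background peptides scoring at least as high (lower = stronger)."""
    at_least = sum(b >= score for b in background_scores)
    return 100.0 * at_least / len(background_scores)


def information_divergence(observed_counts, expected_props):
    """Kullback-Leibler divergence (bits) of observed vs expected allele proportions.

    observed_counts: {allele: number of selected peptides assigned to that allele}
    expected_props:  {allele: proportion expected when sampling at random}
    """
    total = sum(observed_counts.values())
    divergence = 0.0
    for allele, expected in expected_props.items():
        observed = observed_counts.get(allele, 0) / total
        if observed > 0:  # terms with zero observed proportion contribute nothing
            divergence += observed * math.log2(observed / expected)
    return divergence


# Toy data (hypothetical alleles A and B, expected to be sampled equally).
uniform = {"A": 0.5, "B": 0.5}
balanced = information_divergence({"A": 50, "B": 50}, uniform)  # 0.0
skewed = information_divergence({"A": 100, "B": 0}, uniform)    # 1.0
toy_rank = percentile_rank(0.8, [0.1, 0.2, 0.9, 0.95])          # 50.0
```

A divergence near zero means the selected peptides are spread across MHC molecules as random sampling would predict, while larger values indicate that the selection favors some molecules over others.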

Source: PubMed
