Estimating relative abundances of proteins from shotgun proteomics data

Sean McIlwain, Michael Mathews, Michael S Bereman, Edwin W Rubel, Michael J MacCoss, William Stafford Noble

Abstract

Background: Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using shotgun proteomics data. The crux spectral-counts command, part of the Crux software toolkit, implements four previously reported spectral counting methods: the spectral index (SI(N)), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF).
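As an illustration of two of the measures named above, the following is a minimal sketch of NSAF and emPAI using their commonly cited definitions: NSAF divides each protein's spectral count by its length and normalizes by the sum over all proteins, and emPAI is 10 raised to the ratio of observed to observable peptides, minus 1. The function names, the dictionary inputs, and the toy values are assumptions made for this example; they are not taken from the Crux source code. SI(N) and dNSAF are omitted because they additionally require fragment ion intensities and shared-peptide assignments, respectively.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    spectral_counts: dict mapping protein ID -> number of assigned spectra
    lengths: dict mapping protein ID -> sequence length in residues
    (both inputs are hypothetical structures chosen for this sketch)
    """
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so that the values sum to 1 across the experiment.
    return {p: value / total for p, value in saf.items()}


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index for one protein."""
    return 10 ** (n_observed / n_observable) - 1


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    counts = {"P1": 30, "P2": 10, "P3": 5}
    lengths = {"P1": 300, "P2": 100, "P3": 100}
    for protein, score in nsaf(counts, lengths).items():
        print(protein, round(score, 3))
    print("emPAI:", round(empai(5, 10), 3))
```

In this toy case P1 and P2 receive the same NSAF even though P1 has three times as many spectra, because P1 is also three times as long.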

Results: We compared the four spectral counting metrics in terms of their reproducibility and their linearity with respect to each protein's abundance. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and that SI(N) and NSAF achieve the best linearity.

Conclusions: With the crux spectral-counts command, Crux provides open-source, modular methods for analyzing mass spectrometry data, covering both the identification and now the quantification of peptides and proteins. The C++ source code, compiled binaries, spectra, and sequence databases are available at http://noble.gs.washington.edu/proj/crux-spectral-counts.

Figures

Figure 1
Reproducibility of spectral counts across biological and technical replicate experiments. Each plot compares one of the SI(N), emPAI, NSAF, or dNSAF measures for proteins that were reproducibly identified across two replicate experiments. For visualization purposes, the counts are plotted on a logarithmic scale. The number in the lower right corner of each panel is the corresponding Spearman correlation, and the number in the upper left is the number of data points compared.
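The Spearman correlation reported in each panel can be sketched as the Pearson correlation of the ranks of the two replicates' scores. The code below is a minimal pure-Python illustration under that standard definition; the function names and the replicate values are invented for this example and are not taken from the paper's data or from Crux.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for five proteins seen in both replicates.
    replicate_a = [0.40, 0.25, 0.20, 0.10, 0.05]
    replicate_b = [0.38, 0.22, 0.27, 0.08, 0.05]
    print(round(spearman(replicate_a, replicate_b), 3))
```

Because only ranks enter the calculation, the result is unchanged by the logarithmic scaling used for plotting.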
Figure 2
Comparison of spectral counts across replicates. This graph summarizes the statistical analysis of the reproducibility measurements. An edge leading out from node A to node B indicates a statistically significant improvement in reproducibility for method A relative to method B.
Figure 3
Comparison of spectral counts across the UPS1 dilution curve. This graph summarizes the statistical analysis of the linearity measurements. Two types of analysis were performed on the C. elegans + UPS1 dilution curve dataset, using the linear regression correlation (R^2) and the mean percent error (MPE). An edge leading out from node A to node B indicates a statistically significant improvement in linearity for method A relative to method B.
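The two linearity measures named in this caption can be illustrated with a short sketch. R^2 is computed here as the squared Pearson correlation, which equals the coefficient of determination for a simple linear regression with an intercept. The caption does not give the exact MPE formula, so the version below (mean absolute deviation from the expected value, as a percentage of the expected value) is an assumption, as are the function names and the toy dilution values. It also assumes that the observed scores have already been scaled to the same units as the expected amounts.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


def mean_percent_error(expected, observed):
    """Mean absolute percent deviation of observed from expected values.

    This definition is an assumption for illustration; the paper's exact
    formula is not stated in the caption.
    """
    errors = [abs(o - e) / e for e, o in zip(expected, observed)]
    return 100 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical dilution series: expected relative amounts versus
    # abundance estimates rescaled to the same units.
    expected = [1.0, 2.0, 4.0, 8.0]
    observed = [1.1, 1.8, 4.4, 8.0]
    print("R^2:", round(r_squared(expected, observed), 4))
    print("MPE (%):", round(mean_percent_error(expected, observed), 2))
```

The two measures capture different things: R^2 is insensitive to a constant scale factor between expected and observed values, whereas MPE penalizes it.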

Source: PubMed
