Novel personalized pathway-based metabolomics models reveal key metabolic pathways for breast cancer diagnosis

Sijia Huang, Nicole Chong, Nathan E Lewis, Wei Jia, Guoxiang Xie, Lana X Garmire

Abstract

Background: More accurate diagnostic methods are urgently needed for breast cancer, the most common malignant cancer in women worldwide. Blood-based metabolomics is a promising diagnostic approach for breast cancer. However, many metabolic biomarkers are difficult to replicate across studies.

Methods: We propose that higher-order functional representation of metabolomics data, such as pathway-based metabolomic features, can be used as robust biomarkers for breast cancer. Towards this, we have developed a new computational method that uses personalized pathway dysregulation scores for disease diagnosis. We applied this method to predict breast cancer occurrence, in combination with correlation feature selection (CFS) and classification methods.
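As a rough illustration of the feature selection step, the sketch below scores subsets of pathway-level features with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Pearson correlation, the greedy forward search, and all function and variable names are assumptions made for this example; the study's actual implementation may use a different correlation measure and search strategy.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, labels):
    """CFS merit of a feature subset.

    features: list of columns, each a list of per-sample pathway scores.
    labels:   list of 0/1 class labels (control/case).
    """
    k = len(features)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    # Mean absolute feature-feature correlation over all pairs.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels):
    """Greedy forward selection (an assumed search strategy for this sketch).

    columns: dict mapping a hypothetical pathway name to its score column.
    Adds the feature that most improves the merit; stops when nothing improves it.
    """
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        candidate, candidate_merit = None, best
        for name in remaining:
            merit = cfs_merit([columns[n] for n in selected + [name]], labels)
            if merit > candidate_merit:
                candidate, candidate_merit = name, merit
        if candidate is None:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_merit
    return selected, best
```

Calling `greedy_cfs(columns, labels)` on a dictionary of pathway score columns returns the selected pathway names and the merit of that subset. Features that correlate with the class but not strongly with each other are favored, which is the intent of CFS.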

Results: The resulting all-stage and early-stage diagnosis models are highly accurate in two sets of testing blood samples, with average AUCs (area under the receiver operating characteristic curve) of 0.968 and 0.934, sensitivities of 0.946 and 0.954, and specificities of 0.934 and 0.918. These two metabolomics-based pathway models are further validated by RNA-Seq-based TCGA (The Cancer Genome Atlas) breast cancer data, with AUCs of 0.995 and 0.993. Moreover, important metabolic pathways, such as taurine and hypotaurine metabolism and the alanine, aspartate, and glutamate pathway, are revealed as critical biological pathways for early diagnosis of breast cancer.
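For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, F1, MCC, and AUC can be computed from true labels and model outputs. The function names and the toy inputs in the usage note are hypothetical and are not taken from the study; AUC is computed here by pairwise comparison of case and control scores, which is equivalent to the area under the ROC curve.

```python
import math


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, F1, and MCC from binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A typical use would be `roc_auc(labels, predicted_probabilities)` for the threshold-free AUC, and `diagnostic_metrics(labels, [int(p >= 0.5) for p in predicted_probabilities])` for the thresholded metrics; the 0.5 cutoff is an illustrative choice.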

Conclusions: We have successfully developed a new type of pathway-based model to study metabolomics data for disease diagnosis. Applying this method to blood-based breast cancer metabolomics data, we have discovered crucial metabolic pathway signatures for breast cancer diagnosis, especially early diagnosis. Further, this modeling approach may be generalized to other omics data types for disease diagnosis.

Figures

Fig. 1
The workflow of the pathway-based metabolomics data analysis. Step 1: conversion from metabolite- to pathway-based metabolomics data. The input data include the master file containing pathway-metabolite mapping information, the metabolomics profiling data, and the normal/tumor classification vector. The metabolomics-level data are transformed to pathway-level data by the pathifier algorithm. The output file of pathifier is the pathway dysregulation score matrix, within which each score measures the deregulation of a specific pathway for a specific sample. Step 2: model construction. Qualified COH plasma samples are split by 80/20 for training and holdout testing data. Correlation feature selection (CFS) is used for feature selection and the logistic regression model is used for classification. Tenfold cross-validation (10-fold CV) is applied with CFS feature selection in the plasma training data set. Two models are constructed: an all-stage diagnostic model and an early-stage diagnostic model. Step 3: model evaluation. The model performance is assessed using receiver operating characteristic (ROC) curves and various metrics, including AUC, MCC, sensitivity, specificity, and F1-statistic
Fig. 2
The performance of the all-stage diagnosis model for breast cancer. We used 80 % of the controls and cases in the COH plasma data set to train the model. The remaining COH plasma data (20 %) and the COH serum data set were used as the test set and validation set. a Receiver operating characteristic (ROC) curves for the all-stage breast cancer diagnosis from different data sets. b AUC, MCC, sensitivity, specificity, and F1-statistic to measure the performance of the all-stage diagnosis model. c Mutual information for pathway features selected by the all-stage diagnosis model. d Log fold change of metabolites associated with the selected pathway features determined by comparing cases and controls across different data sets
Fig. 3
The performance of the early-stage diagnosis model for breast cancer. We used 80 % of the controls and early-stage (stage I and II) cases in the COH plasma data set to train the model. The remaining controls and early-stage cases in the COH plasma data set, as well as controls and early-stage cases in the COH serum data set, were used as the testing and validation sets. a Receiver operating characteristic (ROC) curves for the early-stage breast cancer diagnosis from different data sets. b AUC, MCC, sensitivity, specificity, and F1-statistic to measure the performance of the early-stage diagnosis model. c Mutual information for pathway features selected by the early-stage diagnosis model. d Log fold change of metabolites associated with the selected pathway features determined by comparing cases and controls across different data sets
Fig. 4
Integrative analysis of pathway features and the associated metabolites. The key pathways and their intersections crucial for breast cancer diagnosis. Metabolites and enzymes are represented with nodes of different shapes and colors, and their relationships are represented by edges
Fig. 5
Comparison of receiver operating characteristic (ROC) curves between the pathway-based model and the metabolite-based model among data sets. The same 80 % of early-stage (stage I and II) cases and controls from the COH plasma data set used in the early-stage diagnosis model were used for the plasma training set. The remaining 20 % of early-stage (stage I and II) cases and controls represent the test set. The metabolite-based model is based on the same tenfold cross-validation CFS selection used for the plasma training set. ROC curves for training and test sets are compared between the pathway-based model and the metabolite-based model among data sets
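
The data partitioning described in the captions above (an 80/20 split into training and holdout sets, followed by tenfold cross-validation within the training set) can be sketched as follows. Stratifying by class, the fixed random seed, and all function names are assumptions made for this illustration; the captions state the proportions but not the exact sampling procedure.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test, holding out test_fraction of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(labels, indices, n_folds=10, seed=0):
    """Assign the given sample indices to n_folds folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels[i] for i in indices)):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % n_folds].append(i)
    return folds


def cv_iterations(folds):
    """Yield (fit_indices, held_out_indices) for each cross-validation round."""
    for k, held_out in enumerate(folds):
        fit = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield fit, held_out
```

In a workflow of this kind, feature selection and model fitting would normally be repeated on the `fit` indices of each round and evaluated on the held-out fold, so that the holdout 20 % is touched only once for the final test.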

Source: PubMed