Comparison of unsupervised machine-learning methods to identify metabolomic signatures in patients with localized breast cancer

Jocelyn Gal, Caroline Bailleux, David Chardin, Thierry Pourcher, Julia Gilhodes, Lun Jing, Jean-Marie Guigonis, Jean-Marc Ferrero, Gerard Milano, Baharia Mograbi, Patrick Brest, Yann Chateau, Olivier Humbert, Emmanuel Chamorey

Abstract

Genomics and transcriptomics have led to the widely used molecular classification of breast cancer (BC). However, heterogeneous biological behaviors persist within BC subtypes. Metabolomics is a rapidly expanding field of study dedicated to cellular metabolism as affected by the environment. The aim of this study was to compare metabolomic signatures of BC obtained by five different unsupervised machine learning (ML) methods. Fifty-two consecutive patients with BC and an indication for adjuvant chemotherapy between 2013 and 2016 were retrospectively included. We performed metabolomic profiling of tumor resection samples using liquid chromatography-mass spectrometry. Four hundred and forty-nine identified metabolites were selected for further analysis. Clusters obtained using the five unsupervised ML methods (PCA k-means, sparse k-means, spectral clustering, SIMLR and k-sparse) were compared in terms of clinical and biological characteristics. With an optimal partitioning parameter k = 3, the five methods identified three prognosis groups of patients (favorable, intermediate, unfavorable) with different clinical and biological profiles. SIMLR and k-sparse were the most effective methods in terms of clustering. In silico survival analysis revealed a significant difference in 5-year predicted overall survival (OS) between the three clusters. Further pathway analysis using the 449 selected metabolites showed significant differences in amino acid and glucose metabolism between BC histologic subtypes. Our results provide proof of concept for the use of unsupervised ML on metabolomic data to enable stratification and personalized management of BC patients. The design of novel computational methods incorporating ML and bioinformatics techniques should make available tools particularly suited to improving the outcome of cancer treatment and reducing cancer-related mortality.

Keywords: Breast neoplasms; Computer simulation; Metabolomics; Unsupervised machine learning.
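
The abstract compares clustering quality across methods, and Fig. 2 reports it as a per-patient silhouette value. The following is a minimal sketch of how such a value can be computed for a given partition, assuming Euclidean distance and the standard silhouette definition. It is not the authors' pipeline: the six two-feature "patients" and both partitions are made-up toy data, and the function names are hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Return the silhouette value s(i) for every sample.

    s(i) = (b - a) / max(a, b), where a is the mean distance from sample i
    to the other members of its own cluster and b is the smallest mean
    distance from i to the members of any other cluster.
    Samples in singleton clusters are assigned s(i) = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Average silhouette over all samples; higher means better separation."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


# Toy data (assumption): six samples described by two made-up features.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]

# Two candidate partitions with k = 3, standing in for two clustering methods.
good_partition = [0, 0, 1, 1, 2, 2]  # groups neighbouring samples together
poor_partition = [0, 1, 1, 2, 2, 0]  # mixes distant samples

good_score = mean_silhouette(toy_points, good_partition)
poor_score = mean_silhouette(toy_points, poor_partition)
print(f"good partition: {good_score:.3f}, poor partition: {poor_score:.3f}")
```

Under these assumptions, the partition that groups neighbouring samples scores higher than the one that mixes distant samples, which is the sense in which a mean silhouette can rank competing clustering methods on the same data.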

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

© 2020 The Authors.

Figures

Graphical abstract
Fig. 1. Visualization of each cluster by clustering method using t-SNE.
Fig. 2. Silhouette value (SI) representation for each patient by clustering method.
Fig. 3. Venn diagram of metabolites that were in common or unique to the five clustering methods.
Fig. 4. Venn diagram of pathways that were in common or unique to the five clustering methods.
Fig. 5. Boxplot of the 8 metabolites extracted from the 5 ML methods.

Source: PubMed
