Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools

Giovanna Nicora, Francesca Vitali, Arianna Dagliati, Nophar Geifman, Riccardo Bellazzi

Abstract

In recent years, high-throughput sequencing technologies have provided an unprecedented opportunity to characterize cancer samples at multiple molecular levels. The integration and analysis of these multi-omics datasets is a critical step toward gaining actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to address major challenges of stratified medicine in oncology, including patient phenotyping, biomarker discovery, and drug repurposing. We systematically retrieved peer-reviewed journal articles published from 2014 to 2019, then selected and thoroughly described the tools presenting the most promising innovations regarding the integration of heterogeneous data, the machine learning methodologies that successfully tackled the complexity of multi-omics data, and the frameworks that deliver actionable results for clinical practice. The review is organized according to the applied methods: Deep Learning, Network-based Methods, Clustering, Feature Extraction and Transformation, and Factorization. We provide an overview of the tools available in each methodological group and highlight the relationships among the different categories. Our analysis revealed how multi-omics datasets could be exploited to drive precision oncology, but also the current limitations in the development of multi-omics data integration.
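The abstract lists clustering for patient phenotyping among the method families but does not show what "integrating" several omic layers means in practice. The sketch below is a minimal illustration, not a method from any of the reviewed tools: it implements the simplest baseline, usually called early (concatenation-based) integration, which standardizes each omic layer, concatenates the layers per patient, and clusters the result with plain k-means using deterministic farthest-first initialization. The function names (`zscore_columns`, `early_integrate`, `kmeans`) and the toy expression and methylation matrices are hypothetical, invented for illustration.

```python
import math

def zscore_columns(matrix):
    """Standardize each feature (column) to mean 0 and SD 1 so no layer dominates by scale."""
    n = len(matrix)
    standardized_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0  # guard constant features
        standardized_cols.append([(x - mean) / sd for x in col])
    # Transpose back to patient-by-feature rows.
    return [list(row) for row in zip(*standardized_cols)]

def early_integrate(*omic_blocks):
    """Early integration (hypothetical helper): concatenate standardized blocks patient by patient."""
    standardized = [zscore_columns(block) for block in omic_blocks]
    return [sum(rows, []) for rows in zip(*standardized)]

def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assign every patient to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy data (invented): 6 patients, 3 expression features and 2 methylation features.
expression = [[5.0, 4.8, 5.2], [5.1, 5.0, 4.9], [4.9, 5.2, 5.0],
              [1.0, 1.2, 0.8], [1.1, 0.9, 1.0], [0.9, 1.0, 1.1]]
methylation = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
               [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]]
labels = kmeans(early_integrate(expression, methylation), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Concatenation is the crudest integration strategy: a layer with many more features can still dominate the distance between patients, which is one motivation for the network-based, factorization, and deep learning approaches the review covers.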

Keywords: cancer; machine learning; multi-omics; oncology; systematic review; tools.

Copyright © 2020 Nicora, Vitali, Dagliati, Geifman and Bellazzi.

Figures

Figure 1
(A) Linkage between different methodological categories. References to papers (see Table 1) that fall into more than one group are reported near the corresponding link. (B) Publications by year of publication and Field-Weighted Citation Impact. Colors indicate the methods exploited, and shapes indicate aims and outputs. Papers with red borders provide source code or a tool. Papers in the “Subgroup identification” group and/or with a freely available tool are the most cited across years. The reference numbers are reported in Table 1.

References

    1. Shen R, Olshen AB, Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics. (2009) 25:2906–12. 10.1093/bioinformatics/btp543
    2. Lock EF, Hoadley KA, Marron JS, Nobel AB. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann Appl Stat. (2013) 7:523–42. 10.1214/12-AOAS597
    3. List M, Hauschild A-C, Tan Q, Kruse TA, Mollenhauer J, Baumbach J, et al. Classification of breast cancer subtypes by combining gene expression and DNA methylation data. J Integr Bioinform. (2014) 11:236. 10.2390/biecoll-jib-2014-236
    4. Ray P, Zheng L, Lucas J, Carin L. Bayesian joint analysis of heterogeneous genomics data. Bioinformatics. (2014) 30:1370–6. 10.1093/bioinformatics/btu064
    5. Gligorijević V, Malod-Dognin N, Pržulj N. Patient-specific data fusion for cancer stratification and personalised treatment. Pacific Symp Biocomput. (2016) 21:321–32. 10.1142/9789814749411_0030
    6. Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. (2011) 7:26. 10.1038/msb.2011.26
    7. Napolitano F, Zhao Y, Moreira VM, Tagliaferri R, Kere J, D'Amato M, et al. Drug repositioning: a machine-learning approach through data integration. J Cheminform. (2013) 5:30. 10.1186/1758-2946-5-30
    8. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2018) 68:394–424. 10.3322/caac.21492
    9. Knox SS. From “omics” to complex disease: a systems biology approach to gene-environment interactions in cancer. Cancer Cell Int. (2010) 10:11. 10.1186/1475-2867-10-11
    10. Huang S, Chaudhary K, Garmire LX. More is better: recent progress in multi-omics data integration methods. Front Genet. (2017) 8:84. 10.3389/fgene.2017.00084
    11. Li Y, Wu FX, Ngom A. A review on machine learning principles for multi-view biological data integration. Brief Bioinform. (2018) 19:325–40. 10.1093/bib/bbw113
    12. Agarwal M, Adhil M, Talukder AK. Multi-omics multi-scale big data analytics for cancer genomics. Lect Notes Comput Sci. (2015) 9498:228–43. 10.1007/978-3-319-27057-9_16
    13. Amar D, Shamir R. Constructing module maps for integrated analysis of heterogeneous biological networks. Nucleic Acids Res. (2014) 42:4208–19. 10.1093/nar/gku102
    14. Ao L, Song X, Li X, Tong M, Guo Y, Li J, et al. An individualized prognostic signature and multi-omics distinction for early stage hepatocellular carcinoma patients with surgical resection. Oncotarget. (2016) 7:24097–110. 10.18632/oncotarget.8212
    15. Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, et al. Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol. (2018) 14:e8124. 10.15252/msb.20178124
    16. Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. (2014) 11:333–7. 10.1038/nmeth.2810
    17. Beal J, Montagud A, Traynard P, Barillot E, Calzone L. Personalization of logical models with multi-omics data allows clinical stratification of patients. Front Physiol. (2019) 9:1965. 10.3389/fphys.2018.01965
    18. Benfeitas R, Bidkhori G, Mukhopadhyay B, Klevstig M, Arif M, Zhang C, et al. Characterization of heterogeneous redox responses in hepatocellular carcinoma patients using network analysis. EBioMedicine. (2019) 40:471–87. 10.1016/j.ebiom.2018.12.057
    19. Bonnet E, Calzone L, Michoel T. Integrative multi-omics module network inference with lemon-tree. PLoS Comput Biol. (2015) 11:3983. 10.1371/journal.pcbi.1003983
    20. Cancemi P, Buttacavoli M, Cara GD, Albanese NN, Bivona S, Pucci-Minafra I, et al. A multiomics analysis of S100 protein family in breast cancer. Oncotarget. (2018) 9:29064–81. 10.18632/oncotarget.25561
    21. Cavalli FMG, Remke M, Rampasek L, Peacock J, Shih DJH, Luu B, et al. Intertumoral heterogeneity within medulloblastoma subgroups. Cancer Cell. (2017) 31:737–54.e6. 10.1016/j.ccell.2017.05.005
    22. Champion M, Brennan K, Croonenborghs T, Gentles AJ, Pochet N, Gevaert O. Module analysis captures pancancer genetically and epigenetically deregulated cancer driver genes for smoking and antiviral response. EBioMedicine. (2018) 27:156–66. 10.1016/j.ebiom.2017.11.028
    23. Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res. (2018) 24:1248–59. 10.1158/1078-0432.CCR-17-0853
    24. Cho H, Berger B, Peng J. Compact integration of multi-network topology for functional analysis of genes. Cell Syst. (2016) 3:540–8.e5. 10.1016/j.cels.2016.10.017
    25. Costa RL, Boroni M, Soares MA. Distinct co-expression networks using multi-omic data reveal novel interventional targets in HPV-positive and negative head-and-neck squamous cell cancer. Sci Rep. (2018) 8:5. 10.1038/s41598-018-33498-5
    26. Costello JC, Heiser LM, Georgii E, Gönen M, Menden MP, Wang NJ, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. (2014) 32:1202–12. 10.1038/nbt.2877
    27. Dimitrakopoulos C, Hindupur SK, Hafliger L, Behr J, Montazeri H, Hall MN, et al. Network-based integration of multi-omics data for prioritizing cancer genes. Bioinformatics. (2018) 34:2441–8. 10.1093/bioinformatics/bty148
    28. Drabovich AP, Saraon P, Drabovich M, Karakosta TD, Dimitromanolakis A, Hyndman ME, et al. Multi-omics biomarker pipeline reveals elevated levels of protein-glutamine gamma-glutamyltransferase 4 in seminal plasma of prostate cancer patients. Mol Cell Proteomics. (2019) 18:1807–23. 10.1074/mcp.RA119.001612
    29. Francescatto M, Chierici M, Rezvan Dezfooli S, Zandonà A, Jurman G, Furlanello C, et al. Multi-omics integration for neuroblastoma clinical endpoint prediction. Biol Direct. (2018) 13:8. 10.1186/s13062-018-0207-8
    30. Gabasova E, Reid J, Wernisch L. Clusternomics: integrative context-dependent clustering for heterogeneous datasets. PLoS Comput Biol. (2017) 13:e1005781. 10.1371/journal.pcbi.1005781
    31. Gao Y-L, Hou M-X, Liu J-X, Kong X-Z. An integrated graph regularized non-negative matrix factorization model for gene co-expression network analysis. IEEE Access. (2019) 7:126594–602. 10.1109/ACCESS.2019.2939405
    32. Griffin PJ, Zhang Y, Johnson WE, Kolaczyk ED. Detection of multiple perturbations in multi-omics biological networks. Biometrics. (2018) 74:1351–61. 10.1111/biom.12893
    33. Hoadley KA, Yau C, Wolf DM, Cherniack AD, Tamborero D, Ng S, et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell. (2014) 158:929–44. 10.1016/j.cell.2014.06.049
    34. Hua L, Zheng WY, Xia H, Zhou P. Detecting the potential cancer association or metastasis by multi-omics data analysis. Genet Mol Res. (2016) 15:e038987. 10.4238/gmr.15038987
    35. Huang L, Brunell D, Stephan C, Mancuso J, Yu X, He B, et al. Driver network as a biomarker: systematic integration and network modeling of multi-omics data to derive driver signaling pathways for drug combination prediction. Bioinformatics. (2019) 35:3709–17. 10.1093/bioinformatics/btz109
    36. Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, et al. Salmon: survival analysis learning with multi-omics neural networks on breast cancer. Front Genet. (2019) 10:166. 10.3389/fgene.2019.00166
    37. Kim JY, Lee H, Woo J, Yue W, Kim K, Choi S, et al. Reconstruction of pathway modification induced by nicotinamide using multi-omic network analyses in triple negative breast cancer. Sci Rep. (2017) 7:7. 10.1038/s41598-017-03322-7
    38. Kim M, Oh I, Ahn J. An improved method for prediction of cancer prognosis by network learning. Genes. (2018) 9:1–11. 10.3390/genes9100478
    39. Kim SY, Jeong HH, Kim J, Moon JH, Sohn KA. Robust pathway-based multi-omics data integration using directed random walks for survival prediction in multiple cancer studies. Biol Direct. (2019) 14:8. 10.1186/s13062-019-0239-8
    40. Koh HWL, Fermin D, Vogel C, Choi KP, Ewing RM, Choi H, et al. iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery. npj Syst Biol Appl. (2019) 5:22. 10.1038/s41540-019-0099-y
    41. Lee S-I, Celik S, Logsdon BA, Lundberg SM, Martins TJ, Oehler VG, et al. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nat Commun. (2018) 9:5. 10.1038/s41467-017-02465-5
    42. Liang M, Li Z, Chen T, Zeng J. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Trans Comput Biol Bioinforma. (2015) 12:928–37. 10.1109/TCBB.2014.2377729
    43. Luo Z, Wang W, Li F, Songyang Z, Feng X, Xin C, et al. Pan-cancer analysis identifies telomerase-associated signatures and cancer subtypes. Mol Cancer. (2019) 18:106. 10.1186/s12943-019-1035-x
    44. Ma T, Zhang A. Affinity network fusion and semi-supervised learning for cancer patient clustering. Methods. (2018) 145:16–24. 10.1016/j.ymeth.2018.05.020
    45. Mariette J, Villa-Vialaneix N. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics. (2018) 34:1009–15. 10.1093/bioinformatics/btx682
    46. Meng C, Kuster B, Culhane AC, Gholami AM. A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics. (2014) 15:162. 10.1186/1471-2105-15-162
    47. Mo Q, Shen R, Guo C, Vannucci M, Chan KS, Hilsenbeck SG. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics. (2018) 19:71–86. 10.1093/biostatistics/kxx017
    48. Nguyen T, Tagett R, Diaz D, Draghici S. A novel approach for data integration and disease subtyping. Genome Res. (2017) 27:2025–39. 10.1101/gr.215129.116
    49. O'Connell MJ, Lock EF. R.JIVE for exploration of multi-source molecular data. Bioinformatics. (2016) 32:2877–9. 10.1093/bioinformatics/btw324
    50. Pai S, Hui S, Isserlin R, Shah MA, Kaka H, Bader GD, et al. netDx: interpretable patient classification using integrated patient similarity networks. Mol Syst Biol. (2019) 15:e8497. 10.15252/msb.20188497
    51. Raphael BJ, Hruban RH, Aguirre AJ, Moffitt RA, Yeh JJ, Stewart C, et al. Integrated genomic characterization of pancreatic ductal adenocarcinoma. Cancer Cell. (2017) 32:185–203.e13. 10.1016/j.ccell.2017.07.007
    52. Rappoport N, Shamir R, Schwartz R. NEMO: cancer subtyping by integration of partial multi-omic data. Bioinformatics. (2019) 35:3348–56. 10.1093/bioinformatics/btz058
    53. Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol. (2017) 13:e1005752. 10.1371/journal.pcbi.1005752
    54. Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics. (2019) 35:i501–9. 10.1093/bioinformatics/btz318
    55. Sehgal V, Seviour EG, Moss TJ, Mills GB, Azencott R, Ram PT. Robust selection algorithm (RSA) for multi-omic biomarker discovery; integration with functional network analysis to identify miRNA regulated pathways in multiple cancers. PLoS ONE. (2015) 10:72. 10.1371/journal.pone.0140072
    56. Song X, Ji J, Gleason KJ, Yang F, Martignetti JA, Chen LS, et al. Insights into impact of DNA copy number alteration and methylation on the proteogenomic landscape of human ovarian cancer via a multi-omics integrative analysis. Mol Cell Proteomics. (2019) 18(8 Suppl.1):S52–65. 10.1074/mcp.RA118.001220
    57. Speicher NK, Pfeifer N. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics. (2015) 31:i268–75. 10.1093/bioinformatics/btv244
    58. Vitali F, Cohen LD, Demartini A, Amato A, Eterno V, Zambelli A, et al. A network-based data integration approach to support drug repurposing and multi-Target therapies in triple negative breast cancer. PLoS ONE. (2016) 11:e0162407. 10.1371/journal.pone.0162407
    59. Woo HG, Choi J-H, Yoon S, Jee BA, Cho EJ, Lee J-H, et al. Integrative analysis of genomic and epigenomic regulation of the transcriptome in liver cancer. Nat Commun. (2017) 8:839. 10.1038/s41467-017-00991-w
    60. Wu D, Wang D, Zhang MQ, Gu J. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification. BMC Genomics. (2015) 16:1022. 10.1186/s12864-015-2223-8
    61. Yang Z, Liu B, Lin T, Zhang Y, Zhang L, Wang M, et al. Multiomics analysis on DNA methylation and the expression of both messenger RNA and microRNA in lung adenocarcinoma. J Cell Physiol. (2019) 234:7579–86. 10.1002/jcp.27520
    62. Yuan L, Guo LH, Yuan CA, Zhang Y, Han K, Nandi AK, et al. Integration of multi-omics data for gene regulatory network inference and application to breast cancer. IEEE/ACM Trans Comput Biol Bioinforma. (2019) 16:782–91. 10.1109/TCBB.2018.2866836
    63. Wang Z, Wei Y, Zhang R, Su L, Gogarten SM, Liu G, et al. Multi-omics analysis reveals a HIF network and hub gene EPAS1 associated with lung adenocarcinoma. EBioMedicine. (2018) 32:93–101. 10.1016/j.ebiom.2018.05.024
    64. Zhang L, Lv C, Jin Y, Cheng G, Fu Y, Yuan D, et al. Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma. Front Genet. (2018) 9:477. 10.3389/fgene.2018.00477
    65. Zhou Y, Liu Y, Li K, Zhang R, Qiu F, Zhao N, et al. ICan: an integrated co-alteration network to identify ovarian cancer-related genes. PLoS ONE. (2015) 10:e0116095. 10.1371/journal.pone.0116095
    66. Zhu B, Song N, Shen R, Arora A, Machiela MJ, Song L, et al. Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci Rep. (2017) 7:8. 10.1038/s41598-017-17031-8
    67. Žitnik M, Zupan B. Gene network inference by fusing data from diverse distributions. Bioinformatics. (2015) 31:i230–9. 10.1093/bioinformatics/btv258
    68. Tang B, Pan Z, Yin K, Khateeb A. Recent advances of deep learning in bioinformatics and computational biology. Front Genet. (2019) 10:214. 10.3389/fgene.2019.00214
    69. Wang D, Gu J. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quant Biol. (2016) 4:58–67. 10.1007/s40484-016-0063-4
    70. Wu C, Zhou F, Ren J, Li X, Jiang Y, Ma S, et al. A selective review of multi-level omics data integration using variable selection. High-Throughput. (2019) 8:4. 10.3390/ht8010004
    71. Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. (2008) 9:S4. 10.1186/gb-2008-9-s1-s4
    72. Conesa A, Beck S. Making multi-omics data accessible to researchers. Sci Data. (2019) 6:251. 10.1038/s41597-019-0258-4
    73. Ollier W, Sprosen T, Peakman T. UK Biobank: from concept to reality. Pharmacogenomics. (2005) 6:639–46. 10.2217/14622416.6.6.639
    74. Liu SH, Shen PC, Chen CY, Hsu AN, Cho YC, Lai YL, et al. DriverDBv3: a multi-omics database for cancer driver gene research. Nucleic Acids Res. (2020) 48:D863–70. 10.1093/nar/gkz964
    75. Vasaikar SV, Straub P, Wang J, Zhang B. LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Res. (2018) 46:D956–63. 10.1093/nar/gkx1090
    76. Sathyanarayanan A, Gupta R, Thompson EW, Nyholt DR, Bauer DC, Nagaraj SH, et al. A comparative study of multi-omics integration tools for cancer driver gene identification and tumour subtyping. Brief Bioinform. (2019). [Epub ahead of print]. 10.1093/bib/bbz121
    77. McCabe SD, Lin DY, Love MI. Consistency and overfitting of multi-omics methods on experimental data. Brief Bioinform. (2019). [Epub ahead of print]. 10.1093/bib/bbz070

Source: PubMed
