Medical big data: promise and challenges

Choong Ho Lee, Hyung-Jin Yoon

Abstract

The concept of big data, commonly characterized by volume, variety, velocity, and veracity, goes far beyond the data type and includes aspects of data analysis, such as being hypothesis-generating rather than hypothesis-testing. Big data focuses on the temporal stability of an association rather than on causal relationships, and assumptions about the underlying probability distribution are frequently not required. Medical big data, as material to be analyzed, has various features that are distinct not only from big data of other disciplines but also from traditional clinical epidemiology. Big data technology has many areas of application in healthcare, such as predictive modeling and clinical decision support, disease or safety surveillance, public health, and research. Big data analytics frequently exploits analytic methods developed in data mining, including classification, clustering, and regression. Medical big data analyses are complicated by many technical issues, such as missing values, the curse of dimensionality, and bias control, and share the inherent limitations of observational studies, namely the inability to test causality resulting from residual confounding and reverse causation. Recently, propensity score analysis and instrumental variable analysis have been introduced to overcome these limitations, and they have accomplished a great deal. Many challenges, such as the absence of evidence of practical benefits of big data, methodological issues including legal and ethical issues, and clinical integration and utility issues, must be overcome to realize the promise of medical big data as the fuel of a continuous learning healthcare system that will improve patient outcomes and reduce waste in areas including nephrology.

Keywords: Big data; Data mining; Epidemiology; Healthcare; Statistics.

Conflict of interest statement

All authors have no conflicts of interest to declare.

Figures

Figure 1. A continuous learning healthcare system.

Source: PubMed
