Application of penalized linear regression methods to the selection of environmental enteropathy biomarkers

Miao Lu, Jianhui Zhou, Caitlin Naylor, Beth D Kirkpatrick, Rashidul Haque, William A Petri Jr, Jennie Z Ma

Abstract

Background: Environmental Enteropathy (EE) is a subclinical condition caused by constant fecal-oral contamination and resulting in blunting of intestinal villi and intestinal inflammation. The primary aim of this clinical research is to evaluate the association between non-invasive EE biomarkers and malnutrition in a cohort of Bangladeshi children. The challenges are that the number of biomarkers/covariates is relatively large and that some of them are highly correlated.

Methods: Many variable selection methods are available in the literature, but which are most appropriate for EE biomarker selection remains unclear. In this study, different variable selection approaches were applied and the performance of these methods was assessed numerically through simulation studies, assuming the correlations among covariates were similar to those in the Bangladesh cohort. The suggested methods from simulations were applied to the Bangladesh cohort to select the most relevant biomarkers for the growth response, and bootstrapping methods were used to evaluate the consistency of selection results.
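
The bootstrap consistency check described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: it substitutes a plain LASSO fitted by coordinate descent for the SCAD, Adaptive LASSO, and MCP fits, uses simulated data rather than the Bangladesh cohort, fixes the tuning parameter `lam` at an arbitrary value instead of tuning it, and omits the intercept (the simulated data are roughly centered). All function names (`simulate`, `lasso_cd`, `selection_frequency`) are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Average product of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_ss[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

def simulate(n=80, seed=0):
    """Assumed toy design: x1 is correlated with x0 (about 0.7); only x0 drives y."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        x0 = z0
        x1 = 0.7 * z0 + 0.7 * z1
        x2 = rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        X.append([x0, x1, x2, x3])
        y.append(2.0 * x0 + rng.gauss(0, 1))
    return X, y

def selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each covariate is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

X, y = simulate()
freq = selection_frequency(X, y, lam=0.3)
print(freq)  # one selection frequency per covariate
```

A covariate that is selected in nearly every resample is a consistent choice; one that enters only occasionally, as can happen for a covariate that is merely correlated with a true predictor, is less trustworthy.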

Results: In the simulation studies, SCAD (Smoothly Clipped Absolute Deviation), Adaptive LASSO (Least Absolute Shrinkage and Selection Operator), and MCP (Minimax Concave Penalty) were the suggested variable selection methods over the traditional stepwise regression method. In the Bangladesh data, predictors such as mother's weight, height-for-age z-score (HAZ) at week 18, and inflammation markers (myeloperoxidase (MPO) at week 12 and soluble CD14 at week 18) were informative biomarkers associated with children's growth.
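
For readers unfamiliar with the three suggested penalties, the sketch below writes out their standard single-coefficient forms as given in the original SCAD, MCP, and adaptive LASSO papers. It is an illustration, not the authors' implementation. The function names are hypothetical; `a=3.7` is the value suggested by Fan and Li for SCAD, while `gamma=3.0` for MCP and the exponent `1.0` for the adaptive LASSO weight are illustrative defaults assumed here.

```python
def lasso_penalty(t, lam):
    """LASSO: grows linearly, so large coefficients keep being shrunk."""
    return lam * abs(t)

def scad_penalty(t, lam, a=3.7):
    """SCAD: LASSO-like near zero, then tapers to a constant (requires a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def mcp_penalty(t, lam, gamma=3.0):
    """MCP: penalty rate decreases linearly to zero at |t| = gamma * lam (gamma > 1)."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2

def adaptive_lasso_penalty(t, lam, beta_init, power=1.0):
    """Adaptive LASSO: LASSO with weight 1 / |beta_init|^power from an initial estimate."""
    return lam * abs(t) / abs(beta_init) ** power

for t in (0.5, 2.0, 10.0):
    print(t, lasso_penalty(t, 1.0), scad_penalty(t, 1.0), mcp_penalty(t, 1.0))
```

Unlike the LASSO penalty, the SCAD and MCP penalties level off for large coefficients, which is why they shrink strong signals less.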

Conclusions: Penalized linear regression methods are plausible alternatives to traditional variable selection methods, and the suggested methods are applicable to other biomedical studies. The selected early-stage biomarkers offer a potential explanation for the burden of malnutrition in low-income countries, allow early identification of infants at risk, and suggest pathways for intervention.

Trial registration: This study was retrospectively registered with ClinicalTrials.gov, number NCT01375647, on June 3, 2011.

Keywords: Biomarker selection; Correlated covariates; Environmental enteropathy; Malnutrition; Penalized linear regression.

Figures

Fig. 1
Heatmap of correlation for all biomarkers

Source: PubMed
