The impact of correlated exposures and missing data on multiple informant models used to identify critical exposure windows

Jemar R Bather, Nicholas J Horton, Brent A Coull, Paige L Williams

Abstract

There has been heightened interest in identifying critical windows of exposure for adverse health outcomes; that is, time points during which exposures have the greatest impact on a person's health. Multiple informant models implemented using generalized estimating equations (MIM GEEs) have been applied to address this research question because they enable statistical comparisons of differences in associations across exposure windows. As interest rises in using MIMs, the feasibility and appropriateness of their application under settings of correlated exposures and partially missing exposure measurements require further examination. We evaluated the impact of correlation between exposure measurements and missing exposure data on the power and differences in association estimated by the MIM GEE and an inverse probability weighted extension to account for informatively missing exposures. We assessed these operating characteristics under a variety of correlation structures, sample sizes, and missing data mechanisms considering various exposure-outcome scenarios. We showed that applying MIM GEEs maintains higher power when there is a single critical window of exposure and exposure measures are not highly correlated, but may result in low power and bias under other settings. We applied these methods to a study of pregnant women living with HIV to explore differences in association between trimester-specific viral load and infant neurodevelopment.

Keywords: critical windows; exposure timing; generalized estimating equations; inverse probability weights; missing data; multiple informants.
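
The comparison of associations across exposure windows described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a continuous outcome, an identity link, an independence working correlation, three windows, and no covariates or missing data. Under those assumptions the stacked GEE fit reduces to one simple regression per window, and the robust (sandwich) covariance between window-specific slopes can be built from per-subject influence values. The function names (`slope_and_influence`, `mim_three_windows`, `simulate_single_window`) and the continuous-exposure simulation are hypothetical and chosen only for illustration.

```python
import math
import random


def slope_and_influence(x, y):
    """Regress y on one window's exposure x.

    Returns the slope and each subject's influence on that slope,
    (x_i - xbar) * residual_i / Sxx, which is the slope component of the
    sandwich estimator for a simple linear regression.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    infl = [(xi - xbar) * (yi - intercept - slope * xi) / sxx
            for xi, yi in zip(x, y)]
    return slope, infl


def mim_three_windows(x1, x2, x3, y):
    """Window-specific slopes plus a 2-df Wald test that all three are equal."""
    fits = [slope_and_influence(x, y) for x in (x1, x2, x3)]
    b = [f[0] for f in fits]
    infl = [f[1] for f in fits]
    # Robust covariance of the slopes: subjects are the clusters, so the
    # same subject's influence values are multiplied across windows.
    cov = [[sum(a * c for a, c in zip(infl[j], infl[k])) for k in range(3)]
           for j in range(3)]
    # Contrasts: d1 = b1 - b2 and d2 = b1 - b3, with their 2x2 covariance.
    d1, d2 = b[0] - b[1], b[0] - b[2]
    v11 = cov[0][0] + cov[1][1] - 2 * cov[0][1]
    v22 = cov[0][0] + cov[2][2] - 2 * cov[0][2]
    v12 = cov[0][0] - cov[0][1] - cov[0][2] + cov[1][2]
    det = v11 * v22 - v12 ** 2
    wald = (d1 * d1 * v22 - 2 * d1 * d2 * v12 + d2 * d2 * v11) / det
    p_value = math.exp(-wald / 2)  # chi-square survival function, 2 df
    return b, wald, p_value


def simulate_single_window(n, rho, effect, seed):
    """Assumed toy data: exchangeable correlation rho between windows,
    and only window 1 affects the outcome."""
    rng = random.Random(seed)
    x1, x2, x3, y = [], [], [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        xs = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
              for _ in range(3)]
        x1.append(xs[0]); x2.append(xs[1]); x3.append(xs[2])
        y.append(effect * xs[0] + rng.gauss(0, 1))
    return x1, x2, x3, y


if __name__ == "__main__":
    data = simulate_single_window(n=2000, rho=0.2, effect=-0.4, seed=1)
    slopes, wald, p_value = mim_three_windows(*data)
    print("slopes:", slopes)
    print("Wald statistic:", wald, "p-value:", p_value)
```

In this toy setup each slope is a marginal (window-by-window) association, so the slopes for windows 2 and 3 move toward the window 1 slope as `rho` increases. That is one way to see why highly correlated exposures make differences across windows harder to detect.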

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

© 2023 John Wiley & Sons Ltd.

Figures

FIGURE 1
Type 1 error rate (associations across trimesters were equal). The green solid line represents a 5% rejection rate (Type 1 error). Both the multiple informant model using generalized estimating equations (MIM GEE) and the multiple informant model using inverse probability-weighted generalized estimating equations (MIM IPW GEE) were analyzed across all correlation structures, sample sizes, and missing data mechanisms. Missing data mechanisms included: no missing exposure data, exposure measures missing completely at random (MCAR), exposures missing at random (MAR) with no correction, and exposures MAR with model correction (IPW)
FIGURE 2
Empirical power analysis under the scenario of different exposure-outcome associations across windows (Scenario 2). The empirical power of the overall test of differences in associations from both the multiple informant model using generalized estimating equations (MIM GEE) and the multiple informant model using inverse probability-weighted generalized estimating equations (MIM IPW GEE) for the second scenario considered, across correlation structures, sample sizes, and missing data mechanisms. These simulations were based on the scenario where exposure measurements had a different association with the outcome across time windows. The conditional association with the outcome was a 1 SD decrease for the first time window, 0.8 SD decrease for the second, and 0.6 SD decrease for the third. Missing data mechanisms included: no missing exposure data, exposure measures missing completely at random (MCAR), exposures MAR with no correction, and exposures MAR with model correction (IPW)
FIGURE 3
Average difference (A) and average squared difference (B) under the scenario of different exposure-outcome associations across windows (Scenario 2). The average difference and average squared difference of the period-specific estimates from both the multiple informant model using generalized estimating equations (MIM GEE) and the multiple informant model using inverse probability-weighted generalized estimating equations (MIM IPW GEE) for the second scenario considered with N = 1500, across correlation structures and missing data mechanisms. These simulations were based on the scenario where exposure measurements had a different association with the outcome across time windows. The conditional association with the outcome was a 1 SD decrease for the first time window, 0.8 SD decrease for the second, and 0.6 SD decrease for the third. Missing data mechanisms included: no missing exposure data, exposure measures missing completely at random (MCAR), exposures MAR with no correction, and exposures MAR with model correction (IPW)
FIGURE 4
Empirical power analysis under the scenario of a single critical window (Scenario 3). The empirical power of the overall test of differences in associations from both the multiple informant model using generalized estimating equations (MIM GEE) and the multiple informant model using inverse probability-weighted generalized estimating equations (MIM IPW GEE) for the third scenario considered, across correlation structures, sample sizes, and missing data mechanisms. These simulations were based on the scenario where a single critical window had an association with the outcome. We simulated this to be a conditional association of a 0.4 SD decrease in the mean outcome for this critical window. Missing data mechanisms included: no missing exposure data, exposure measures missing completely at random (MCAR), exposures MAR with no correction, and exposures MAR with model correction (IPW)
FIGURE 5
Average difference (A) and average squared difference (B) under the scenario of a single critical window (Scenario 3). The average difference and average squared difference of the period-specific estimates from both the multiple informant model using generalized estimating equations (MIM GEE) and the multiple informant model using inverse probability-weighted generalized estimating equations (MIM IPW GEE) for the third scenario considered with N = 1500, across correlation structures and missing data mechanisms. These simulations were based on the scenario where a single critical window had an association with the outcome. We simulated this to be a conditional association of a 0.4 SD decrease in the mean outcome for this critical window. Missing data mechanisms included: no missing exposure data, exposure measures missing completely at random (MCAR), exposures MAR with no correction, and exposures MAR with model correction (IPW)
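
The summary measures named in the figure captions are not defined on this page. One plausible reading, offered here only as an assumption, is that they are computed over $S$ simulation replicates by comparing the period-specific estimate $\hat{\beta}_k^{(s)}$ for window $k$ with the simulated association $\beta_k$, and that empirical power (or Type 1 error, when the associations are equal) is the fraction of replicates in which the overall test rejects at level $\alpha = 0.05$. The symbols $S$, $\hat{\beta}_k^{(s)}$, $\beta_k$, and $p^{(s)}$ are introduced for illustration and do not come from the source.

```latex
\begin{align*}
\text{Average difference}_k
  &= \frac{1}{S} \sum_{s=1}^{S} \left( \hat{\beta}_k^{(s)} - \beta_k \right), \\
\text{Average squared difference}_k
  &= \frac{1}{S} \sum_{s=1}^{S} \left( \hat{\beta}_k^{(s)} - \beta_k \right)^2, \\
\text{Empirical rejection rate}
  &= \frac{1}{S} \sum_{s=1}^{S} \mathbf{1}\!\left\{ p^{(s)} < \alpha \right\}.
\end{align*}
```

Under this reading the first quantity behaves like a bias and the second like a mean squared error, which is consistent with the abstract's statement that the models may show bias in some settings.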

Source: PubMed
