Reducing bias through directed acyclic graphs

Ian Shrier, Robert W Platt

Abstract

Background: The objective of most biomedical research is to obtain an unbiased estimate of the effect of an exposure on an outcome, i.e. to make causal inferences about the exposure. Recent developments in epidemiology have shown that traditional methods of identifying and adjusting for confounding may be inadequate.

Discussion: The traditional methods of adjusting for "potential confounders" may introduce conditional associations and bias rather than minimize it. Although previously published articles have discussed the role of the causal directed acyclic graph (DAG) approach with respect to confounding, many clinical problems require complicated DAGs, and investigators may therefore continue to use traditional practices because they lack the tools necessary to use the DAG approach properly. The purpose of this manuscript is to demonstrate a simple 6-step approach to the use of DAGs, and also to explain why the method works from a conceptual point of view.

Summary: Using the simple 6-step DAG approach to confounding and selection bias discussed here is likely to reduce the degree of bias in the effect estimate from the chosen statistical model.

Figures

Figure 1
The bi-directional arrows in A show the traditional representation of a confounder as being associated with the exposure (X) and the outcome. Because confounders must cause (or be a marker for a cause of) both exposure and outcome (see text for rationale based on basic principles), directed acyclic graphs use only unidirectional arrows to show the direction of causation (B).
Figure 2
a-b. Diagrammatic equivalent of the 6-step process to determine if one obtains an unbiased estimate of the effect of the exposure of interest (X) on the Outcome by including a particular subset of covariates (see text for details of the specific steps). In this example, we are interested in minimizing the bias when estimating the causal effect of warming up on the risk of injury. In figure 2a, a possible causal diagram of variables that are associated with warming up (X) and injury (outcome) is shown. The main mediating variable is believed to be proprioception (balance and muscle-contraction coordination) during the game. Starting at the top of the figure, the coach affects the team motivation (including aggressiveness), which affects both the probability of previous injury and the player's compliance with warm-up exercises. A player's genetics affects their fitness level (along with the coach's fitness program) and whether there are any inherent connective tissue disorders (which lead to tissue weakness and injury). Both connective tissue disorders and fitness level affect neuromuscular fatigue, which independently affects proprioception during the game and the probability of injury. Finally, if the sport is a contact sport, the probability of previous injury is greater, as is the probability of minor bruises during the game that would affect proprioception. Although other causal models are also possible, we will use this one for illustrative purposes at this time. For this example, we have decided to include neuromuscular fatigue (Z1) and tissue weakness (Z2) in the statistical model. Step 1 is to ensure that these covariates are not descendants of (i.e. directly or indirectly caused by) warm-up exercises. Step 2 is illustrated in 2b. The open circle (previous injury, Z3) represents the only non-ancestor (an ancestor is a direct or indirect cause of another variable) of warm-up exercises (X), neuromuscular fatigue (Z1), tissue weakness (Z2) and injury (Outcome). It is therefore deleted from the causal diagram in figure 2b.
Figure 3
a-b. In Step 3 (3a), all arrows emanating from X are deleted. In Step 4 (3b), one joins all parents of a common child. We have used dashed lines here for clarity.
Figure 4
a-b. In Step 5 (4a), we strip all the arrowheads off all the lines. In Step 6 (4b), all lines touching the covariates neuromuscular fatigue (Z1) and tissue weakness (Z2) are deleted. Because the exposure of interest (warm up exercises) is dissociated from the Outcome (injury) after Step 6, the statistical model that includes the covariates neuromuscular fatigue and tissue weakness minimizes the potential bias for the estimate of effect of warm up exercises on the risk of injury.
Figure 5
a-c. This example illustrates the effect of adding the covariate "previous injury" (Z3) to the statistical model used for the causal diagram in Figure 2a. Note that previous injury is associated with both warming up (through team motivation/aggression) and the outcome injury (through Contact Sport). After completing Steps 1–4, one is left with figure 5b. Because previous injury (Z3) is included in the model, it has not been deleted from the causal diagram in Step 2, and one must join its parents (dotted line). Figure 5c represents the causal diagram after completing Steps 5–6. Because warm up is not dissociated from the outcome risk of injury in figure 5c, the statistical model that includes the covariates Z1, Z2, and Z3 will yield a biased estimate of the effect of warm up on the risk of injury.
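The six steps described in the captions of Figures 2 to 5 can be sketched in code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the edge list is reconstructed from the caption of Figure 2 alone and may omit arrows that appear in the published figure, and every name used here (`six_step_check`, `ancestors`, `FIG_2A`, the node labels) is hypothetical.

```python
from collections import deque
from itertools import combinations


def ancestors(edges, nodes):
    """Return `nodes` together with all their direct and indirect causes."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def six_step_check(edges, exposure, outcome, covariates):
    """Return True if the exposure is dissociated from the outcome after Step 6."""
    covariates = set(covariates)

    # Step 1: covariates must not be descendants of the exposure.
    reversed_edges = [(effect, cause) for cause, effect in edges]
    descendants = ancestors(reversed_edges, {exposure}) - {exposure}
    if covariates & descendants:
        raise ValueError("covariates caused by the exposure: "
                         + ", ".join(sorted(covariates & descendants)))

    # Step 2: delete every variable that is not an ancestor of the
    # exposure, the outcome, or a covariate.
    keep = ancestors(edges, {exposure, outcome} | covariates)
    kept = [(a, b) for a, b in edges if a in keep and b in keep]

    # Step 3: delete all arrows emanating from the exposure.
    kept = [(a, b) for a, b in kept if a != exposure]

    # Step 4: join all parents of a common child.
    parents = {}
    for a, b in kept:
        parents.setdefault(b, set()).add(a)
    joined = {frozenset(pair)
              for group in parents.values()
              for pair in combinations(sorted(group), 2)}

    # Step 5: strip the arrowheads (treat every line as undirected).
    lines = {frozenset(edge) for edge in kept} | joined

    # Step 6: delete all lines touching a covariate.
    lines = {line for line in lines if not line & covariates}

    # Is there still a path from the exposure to the outcome?
    neighbours = {}
    for line in lines:
        a, b = tuple(line)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {exposure}, deque([exposure])
    while queue:
        for node in neighbours.get(queue.popleft(), ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return outcome not in seen


# Assumed edge list (cause, effect), read off the Figure 2 caption only.
FIG_2A = [
    ("coach", "team_motivation"),
    ("coach", "fitness"),
    ("team_motivation", "previous_injury"),
    ("team_motivation", "warm_up"),
    ("genetics", "fitness"),
    ("genetics", "connective_tissue_disorder"),
    ("connective_tissue_disorder", "tissue_weakness"),
    ("connective_tissue_disorder", "neuromuscular_fatigue"),
    ("fitness", "neuromuscular_fatigue"),
    ("tissue_weakness", "injury"),
    ("neuromuscular_fatigue", "intra_game_proprioception"),
    ("neuromuscular_fatigue", "injury"),
    ("contact_sport", "previous_injury"),
    ("contact_sport", "intra_game_proprioception"),
    ("warm_up", "intra_game_proprioception"),
    ("intra_game_proprioception", "injury"),
]

Z12 = {"neuromuscular_fatigue", "tissue_weakness"}
print(six_step_check(FIG_2A, "warm_up", "injury", Z12))                        # True
print(six_step_check(FIG_2A, "warm_up", "injury", Z12 | {"previous_injury"}))  # False
```

Under this assumed edge list, the sketch agrees with the captions: adjusting for Z1 and Z2 leaves warm-up dissociated from injury, whereas adding previous injury (Z3) keeps it in the diagram at Step 2, joins its parents at Step 4, and so leaves a path from warm-up through team motivation and contact sport to injury.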
Figure 6
a-b. Figure 6a is an example of an alternative causal diagram to figure 2a. The only difference between the two is an additional causal relationship where previous injury causes a decrease in pre-game proprioception (we have also included the additional conditional associations that occur as a result of this change with dotted lines). We are still interested in the causal effects of warm-up on injury risk. Because previous injury is an ancestor of warm up exercises (previous injury causes a decrease in pre-game proprioception which causes an increase in warm up exercises), it is not deleted in Step 2. This leads to two effects. First, contact sport is now a common cause of exposure and outcome. Second, there are additional conditional associations in Step 4 (dotted lines) even if "Previous Injury" is not conditioned on in the statistical model because one is already conditioning on a descendant of previous injury (i.e. the main exposure of interest, warm-up); the effect estimate of warm-up on injury is biased if the statistical model includes only warm-up, neuromuscular fatigue and tissue weakness. Figure 6b shows the same causal diagram as 6a (without the conditional associations), but now a causal link is added from pre-game proprioception to intra-game proprioception.
Figure 7
a-b. Figure 7a represents the causal diagram in Figure 6b after step 5 (dark dotted line represents the additional conditional association due to the new causal link in figure 6b), and Figure 7b shows the result after step 6 if one conditions on Tissue Weakness, Neuromuscular Fatigue, Previous Injury and Contact Sport. The presence of a path through the variables Warm-up Exercise, Pre-game proprioception (directly, or indirectly through Team Motivation/Aggression) and Intra-game proprioception to Injury means that we would still obtain a biased estimate for the causal effect of warm-up on the risk of injury.

Source: PubMed