Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies

Peter C Austin, Elizabeth A Stuart

Abstract

The propensity score is defined as a subject's probability of treatment selection, conditional on observed baseline covariates. Weighting subjects by the inverse probability of treatment received creates a synthetic sample in which treatment assignment is independent of measured baseline covariates. Inverse probability of treatment weighting (IPTW) using the propensity score allows one to obtain unbiased estimates of average treatment effects. However, these estimates are only valid if there are no residual systematic differences in observed baseline characteristics between treated and control subjects in the sample weighted by the estimated inverse probability of treatment. We report on a systematic literature review, in which we found that the use of IPTW has increased rapidly in recent years, but that in the most recent year, a majority of studies did not formally examine whether weighting balanced measured covariates between treatment groups. We then proceed to describe a suite of quantitative and qualitative methods that allow one to assess whether measured baseline covariates are balanced between treatment groups in the weighted sample. The quantitative methods use the weighted standardized difference to compare means, prevalences, higher-order moments, and interactions. The qualitative methods employ graphical methods to compare the distribution of continuous baseline covariates between treated and control subjects in the weighted sample. Finally, we illustrate the application of these methods in an empirical case study. We propose a formal set of balance diagnostics that contribute towards an evolving concept of 'best practice' when using IPTW to estimate causal treatment effects using observational data.
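The quantitative diagnostics described in the abstract can be illustrated with a short NumPy sketch. It is not the authors' code, and it rests on stated assumptions. The weights target the average treatment effect: 1/e for treated subjects and 1/(1 − e) for controls, where e is the propensity score. The weighted variance uses the scale factor Σw/((Σw)² − Σw²). The propensity score is taken as known from a simulation, whereas in practice it would be estimated, for example by logistic regression. The function names `iptw_ate_weights`, `weighted_mean_var` and `weighted_std_diff`, and the simulated covariates, are hypothetical.

```python
import numpy as np


def iptw_ate_weights(treated, ps):
    """Inverse probability of treatment weights, assuming the ATE is the target.

    Treated subjects get 1 / e(x); controls get 1 / (1 - e(x)).
    """
    treated = np.asarray(treated, dtype=bool)
    ps = np.asarray(ps, dtype=float)
    return np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))


def weighted_mean_var(x, w):
    """Weighted mean and weighted variance of x.

    The variance uses the scale factor sum(w) / (sum(w)**2 - sum(w**2)),
    which reduces to the usual 1 / (n - 1) when all weights equal 1.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    sw = np.sum(w)
    mean = np.sum(w * x) / sw
    var = sw / (sw**2 - np.sum(w**2)) * np.sum(w * (x - mean) ** 2)
    return mean, var


def weighted_std_diff(x, treated, w, binary=False):
    """Weighted standardized difference (treated minus control) for one covariate."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.asarray(w, dtype=float)
    m1, v1 = weighted_mean_var(x[treated], w[treated])
    m0, v0 = weighted_mean_var(x[~treated], w[~treated])
    if binary:
        # For a 0/1 covariate the weighted means are prevalences p, and the
        # variance is taken as p * (1 - p).
        v1, v0 = m1 * (1.0 - m1), m0 * (1.0 - m0)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)


if __name__ == "__main__":
    # Hypothetical simulated cohort; the true propensity score stands in for
    # one estimated by logistic regression.
    rng = np.random.default_rng(0)
    n = 5000
    age = rng.normal(65.0, 10.0, n)
    male = rng.binomial(1, 0.5, n)
    ps = 1.0 / (1.0 + np.exp(-(-0.5 + 0.05 * (age - 65.0) + 0.4 * male)))
    treated = rng.binomial(1, ps).astype(bool)
    w = iptw_ate_weights(treated, ps)

    checks = [
        ("age", age, False),
        ("male", male, True),
        ("age^2", age**2, False),         # higher-order moment
        ("age*male", age * male, False),  # interaction
    ]
    for name, x, binary in checks:
        d_raw = weighted_std_diff(x, treated, np.ones(n), binary)
        d_w = weighted_std_diff(x, treated, w, binary)
        print(f"{name:9s} unweighted {d_raw:+.3f}   weighted {d_w:+.3f}")
```

Higher-order moments and interactions are checked by passing transformed covariates (here `age**2` and `age * male`) to the same function. Absolute standardized differences below 0.1 are commonly treated as negligible imbalance, but that threshold is a convention rather than a formal test.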

Keywords: IPTW; causal inference; inverse probability of treatment weighting; observational study; propensity score.

© 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Figures

Figure 1. Number of published IPTW studies.
Figure 2. Absolute standardized differences in unweighted and weighted samples.
Figure 3. Distribution of age between treated and control subjects.
Figure 4. Distribution of respiratory rate between treated and control subjects.
Figure 5. Distribution of creatinine between treated and control subjects.
Figure 6. Distribution of hemoglobin between treated and control subjects.
Figure 7. Distribution of log-creatinine between treated and control subjects.
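Figures 3 to 7 compare the weighted distribution of a continuous covariate between treated and control subjects. One way to summarise such a comparison numerically is the largest gap between the two weighted empirical cumulative distribution functions (ECDFs). The NumPy sketch below computes that quantity; it is an illustration, not the authors' code. The function names `weighted_ecdf` and `max_ecdf_gap` and the simulated lognormal covariate are hypothetical. The weights are the same ATE-type inverse probability weights, computed here from a known rather than an estimated propensity score.

```python
import numpy as np


def weighted_ecdf(x, w, grid):
    """Weighted empirical CDF of x, evaluated at each value in grid."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    xs = x[order]
    cum = np.cumsum(w[order]) / np.sum(w)
    # idx[k] = number of observations <= grid[k]
    idx = np.searchsorted(xs, np.asarray(grid, dtype=float), side="right")
    return np.where(idx > 0, cum[np.maximum(idx - 1, 0)], 0.0)


def max_ecdf_gap(x, treated, w):
    """Largest absolute difference between the weighted ECDFs of the two groups.

    Both ECDFs are right-continuous step functions that jump only at observed
    values, so evaluating them at every observed value finds the maximum.
    """
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.asarray(w, dtype=float)
    grid = np.unique(x)
    f1 = weighted_ecdf(x[treated], w[treated], grid)
    f0 = weighted_ecdf(x[~treated], w[~treated], grid)
    return float(np.max(np.abs(f1 - f0)))


if __name__ == "__main__":
    # Hypothetical right-skewed (creatinine-like) covariate driving treatment.
    rng = np.random.default_rng(1)
    n = 4000
    creat = rng.lognormal(mean=0.0, sigma=0.4, size=n)
    ps = 1.0 / (1.0 + np.exp(-0.8 * np.log(creat)))
    treated = rng.binomial(1, ps).astype(bool)
    w = np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))

    print("unweighted gap:", round(max_ecdf_gap(creat, treated, np.ones(n)), 3))
    print("weighted gap:  ", round(max_ecdf_gap(creat, treated, w), 3))
```

Plotting the two weighted ECDFs (for instance with matplotlib's `step`) gives the graphical version of the same comparison. Because the ECDF gap is unchanged by a monotone transformation such as taking logs, it complements rather than replaces standardized differences, which do depend on the scale on which a covariate is entered.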

