An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies

Peter C Austin

Abstract

The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.
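The abstract describes three connected steps: estimate the propensity score from observed baseline covariates, use it to adjust the comparison, and check covariate balance. A small simulation can illustrate them. What follows is a minimal sketch, not the author's code. It assumes NumPy, simulated data with two hypothetical confounders and a known treatment effect of 2, and a logistic propensity model fitted by Newton-Raphson. It shows only one of the four methods, inverse probability of treatment weighting (IPTW), together with the standardized-difference balance diagnostic. The function names `estimate_propensity` and `std_diff` are hypothetical.

```python
import numpy as np


def estimate_propensity(X, z, n_iter=25):
    """Fit P(Z=1 | X) by logistic regression (Newton-Raphson) and return fitted scores."""
    Xd = np.column_stack([np.ones(len(z)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (z - p)                           # score of the log-likelihood
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd   # observed information matrix
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def std_diff(x, z, w):
    """Standardized difference of covariate x between treated (z=1) and untreated (z=0).

    Uses weighted means and variances; pass w = np.ones(len(x)) for the unweighted version.
    """
    t, c = z == 1, z == 0
    m1 = np.average(x[t], weights=w[t])
    m0 = np.average(x[c], weights=w[c])
    v1 = np.average((x[t] - m1) ** 2, weights=w[t])
    v0 = np.average((x[c] - m0) ** 2, weights=w[c])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)


# Hypothetical simulated data: two baseline covariates that raise both the
# probability of treatment and the outcome (i.e., confounders).
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * X[:, 0] + 0.5 * X[:, 1]))))
y = 1.0 + 2.0 * z + 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)  # true effect = 2

e = estimate_propensity(X, z)
w = np.where(z == 1, 1.0 / e, 1.0 / (1.0 - e))  # inverse probability of treatment weights (ATE)

crude = y[z == 1].mean() - y[z == 0].mean()
ate_iptw = (np.average(y[z == 1], weights=w[z == 1])
            - np.average(y[z == 0], weights=w[z == 0]))

for j in range(X.shape[1]):
    before = std_diff(X[:, j], z, np.ones(n))
    after = std_diff(X[:, j], z, w)
    print(f"x{j}: standardized difference {before:.3f} before, {after:.3f} after weighting")
print(f"crude difference = {crude:.2f}; IPTW estimate of the ATE = {ate_iptw:.2f}")
```

Treated subjects get weight 1/e and untreated subjects get weight 1/(1 - e). This targets the average treatment effect in the whole population. To target the average treatment effect in the treated instead, set the weights to 1 for treated subjects and e/(1 - e) for untreated subjects. The standardized difference is computed in both the unweighted and the weighted sample. If it moves toward zero after weighting, the propensity score model has probably balanced these covariates.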

Source: PubMed
