Statistical criteria for selecting the optimal number of untreated subjects matched to each treated subject when using many-to-one matching on the propensity score

Peter C Austin

Abstract

Propensity-score matching is increasingly being used to estimate the effects of treatments using observational data. In many-to-one (M:1) matching on the propensity score, M untreated subjects are matched to each treated subject using the propensity score. The authors used Monte Carlo simulations to examine the effect of the choice of M on the statistical performance of matched estimators. They considered matching 1-5 untreated subjects to each treated subject using both nearest-neighbor matching and caliper matching in 96 different scenarios. Increasing the number of untreated subjects matched to each treated subject tended to increase the bias in the estimated treatment effect; conversely, increasing the number of untreated subjects matched to each treated subject decreased the sampling variability of the estimated treatment effect. Using nearest-neighbor matching, the mean squared error of the estimated treatment effect was minimized in 67.7% of the scenarios when 1:1 matching was used. Using nearest-neighbor matching or caliper matching, the mean squared error was minimized in approximately 84% of the scenarios when, at most, 2 untreated subjects were matched to each treated subject. The authors recommend that, in most settings, researchers match either 1 or 2 untreated subjects to each treated subject when using propensity-score matching.
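The M:1 matching described above can be illustrated with a minimal sketch. It is not the authors' simulation code: it assumes greedy nearest-neighbor matching without replacement, measures distance as the absolute difference in the raw propensity score, and drops any treated subject that cannot receive M matches within the caliper. The function name match_m_to_1 and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def match_m_to_1(ps_treated, ps_untreated, m=1, caliper=None):
    """Greedy M:1 nearest-neighbor matching on the propensity score.

    Assumptions (illustrative only): matching is without replacement,
    treated subjects are processed in the order given, and distance is
    the absolute difference in propensity scores.

    Returns a dict mapping each matched treated index to a list of the
    m untreated indices matched to it.
    """
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_untreated = np.asarray(ps_untreated, dtype=float)
    available = np.ones(len(ps_untreated), dtype=bool)
    matches = {}

    for i, p in enumerate(ps_treated):
        dist = np.abs(ps_untreated - p)
        dist[~available] = np.inf          # already used as a match
        if caliper is not None:
            dist[dist > caliper] = np.inf  # outside the caliper
        nearest = np.argsort(dist, kind="stable")[:m]
        # Skip this treated subject if fewer than m usable matches exist.
        if len(nearest) < m or not np.all(np.isfinite(dist[nearest])):
            continue
        matches[i] = nearest.tolist()
        available[nearest] = False

    return matches


if __name__ == "__main__":
    treated = [0.30, 0.60]
    untreated = [0.10, 0.28, 0.33, 0.58, 0.65, 0.90]
    for m in (1, 2):
        matched = match_m_to_1(treated, untreated, m=m)
        gaps = [abs(treated[i] - untreated[j])
                for i, js in matched.items() for j in js]
        print(m, matched, "mean distance:", round(float(np.mean(gaps)), 3))
```

In this toy example the mean propensity-score distance between matched subjects grows when M goes from 1 to 2, because the second match is never closer than the first. That is the mechanism by which larger M tends to increase bias, while the larger matched sample reduces sampling variability.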

Source: PubMed
