Alternatives to P value: confidence interval and effect size

Dong Kyu Lee

Abstract

Previous articles in the Statistical Round of the Korean Journal of Anesthesiology have raised serious questions about null hypothesis significance testing (NHST). P values lie at the core of NHST and are used to classify all treatments into two groups: those that "have a significant effect" and those that do not. NHST is frequently criticized because this dichotomy invites misinterpretation of relationships and says nothing about practical importance. Effect sizes and confidence intervals (CIs) expand the scope of statistical thinking: these estimates allow authors and readers to discriminate among a range of treatment effects rather than settle for a yes/no verdict on significance. This article illustrates the concepts underlying effect sizes and CIs and the principles used to estimate them.

Keywords: Confidence intervals; Effect sizes; P value.
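The two estimates the abstract advocates can be sketched in a few lines of code. The functions below are not from the article; they are a minimal Python sketch, assuming the large-sample normal critical value 1.96 for the 95% CI (for small samples a t critical value would be more appropriate) and Cohen's d with a pooled standard deviation as the effect size.

```python
import math
import statistics

def ci_95(data):
    """Approximate 95% CI for the mean: mean +/- 1.96 * SEM.

    Uses the normal critical value 1.96 (large-sample approximation);
    small samples would call for the t distribution instead.
    """
    m = statistics.mean(data)
    sem = statistics.stdev(data) / math.sqrt(len(data))
    return (m - 1.96 * sem, m + 1.96 * sem)

def cohens_d(group1, group2):
    """Cohen's d: standardized mean difference using a pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                       / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled
```

Unlike a P value, both quantities carry units of interpretation: the CI gives a range of plausible means, and d expresses the group difference in SD units.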

Figures

Fig. 1. Changes in the CI of the mean and in the alpha error limits according to sample size. All three data sets were randomly generated in R from a normal distribution with mean = 100 and SD = 10, with n = 10, 100, and 1000. As sample size increases, the width of the 95% CI decreases considerably: 8.8 for n = 10, 3.9 for n = 100, and 1.3 for n = 1000. The limits of 95% probability (5% alpha error limits) remain relatively unchanged because all three data sets originated from the same mean and SD. This implies that increasing the sample size yields a more precise statistical inference (a narrower CI) as well as increased statistical power, whereas the critical values for the alpha error probability are little affected. These simulated data are presented under the assumption of normal distributions with similar dispersions. SD: standard deviation, SEM: standard error of the mean.
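The figure's simulation is easy to replicate. The caption describes samples drawn in R; the sketch below is an assumed Python equivalent (normal approximation with critical value 1.96, a fixed seed for reproducibility), so the exact widths will differ slightly from those reported in the caption.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def ci_width_95(data):
    # Approximate width of the 95% CI of the mean: 2 * 1.96 * SEM.
    sem = statistics.stdev(data) / math.sqrt(len(data))
    return 2 * 1.96 * sem

widths = {}
for n in (10, 100, 1000):
    # Draw n values from a normal distribution with mean 100, SD 10,
    # mirroring the conditions stated in the figure caption.
    sample = [random.gauss(100, 10) for _ in range(n)]
    widths[n] = ci_width_95(sample)
    print(f"n={n}: 95% CI width = {widths[n]:.1f}")
```

The CI width shrinks roughly in proportion to 1/sqrt(n), which is the pattern the caption describes: larger samples give a more precise estimate of the mean, while the underlying 5% alpha error limits stay put.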


Source: PubMed
