Trial Sequential Analysis in systematic reviews with meta-analysis

Jørn Wetterslev, Janus Christian Jakobsen, Christian Gluud

Abstract

Background: Most meta-analyses in systematic reviews, including Cochrane ones, do not have sufficient statistical power to detect or refute even large intervention effects. This is why a meta-analysis ought to be regarded as an interim analysis on its way towards a required information size. The results of the meta-analyses should relate the total number of randomised participants to the estimated required meta-analytic information size accounting for statistical diversity. When the number of participants and the corresponding number of trials in a meta-analysis are insufficient, the use of the traditional 95% confidence interval or the 5% statistical significance threshold will lead to too many false positive conclusions (type I errors) and too many false negative conclusions (type II errors).

Methods: We developed a methodology for interpreting meta-analysis results, using generally accepted, valid evidence on how to adjust thresholds for significance in randomised clinical trials when the required sample size has not been reached.

Results: The Lan-DeMets trial sequential monitoring boundaries in Trial Sequential Analysis offer adjusted confidence intervals and restricted thresholds for statistical significance when the diversity-adjusted required information size and the corresponding number of required trials for the meta-analysis have not been reached. Trial Sequential Analysis provides a frequentist approach to controlling both type I and type II errors. We define the required information size and the corresponding number of required trials in a meta-analysis, as well as the diversity (D²) measure of heterogeneity. We explain the reasons for applying Trial Sequential Analysis to a meta-analysis when the actual information size fails to reach the required information size. We present examples drawn from traditional meta-analyses that use unadjusted naïve 95% confidence intervals and 5% thresholds for statistical significance. Spurious conclusions in systematic reviews with traditional meta-analyses can be reduced using Trial Sequential Analysis. Several empirical studies have demonstrated that Trial Sequential Analysis provides better control of type I and type II errors than the traditional naïve meta-analysis.

Conclusions: Trial Sequential Analysis is an analysis of meta-analytic data with transparent assumptions and better control of type I and type II errors than the traditional meta-analysis using naïve unadjusted confidence intervals.

Keywords: Diversity; Fixed-effect model; Group sequential analysis; Heterogeneity; Information size; Interim analysis; Meta-analysis; Random-effects model; Sample size; Trial sequential analysis.

Figures

Fig. 1
a Trial Sequential Analysis (TSA) of the meta-analysis before the Target Temperature Management (TTM) Trial. The Z-value is the test statistic, and |Z| = 1.96 corresponds to P = 0.05; the higher the absolute Z-value, the lower the P-value. The analysis covers mortality in patients with out-of-hospital cardiac arrest randomised to cooling to 33–34 °C versus 36 °C or no temperature control in the four trials performed before the TTM Trial [16, 20]. The required information size to detect or reject the 17% relative risk reduction found in the random-effects meta-analysis is calculated as 977 participants, using the diversity of 23% found in the meta-analysis, a control-group mortality of 60%, a two-sided α of 0.05, and a β of 0.20 (power of 80.0%). The cumulative Z-curve (solid black line with a square marker for each trial) surpasses the traditional boundary for statistical significance during the third trial and touches it after the fourth trial (95% confidence interval: 0.70 to 1.00; P = 0.05). However, none of the trial sequential monitoring boundaries (etched curves above and below the traditional horizontal lines for statistical significance) has been surpassed in the TSA. The result is therefore inconclusive when adjusted for sequential testing on an accumulating number of participants and for the fact that the required information size has not yet been reached. The TSA-adjusted confidence interval is 0.63 to 1.12 after inclusion of the fourth trial [10, 12].
b Trial Sequential Analysis of the meta-analysis after the TTM Trial. The Z-value is the test statistic, and |Z| = 1.96 corresponds to P = 0.05; the higher the absolute Z-value, the lower the P-value. The analysis covers mortality in patients with out-of-hospital cardiac arrest randomised to cooling to 33–34 °C versus 36 °C or no temperature control in five trials, after inclusion of the TTM Trial [17]. The required information size to detect or reject the 17% relative risk reduction found in the random-effects meta-analysis prior to the TTM Trial is calculated as 2040 participants, using the diversity of 65% found in the meta-analysis, a control-group mortality of 60%, a two-sided α of 0.05, and a β of 0.20 (power of 80.0%). The cumulative Z-curve (solid black line with a square marker for each trial) touches the boundary for futility, indicating that a statistically significant result (P < 0.05) is unlikely to be reached even if further trials are included until the required information size of 2040 participants is attained. The result indicates that a relative risk reduction of 17% or more may be excluded, even though the required information size has not been reached, after adjusting for sparse data and sequential testing on an accumulating number of patients [10, 12].
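The required information sizes quoted in the Fig. 1 caption can be approximated with a short sketch. It assumes the conventional two-group formula for a dichotomous outcome, 4(z₁₋α/₂ + z₁₋β)² · v / δ², divided by (1 − D²) to adjust for diversity; the caption does not state the formula, and the function and parameter names below are illustrative. Because the caption reports rounded inputs, the sketch is expected to land near, not exactly on, the reported 977 and 2040 participants.

```python
from statistics import NormalDist


def required_information_size(p_control, rrr, alpha=0.05, beta=0.20, diversity=0.0):
    """Diversity-adjusted required information size (total participants).

    Assumed formula (not quoted in the caption):
        RIS = 4 * (z_{1-alpha/2} + z_{1-beta})^2 * v / delta^2 / (1 - D^2)
    where v = p_bar * (1 - p_bar) and p_bar is the mean of the two event rates.
    """
    p_experimental = p_control * (1.0 - rrr)      # event rate under the assumed effect
    delta = p_control - p_experimental            # absolute risk difference
    p_bar = (p_control + p_experimental) / 2.0    # average event proportion
    variance = p_bar * (1.0 - p_bar)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1.0 - beta)          # type II error
    unadjusted = 4.0 * (z_alpha + z_beta) ** 2 * variance / delta ** 2
    return unadjusted / (1.0 - diversity)


# Inputs as reported in the caption (rounded values).
ris_before_ttm = required_information_size(0.60, 0.17, diversity=0.23)
ris_after_ttm = required_information_size(0.60, 0.17, diversity=0.65)
print(round(ris_before_ttm), round(ris_after_ttm))
```

The only input that changes between panels a and b is the diversity, which is why the required information size roughly doubles once the TTM Trial is included.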
Fig. 2
Three different group sequential boundaries in a single trial with interim analyses. The Z-value is the test statistic, and |Z| = 1.96 corresponds to P = 0.05; the higher the absolute Z-value, the lower the P-value. This is a historical overview of group sequential boundaries for the cumulative Z-curve in relation to the number of randomised participants in a single trial [19, 32, 33].
Fig. 3
Trial sequential monitoring boundaries for benefit and harm in a cumulative meta-analysis. The Z-value is the test statistic, and |Z| = 1.96 corresponds to P = 0.05; the higher the absolute Z-value, the lower the P-value. a Shows how an early statistically significant result is no longer present in a cumulative meta-analysis once the required information size has been reached. b Shows how statistical significance, absent early on, emerges later when the required information size is reached. c Shows how an early, spurious statistical significance can be avoided by adjusting the threshold for statistical significance. The etched upper curve is the group sequential boundary adjusting the threshold for statistical significance for multiple testing and sparse data. The Z-value is shown on the y-axis; on the x-axis, IS denotes the required information size [10].
Fig. 4
Trial sequential monitoring boundaries for benefit and futility in a cumulative meta-analysis. The Z-value is the test statistic, and |Z| = 1.96 corresponds to P = 0.05; the higher the absolute Z-value, the lower the P-value. a Shows how trial sequential monitoring of a cumulative meta-analysis, before the required information size (IS) is reached, makes it likely that the assumed effect is in fact absent when the Z-curve surpasses the futility boundary (etched curve). b Shows how trial sequential monitoring of a cumulative meta-analysis, before the required information size (RIS) is reached, makes it likely that the assumed effect is in fact true when the Z-curve surpasses the trial sequential monitoring boundary for benefit (etched curve). The Lan-DeMets α-spending function has been applied to construct the trial sequential monitoring boundaries, that is, the critical Z-values [10].
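To illustrate how an α-spending function restricts the significance threshold before the required information size is reached, here is a minimal sketch. It assumes the O'Brien-Fleming-type Lan-DeMets spending function, α(t) = 2 − 2Φ(z₁₋α/₂ / √t), where t is the fraction of the required information size accrued; the caption does not say which spending function was used, and the function names are illustrative. The boundary helper is valid only for a first analysis; boundaries at later analyses depend on the earlier looks and need numerical integration, which is not attempted here.

```python
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided type I error spent at a given information fraction.

    Assumes the O'Brien-Fleming-type Lan-DeMets spending function.
    """
    if not 0.0 < info_fraction <= 1.0:
        raise ValueError("info_fraction must be in (0, 1]")
    z_final = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z_final / info_fraction ** 0.5))


def first_look_boundary(info_fraction, alpha=0.05):
    """Critical |Z| if the first (and so far only) analysis occurs at info_fraction."""
    spent = obf_alpha_spent(info_fraction, alpha)
    return NormalDist().inv_cdf(1.0 - spent / 2.0)


for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  alpha spent={obf_alpha_spent(t):.5f}  "
          f"first-look |Z| boundary={first_look_boundary(t):.3f}")
```

Very little of the 5% error budget is available early on, so the critical |Z| sits well above 1.96 at small information fractions and falls to 1.96 only when the full required information size is reached.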

References

    1. Turner RM, Bird SM, Higgins JP. The impact of study size on metaanalyses: examination of underpowered studies in Cochrane reviews. PLoS One. 2013;8:e59202. doi: 10.1371/journal.pone.0059202.
    2. Pereira TV, Ioannidis JP. Statistically significant metaanalyses of clinical trials have modest credibility and inflated effects. J Clin Epidemiol. 2011;64:1060–9. doi: 10.1016/j.jclinepi.2010.12.012.
    3. AlBalawi Z, McAlister FA, Thorlund K, Wong M, Wetterslev J. Random error in cardiovascular meta-analyses: how common are false positive and false negative results? Int J Cardiol. 2013;168:1102–7. doi: 10.1016/j.ijcard.2012.11.048.
    4. Imberger G. Multiplicity and sparse data in systematic reviews of anaesthesiological interventions: a cause of increased risk of random error and lack of reliability of conclusions? Copenhagen: Copenhagen University, Faculty of Health and Medical Sciences; 2014.
    5. Brok J, Thorlund K, Wetterslev J, Gluud C. Apparently conclusive metaanalyses may be inconclusive—trial sequential analysis adjustment of random error risk due to repetitive testing of accumulating data in apparently conclusive neonatal metaanalyses. Int J Epidemiol. 2009;38:287–98. doi: 10.1093/ije/dyn188.
    6. Thorlund K, Imberger G, Walsh M, Chu R, Gluud C, Wetterslev J, Guyatt G, Devereaux PJ, Thabane L. The number of patients and events required to limit the risk of overestimation of intervention effects in meta-analysis—a simulation study. PLoS One. 2011;6:e25491. doi: 10.1371/journal.pone.0025491.
    7. Wetterslev J, Thorlund K, Brok J, Gluud C. Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol. 2008;61:64–75. doi: 10.1016/j.jclinepi.2007.03.013.
    8. Pogue J, Yusuf S. Cumulating evidence from randomised trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials. 1997;18:580–93. doi: 10.1016/S0197-2456(97)00051-2.
    9. Pogue J, Yusuf S. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet. 1998;351:47–52. doi: 10.1016/S0140-6736(97)08461-4.
    10. Thorlund K, Engstrøm J, Wetterslev J, Brok J, Imberger G, Gluud C. User manual for trial sequential analysis (TSA). Copenhagen Trial Unit, Centre for Clinical Intervention Research, Copenhagen, Denmark. 2011: 1–115 available from .
    11. Wetterslev J, Thorlund K, Brok J, Gluud C. Estimating required information size by quantifying diversity in a random-effects meta-analysis. BMC Med Res Methodol. 2009;9:86. doi: 10.1186/1471-2288-9-86.
    12. Thorlund K, Engstrøm J, Wetterslev J, Brok J, Imberger G, Gluud C. Software for trial sequential analysis (TSA) ver. 0.9.5.5 Beta. Copenhagen Trial Unit, Centre for Clinical Intervention Research, Copenhagen, Denmark, free-ware available at .
    13. Young C, Horton R. Putting clinical trials into context. Lancet. 2005;366:107–8. doi: 10.1016/S0140-6736(05)66846-8.
    14. Clarke M, Horton R. Bringing it all together: Lancet-Cochrane collaborate on systematic reviews. Lancet. 2001;357:1728. doi: 10.1016/S0140-6736(00)04934-5.
    15. Clarke M, Hopewell S, Chalmers I. Clinical trials should begin and end with systematic reviews of relevant evidence: 12 years and waiting. Lancet. 2010;376:20–21. doi: 10.1016/S0140-6736(10)61045-8.
    16. Nielsen N, Friberg H, Gluud C, Wetterslev J. Hypothermia after cardiac arrest should be further evaluated—a systematic review of randomised trials with metaanalysis and trial sequential analysis. Int J Cardiol. 2011;151:333–41. doi: 10.1016/j.ijcard.2010.06.008.
    17. Nielsen N, Wetterslev J, Cronberg T, Erlinge D, Gasche Y, Hassager C, Horn J, Hovdenes J, Kjaergaard J, Kuiper M, Pellis T, Stammet P, Wanscher M, Wise MP, Åneman A, Al-Subaie N, Boesgaard S, Bro-Jeppesen J, Brunetti I, Bugge JF, Hingston CD, Juffermans NP, Koopmans M, Køber L, Langørgen J, Lilja G, Møller JE, Rundgren M, Rylander C, Smid O, Werer C, Winkel P, Friberg H, TTM Trial Investigators. Targeted temperature management at 33°C versus 36°C after cardiac arrest. N Engl J Med. 2013;369:2197–206. doi: 10.1056/NEJMoa1310519.
    18. Nielsen N, Wetterslev J, al-Subaie N, Andersson B, Bro-Jeppesen J, Bishop G, Brunetti I, Cranshaw J, Cronberg T, Edqvist K, Erlinge D, Gasche Y, Glover G, Hassager C, Horn J, Hovdenes J, Johnsson J, Kjaergaard J, Kuiper M, Langørgen J, Macken L, Martinell L, Martner P, Pellis T, Pelosi P, Petersen P, Persson S, Rundgren M, Saxena M, Svensson R, Stammet P, Thorén A, Undén J, Walden A, Wallskog J, Wanscher M, Wise MP, Wyon N, Aneman A, Friberg H. Target temperature management after out-of-hospital cardiac arrest – a randomised, parallel-group, assessor-blinded clinical trial – rationale and design. Am Heart J. 2012;163:541–8. doi: 10.1016/j.ahj.2012.01.013.
    19. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–63. doi: 10.2307/2336502.
    20. Peberdy MA, Callaway CW, Neumar RW, Geocadin RG, Zimmerman JL, Donnino M, Gabrielli A, Silvers SM, Zaritsky AL, Merchant R, Vanden Hoek TL, Kronick SL, American Heart Association. Part 9: post-cardiac arrest care: American Heart Association Guidelines for Cardiopulmonary Resuscitation and Emergency Cardiovascular Care. Circulation. 2010;122(suppl 3):S768–S786. doi: 10.1161/CIRCULATIONAHA.110.971002.
    21. Armitage P, McPherson CK, Rowe BC. Repeated significance tests on accumulating data. J Royal Stat Soc Series A (General). 1969;132:235–44. doi: 10.2307/2343787.
    22. Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191–9. doi: 10.1093/biomet/64.2.191.
    23. Berkey CS, Mosteller F, Lau J, Antman EM. Uncertainty of the time of first significance in random effects cumulative meta-analysis. Control Clin Trials. 1996;17:357–71. doi: 10.1016/S0197-2456(96)00014-1.
    24. Imberger G, Vejlby AD, Hansen SB, Møller AM, Wetterslev J. Statistical multiplicity in systematic reviews of anaesthesia interventions: a quantification and comparison between Cochrane and non-Cochrane reviews. PLoS One. 2011;6:e28422. doi: 10.1371/journal.pone.0028422.
    25. Wald A. Contributions to the theory of statistical estimation and testing hypotheses. Ann Math Stat. 1939;10:299–326. doi: 10.1214/aoms/1177732144.
    26. Wald A. Sequential tests of statistical hypotheses. Ann Math Stat. 1945;16:117–86. doi: 10.1214/aoms/1177731118.
    27. Wald A, Wolfowitz J. Bayes solutions of sequential decision problems. Proc Natl Acad Sci U S A. 1949;35:99–102. doi: 10.1073/pnas.35.2.99.
    28. Winkel P, Zhang NF. Statistical development of quality in medicine. Chichester, West Sussex: Wiley; 2007. pp. 1–224.
    29. Armitage P. The evolution of ways of deciding when clinical trials should stop recruiting. James Lind Library Bulletin 2013. .
    30. Dunn OJ. Multiple comparisons among means. J Am Stat Assoc. 1961;56:52–64. doi: 10.1080/01621459.1961.10482090.
    31. Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG. Design and analysis of randomised clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br J Cancer. 1976;34:585–612. doi: 10.1038/bjc.1976.220.
    32. O’Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549–56. doi: 10.2307/2530245.
    33. ICH Harmonised Tripartite Guideline. Statistical principles for clinical trials. International Conference on Harmonisation E9 Expert Working Group. Stat Med. 1999;18:1905–42.
    34. Kim K, DeMets DL. Confidence intervals following group sequential tests in clinical trials. Biometrics. 1987;43:857–64. doi: 10.2307/2531539.
    35. DeMets DL. Group sequential procedures: calendar versus information time. Stat Med. 1989;8:1191–8. doi: 10.1002/sim.4780081003.
    36. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Boca Raton: Chapman & Hall/CRC Press; 2000.
    37. Grant AM, Altman DG, Babiker AB, Campbell MK, Clemens FJ, Darbyshire JH, Elbourne DR, McLeer SK, Parmar MK, Pocock SJ, Spiegelhalter DJ, Sydes MR, Walker AE, Wallace SA, DAMOCLES Study Group. Issues in data monitoring and interim analysis of trials. Health Technol Assess. 2005;9:1–238. doi: 10.3310/hta9070.
    38. Chow S, Shao J, Wang H. Sample size calculations in clinical research. Boca Raton: Taylor & Francis/CRC; 2003.
    39. Reboussin DM, DeMets DL, Kim KM, Lan KK. Computations for group sequential boundaries using the Lan-DeMets spending function method. Control Clin Trials. 2000;21:190–207. doi: 10.1016/S0197-2456(00)00057-X.
    40. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–88. doi: 10.1016/0197-2456(86)90046-2.
    41. Deeks JJ, Higgins JPT. Statistical algorithms in Review Manager ver. 5.3. On behalf of the Statistical Methods Group of The Cochrane Collaboration. 2010.
    42. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21:1539–58. doi: 10.1002/sim.1186.
    43. Kulinskaya E, Wood J. Trial sequential methods for meta-analysis. Res Synth Methods. 2014;5:212–220. doi: 10.1002/jrsm.1104.
    44. Thorlund K, Devereaux PJ, Wetterslev J, Guyatt G, Ioannidis JP, Thabane L, Gluud LL, Als-Nielsen B, Gluud C. Can trial sequential monitoring boundaries reduce spurious inferences from meta-analyses? Int J Epidemiol. 2009;38:276–86. doi: 10.1093/ije/dyn179.
    45. Imberger G, Thorlund K, Gluud C, Wetterslev J. False positive findings in cumulative meta-analysis with and without application of trial sequential analysis: an empirical review. BMJ Open. 2016;6(8):e011890. doi: 10.1136/bmjopen-2016-011890.
    46. Imberger G, Gluud C, Boylan J, Wetterslev J. Systematic reviews of anesthesiologic interventions reported as statistically significant: problems with power, precision, and type 1 error protection. Anesth Analg. 2015;121:1611–1622. doi: 10.1213/ANE.0000000000000892.
    47. Mascha EJ. Alpha, beta, meta: guidelines for assessing power and type I error in meta-analyses. Anesth Analg. 2015;121:1430–1433. doi: 10.1213/ANE.0000000000000993.
    48. Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JP. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. Int J Epidemiol. 2012;41:818–27. doi: 10.1093/ije/dys041.
    49. Thorlund K, Imberger G, Johnston BC, Walsh M, Awad T, Thabane L, Gluud C, Devereaux PJ, Wetterslev J. Evolution of heterogeneity (I2) estimates and their 95% confidence intervals in large meta-analyses. PLoS One. 2012;7:e39471. doi: 10.1371/journal.pone.0039471.
    50. Brok J, Thorlund K, Gluud C, Wetterslev J. Trial sequential analysis reveals insufficient information size and potentially false positive results in many meta-analyses. J Clin Epidemiol. 2008;61:763–9. doi: 10.1016/j.jclinepi.2007.10.007.
    51. Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0. The Cochrane Collaboration, 2011. .
    52. Keus F, Wetterslev J, Gluud C, van Laarhoven CJ. Evidence at a glance: error matrix approach for overviewing available evidence. BMC Med Res Methodol. 2010;10:90. doi: 10.1186/1471-2288-10-90.
    53. Garattini S, Jakobsen JC, Wetterslev J, Berthele’ V, Banzi R, Rath A, Neugebauer E, Laville M, Maisson Y, Hivert Y, Eickermann M, Aydin B, Ngwabyt S, Martinho C, Giradi C, Szmigielski C, Demotes-Maynard J, Gluud C. Evidence-based clinical practice: overview of threats to the validity of evidence. Eur J Intern Med. 2016;32:13–21. doi: 10.1016/j.ejim.2016.03.020.
    54. Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomised trials in meta-analyses. Ann Intern Med. 2001;135:982–9. doi: 10.7326/0003-4819-135-11-200112040-00010.
    55. Savović J, Jones HE, Altman DG, Harris RJ, Jüni P, Pildal J, Als-Nielsen B, Balk EM, Gluud C, Gluud LL, Ioannidis JP, Schulz KF, Beynon R, Welton NJ, Wood L, Moher D, Deeks JJ, Sterne JA. Influence of reported study design characteristics on intervention effect estimates from randomised, controlled trials. Ann Intern Med. 2012;157:429–38. doi: 10.7326/0003-4819-157-6-201209180-00537.
    56. Lundh A, Sismondo S, Lexchin J, Busuioc OA, Bero L. Industry sponsorship and research outcome. Cochrane Database Syst Rev. 2012;12:MR000033.
    57. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomised trials: comparison of protocols to published articles. JAMA. 2004;291:2457–65. doi: 10.1001/jama.291.20.2457.
    58. Andrews JC, Schünemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, Rind D, Montori VM, Brito JP, Norris S, Elbarbary M, Post P, Nasser M, Shukla V, Jaeschke R, Brozek J, Djulbegovic B, Guyatt G. GRADE guidelines: 15. Going from evidence to recommendation-determinants of a recommendation’s direction and strength. J Clin Epidemiol. 2013;66:726–35. doi: 10.1016/j.jclinepi.2013.02.003.
    59. The Fermi paradox. . Accessed 27 Feb 2017.
    60. Roberts I, Ker K, Edwards P, Beecher D, Manno D, Sydenham E. The knowledge system underpinning healthcare is not fit for purpose and must change. BMJ. 2015;350:h2463. doi: 10.1136/bmj.h2463.
    61. Bolland MJ, Grey A, Gamble GD, Reid IR. The effect of vitamin D supplementation on skeletal, vascular, or cancer outcomes: a trial sequential meta-analysis. Lancet Diabetes Endocrinol. 2014;2(4):307–20. doi: 10.1016/S2213-8587(13)70212-2.
    62. Tovey DI, Bero L, Farquhar C, Lasserson T, MacLehose H, Macdonald G, et al. A response to Ian Roberts and his colleagues. Rapid response. BMJ. 2015;350:h2463. doi: 10.1136/bmj.h2952.
    63. Wetterslev J, Engstrøm J, Gluud C, Thorlund K. Trial sequential analysis: methods and software for cumulative meta-analyses. Cochrane Methods Cochrane Database Syst Rev. 2012;2(suppl 1):29–31.
    64. Higgins JPT. Comment on “Trial sequential analysis: methods and software for cumulative meta-analyses”. Cochrane Methods Cochrane Database Syst Rev. 2012;2(suppl 1):32–33.
    65. Wetterslev J, Engstrøm J, Gluud C, Thorlund K. Response to “Comment by Higgins”. Cochrane Methods Cochrane Database Syst Rev. 2012;2(suppl 1):33–5.
    66. Higgins JP, Whitehead A, Simmonds M. Sequential methods for random-effects meta-analysis. Stat Med. 2011;30:903–21. doi: 10.1002/sim.4088.
    67. Fleisher LA, Beckman JA, Brown KA, Calkins H, Chaikof EL, Fleischmann KE, Freeman WK, Froehlich JB, Kasper EK, Kersten JR, Riegel B, Robb JF, Smith SC, Jr, Jacobs AK, Adams CD, Anderson JL, Antman EM, Buller CE, Creager MA, Ettinger SM, Faxon DP, Fuster V, Halperin JL, Hiratzka LF, Hunt SA, Lytle BW, Nishimura R, Ornato JP, Page RL, Riegel B, Tarkington LG, Yancy CW. ACC/AHA 2007 guidelines on perioperative cardiovascular evaluation and care for noncardiac surgery: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Writing Committee to Revise the 2002 Guidelines on Perioperative Cardiovascular Evaluation for Noncardiac Surgery). J Am Coll Cardiol. 2007;50:1707–32. doi: 10.1016/j.jacc.2007.09.001.
    68. Popper KR. Logik der Forschung. Vienna: Springer; 1959.
    69. Bangalore S, Wetterslev J, Pranesh S, Sawhney S, Gluud C, Messerli FH. Perioperative beta blockers in patients having non-cardiac surgery: a meta-analysis. Lancet. 2008;372:1962–76. doi: 10.1016/S0140-6736(08)61560-3.
    70. Jakobsen JC, Wetterslev J, Winkel P, Lange T, Gluud C. The threshold for statistical and clinical significance in systematic reviews with metaanalytic methods. BMC Med Res Methodol. 2014;14:120. doi: 10.1186/1471-2288-14-120.
    71. Sterne JA. Teaching hypothesis tests – time for significant change? Stat Med. 2002;21:985–94, 995–9, 1001.
    72. Jakobsen JC, Gluud C, Winkel P, Lange T, Wetterslev J. The thresholds for statistical and clinical significance – a five-step procedure for evaluation of intervention effects in randomised clinical trials. BMC Med Res Methodol. 2014;14:34. doi: 10.1186/1471-2288-14-34.
    73. Roloff V, Higgins JP, Sutton AJ. Planning future studies based on the conditional power of a meta-analysis. Stat Med. 2013;32:11–24. doi: 10.1002/sim.5524.
    74. IntHout J, Ioannidis JP, Borm GF. Obtaining evidence by a single well-powered trial or several modestly powered trials. Stat Methods Med Res. 2016;25(2):538–52. doi: 10.1177/0962280212461098.
    75. Valentine JC, Pigott TD, Rothstein HR. How many studies do you need? A primer on statistical power for meta-analysis. J Educ Behav Stat. 2010;35(2):215–247. doi: 10.3102/1076998609346961.
    76. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta-analysis. Chichester: John Wiley & Sons Ltd.; 2009.
    77. Higgins JP, Spiegelhalter DJ. Being sceptical about meta-analyses: a Bayesian perspective on magnesium trials in myocardial infarction. Int J Epidemiol. 2002;31:96–104. doi: 10.1093/ije/31.1.96.
    78. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation. Statistics in practice. Chichester: John Wiley & Sons Ltd; 2004.
    79. Higgins JP, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc Ser A Stat Soc. 2009;172:137–59. doi: 10.1111/j.1467-985X.2008.00552.x.
    80. Jennison C, Turnbull BW. Efficient group sequential designs when there are several effect sizes under consideration. Stat Med. 2006;25:917–32. doi: 10.1002/sim.2251.
    81. Pereira TV, Horwitz RI, Ioannidis JP. Empirical evaluation of very large treatment effects of medical interventions. JAMA. 2012;308:1676–84. doi: 10.1001/jama.2012.13444.
    82. Lindley DV. A statistical paradox. Biometrika. 1957;44:187–92. doi: 10.1093/biomet/44.1-2.187.
    83. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
    84. Fisher R. Statistical methods and scientific induction. J R Stat Soc Ser B. 1955;17:69–78.
    85. Johnson VE. Revised standards for statistical evidence. PNAS. 2013;110(48):19313–19317. Accessed Dec 2016. .

Source: PubMed
