Group sequential methods for the Mann-Whitney parameter

Claus P Nowak, Tobias Mütze, Frank Konietschke

Abstract

Late-phase clinical trials are occasionally planned with one or more interim analyses to allow for early termination or adaptation of the study. While extensive theory has been developed for the analysis of ordered categorical data in terms of the Wilcoxon-Mann-Whitney test, there has been comparatively little discussion in the group sequential literature on how to provide repeated confidence intervals and simple power formulas to ease sample size determination. Dealing more broadly with the nonparametric Behrens-Fisher problem, we focus on the comparison of two parallel treatment arms and show that the Wilcoxon-Mann-Whitney test, the Brunner-Munzel test, as well as a test procedure based on the log win odds, a modification of the win ratio, asymptotically follow the canonical joint distribution. We develop power formulas based on these results and confirm by simulation the adequacy of the proposed methods for a range of scenarios. Lastly, we apply our methodology to the FREEDOMS clinical trial (ClinicalTrials.gov Identifier: NCT00289978) in patients with relapsing-remitting multiple sclerosis.

Keywords: Brunner-Munzel test; Wilcoxon-Mann-Whitney test; error spending; group sequential methods; nonparametric relative effect; win odds.
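
As a minimal sketch of the quantities named in the abstract, the following Python snippet estimates the Mann-Whitney parameter (nonparametric relative effect) θ = P(X < Y) + 0.5 P(X = Y) from two samples and converts it to the log win odds, log(θ / (1 − θ)). The function names and the toy data are illustrative assumptions, not taken from the paper, and the snippet covers only the point estimates, not the group sequential boundaries or variance estimators.

```python
import math
from itertools import product


def relative_effect(x, y):
    """Estimate theta = P(X < Y) + 0.5 * P(X = Y) from all pairwise comparisons.

    Hypothetical helper for illustration; ties count as half a win.
    """
    score = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a, b in product(x, y))
    return score / (len(x) * len(y))


def log_win_odds(theta):
    """Log win odds, log(theta / (1 - theta)); undefined for theta in {0, 1}."""
    return math.log(theta / (1.0 - theta))


# Toy ordinal data (made up for this sketch): larger values in the second arm.
control = [1, 2, 2, 3]
treatment = [2, 3, 4, 4]

theta_hat = relative_effect(control, treatment)
lwo_hat = log_win_odds(theta_hat)
print(theta_hat, lwo_hat)
```

A value of θ equal to 0.5 corresponds to log win odds of 0, which is the null hypothesis of the nonparametric Behrens-Fisher problem.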

Conflict of interest statement

Declaration of conflicting interests: The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Claus P. Nowak and Tobias Mütze are employees of Novartis Pharma AG.

Figures

Figure 1.
Normal distribution, Setting 1. Notes: The lines show the proportion of the 100,000 simulation runs in which the null hypothesis was rejected at some stage based on the Brunner-Munzel test (with t-approximation) as in (5), the Wilcoxon-Mann-Whitney test as in (1) and the log win odds test as in (9), for five different total maximum sample sizes, two error spending functions, up to four stages in total and two different allocation ratios.
Figure 2.
Normal distribution, Setting 2. Notes: The lines show the proportion of the 100,000 simulation runs in which the null hypothesis was rejected at some stage based on the Brunner-Munzel test (with t-approximation) as in (5), the Wilcoxon-Mann-Whitney test as in (1) and the log win odds test as in (9), for five different total maximum sample sizes, two error spending functions, up to four stages in total and two different allocation ratios.
Figure 3.
Normal distribution, Setting 3. Notes: The lines show the proportion of the 100,000 simulation runs in which the null hypothesis was rejected at some stage based on the Brunner-Munzel test (with t-approximation) as in (5), the Wilcoxon-Mann-Whitney test as in (1) and the log win odds test as in (9), for five different total maximum sample sizes, two error spending functions, up to four stages in total and two different allocation ratios.

Source: PubMed
