Use of historical control data for assessing treatment effects in clinical trials

Kert Viele, Scott Berry, Beat Neuenschwander, Billy Amzal, Fang Chen, Nathan Enas, Brian Hobbs, Joseph G Ibrahim, Nelson Kinnersley, Stacy Lindborg, Sandrine Micallef, Satrajit Roychoudhury, Laura Thompson

Abstract

Clinical trials rarely, if ever, occur in a vacuum. Generally, large amounts of clinical data are available prior to the start of a study, particularly on the current study's control arm. There is obvious appeal in using (i.e., 'borrowing') this information. With historical data providing information on the control arm, more trial resources can be devoted to the novel treatment while retaining accurate estimates of the current control arm parameters. This can result in more accurate point estimates, increased power, and reduced type I error in clinical trials, provided the historical information is sufficiently similar to the current control data. If this assumption of similarity is not satisfied, however, one can acquire increased mean square error of point estimates due to bias and either reduced power or increased type I error depending on the direction of the bias. In this manuscript, we review several methods for historical borrowing, illustrating how key parameters in each method affect borrowing behavior, and then, we compare these methods on the basis of mean square error, power and type I error. We emphasize two main themes. First, we discuss the idea of 'dynamic' (versus 'static') borrowing. Second, we emphasize the decision process involved in determining whether or not to include historical borrowing in terms of the perceived likelihood that the current control arm is sufficiently similar to the historical data. Our goal is to provide a clear review of the key issues involved in historical borrowing and provide a comparison of several methods useful for practitioners.

Keywords: Bayesian; borrowing; historical data; priors.

Copyright © 2013 John Wiley & Sons, Ltd.

Figures

Figure 1
Conclusions reached by separate, pooled, and single arm trials. The X (control) and Y (treatment) axes show the possible values of the observed data, while the three curves show the decision boundaries for the separate (orange), pooled (red), and single arm (purple) trials. Note that in a single arm trial, control data are not collected, and hence, the decision is based on the treatment data alone.
Figure 2
Comparison of the mean square error (MSE) (left), type I error (middle), and power (right) for separate (orange), pooled (red), and single arm trial (purple) designs. Generally, there is a ‘sweet spot’ near 0.65 where borrowing simultaneously achieves lower MSE, lower type I error, and higher power compared to the separate analysis. Below the sweet spot, we see diminished power with borrowing, and above the sweet spot, we see inflated type I error. Assessing the relative likelihood of these regions is important to assessing the costs and benefits of borrowing.
Figure 3
Conclusions drawn by separate, pooled, and test-then-pool (using sizes 0.20, 0.10, 0.05, and 0.01 for the test of equality between current data and historical data) analyses. The curves indicate the decision boundaries for each design, showing separate (orange), pooled (red), and test-then-pool (blue). Results above the curves are successful trials. Note that small sizes for the test of equality produce the greatest overlap between test-then-pool and the pooled analysis. Thus, the 0.01 size test of equality (the dotted line) has the greatest overlap with pooling. For control values between 109 and 149 responses, test-then-pool (at size 0.10) chooses the pooled analysis, while outside this region, the test-then-pool approach emulates a separate analysis.
Figure 4
Comparison of borrowing, mean square error, type I error, and power for test-then-pool. The red curves indicate pooled analyses, the orange curves separate analyses, and the blue curves test-then-pool analyses with sizes of 0.20, 0.10, 0.05, and 0.01 for the test of equality (the primary analysis for testing the novel treatment still uses size 0.025). Test-then-pool incorporates dynamic borrowing (the model borrows less as the historical and current control rates diverge). This caps the amount of type I error inflation. In addition, by changing the size of the test, one can construct a continuum of procedures that can achieve any particular goal for type I error.
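The test-then-pool rule described in the captions of Figures 3 and 4 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes binomial control data and uses a pooled-variance two-sample z-test as the test of equality (the captions do not say which test was used). The function names and the example counts (200 current controls, 65 of 100 historical responders) are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_sided_p_equal_rates(x1, n1, x2, n2):
    """Two-sided p-value of a pooled-variance z-test for equal proportions.

    Assumption: this particular test is chosen only for illustration.
    """
    pooled_rate = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled_rate * (1.0 - pooled_rate) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both samples are all successes or all failures: no evidence of a difference.
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def pooled_or_separate_controls(x_cur, n_cur, x_hist, n_hist, size=0.10):
    """Return (responses, sample size, pooled?) for the control arm.

    If the test of equality rejects at level `size`, keep only the current
    controls (separate analysis); otherwise add the historical controls.
    """
    p_value = two_sided_p_equal_rates(x_cur, n_cur, x_hist, n_hist)
    if p_value < size:
        return x_cur, n_cur, False
    return x_cur + x_hist, n_cur + n_hist, True


# Hypothetical data: the current control rate matches the historical rate, so the data are pooled.
print(pooled_or_separate_controls(130, 200, 65, 100))
# Hypothetical data: the current control rate is far below the historical rate, so the analysis stays separate.
print(pooled_or_separate_controls(90, 200, 65, 100))
```

A smaller `size` makes the test of equality harder to reject, so the rule pools over a wider range of current control outcomes, which matches the caption's remark that the 0.01 test overlaps most with pooling.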
Figure 5
Decisions made by downweighting using a power prior. As in Figures 1 and 3, the curves indicate the decision from a separate analysis (orange), pooled analysis (red), or a 20%, 40%, 60%, or 80% downweighting (dot-dashed, solid, dashed, and dotted lines). Data above a curve result in trial success for that design. Downweighting essentially acts proportionally between the separate and pooled analyses.
Figure 6
Borrowing, mean square error, type I error, and power comparison for downweighting. The top panel provides the effective number of borrowed observations (here directly set by the weight parameter, one of 20%, 40%, 60%, or 80%). The bottom left panel shows the type I error as a function of the true control proportion, while the bottom right panel shows the power. Generally, the ‘sweet spot’ where borrowing dominates a separate analysis is longer for downweighting than pooling, and downweighting with low weights can limit the amount of type I error inflation for values near the historical rate of 0.65.
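Downweighting with a fixed-weight power prior has a closed form for binomial data, which the following sketch illustrates. It assumes a conjugate Beta initial prior (Beta(1, 1) by default), which the captions do not specify; the function names and example counts are hypothetical. The historical likelihood is raised to the power `weight`, so each historical observation counts as a fraction `weight` of a current one.

```python
def power_prior_posterior(x_cur, n_cur, x_hist, n_hist, weight, a0=1.0, b0=1.0):
    """Beta posterior parameters for the control rate under a power prior.

    weight = 0 reproduces the separate analysis and weight = 1 the pooled
    analysis. Assumption: a Beta(a0, b0) initial prior.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    a = a0 + x_cur + weight * x_hist
    b = b0 + (n_cur - x_cur) + weight * (n_hist - x_hist)
    return a, b


def effective_borrowed(n_hist, weight):
    """Effective number of historical observations borrowed."""
    return weight * n_hist


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 120 of 200 current controls respond, 65 of 100 historical.
for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    a, b = power_prior_posterior(120, 200, 65, 100, weight=w)
    print(w, effective_borrowed(100, w), round(beta_mean(a, b), 4))
```

Because the weight is fixed in advance, this borrowing is static: the amount borrowed does not change when the current controls disagree with the historical data.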
Figure 7
Decisions made using hierarchical borrowing with different priors on τ². Hierarchical borrowing is dynamic: it emulates pooling when the current controls agree with the historical data, comes closer to a separate analysis as the current control data diverge from the historical data, and interpolates between those extremes with an S-shape. The green curves (quite similar to a separate analysis) represent the Gamma(ε, ε) priors, while the blue curves represent Gamma(1, ε) priors.
Figure 8
Comparison of borrowing, mean square error, type I error, and power for hierarchical models. As before, the orange line represents the separate analysis while the red line represents the pooled analysis. Blue curves correspond to IGamma(1, β) priors on τ² while green curves correspond to IGamma(ε, ε) priors on τ². Borrowing behavior tends to be ‘flatter’ for hierarchical models, borrowing moderately over a long range, while still displaying dynamic borrowing (borrowing is reduced when the true control rate is far from the historical data). This moderate, long range borrowing is also reflected in the type I error inflation, which has a lower slope than in other methods (although it still reaches reasonably high values). Generally, the ‘sweet spot’ of improved type I error and higher power extends farther down (for values under 0.65) than in other methods. Also note that the green IGamma(ε, ε) choices tend not to inflate type I error as much as the more informative IGamma(1, β) choices.
Figure 9
Type I error and power comparison for separate (orange), pooling (red), selected test-then-pool (size 0.10, purple), downweighted power prior (40% weight, blue), and hierarchical model (IGamma(1, 0.01) in dashed green, and IGamma(0.001, 0.001) in solid green). Generally, the test-then-pool approach has lower type I error and also lower power near a control rate of 0.65, but has reduced power compared to power priors and hierarchical models outside that range. For control rates near 0.65, all methods achieve power gains similar to those of pooling (red) with much less type I error inflation.

Source: PubMed
