Precision oncology for acute myeloid leukemia using a knowledge bank approach

Moritz Gerstung, Elli Papaemmanuil, Inigo Martincorena, Lars Bullinger, Verena I Gaidzik, Peter Paschka, Michael Heuser, Felicitas Thol, Niccolo Bolli, Peter Ganly, Arnold Ganser, Ultan McDermott, Konstanze Döhner, Richard F Schlenk, Hartmut Döhner, Peter J Campbell

Abstract

Underpinning the vision of precision medicine is the concept that causative mutations in a patient's cancer drive its biology and, by extension, its clinical features and treatment response. However, considerable between-patient heterogeneity in driver mutations complicates evidence-based personalization of cancer care. Here, by reanalyzing data from 1,540 patients with acute myeloid leukemia (AML), we explore how large knowledge banks of matched genomic-clinical data can support clinical decision-making. Inclusive, multistage statistical models accurately predicted likelihoods of remission, relapse and mortality, which were validated using data from independent patients in The Cancer Genome Atlas. Comparison of long-term survival probabilities under different treatments enables therapeutic decision support, which is available in exploratory form online. Personally tailored management decisions could reduce the number of hematopoietic cell transplants in patients with AML by 20-25% while maintaining overall survival rates. Power calculations show that databases require information from thousands of patients for accurate decision support. Knowledge banks facilitate personally tailored therapeutic decisions but require sustainable updating, inclusive cohorts and large sample sizes.

Figures

**Figure 1. Systematic model comparison**
a. Top panel: Concordance C of different model predictions for overall survival. For cross-validation analyses (grey), we generated 100 training and test sets by randomly splitting the full dataset. The distribution of concordance values across the 100 random splits is shown as a box-and-whisker plot. Also shown are point estimates with error bars for predictions evaluated on pre-specified splits of the dataset, where the training set comprised 2 of the 3 trials in the study and the test set was the third trial (red, blue, green), or where the training set was the full AMLSG dataset and the test set was the TCGA cohort (purple). Predictions for the multistage model are evaluated 3 years after diagnosis. Lower panel: Using the 100 random cross-validation splits, each of the 10 classes of predictive model was built on the training set and evaluated on the test set. The 10 models were ranked by their relative performance on the test set, and the ranks were aggregated across the 100 cross-validation splits, indicating how often each model scored best (1st) to worst (10th). Time-dependent models include allogeneic hematopoietic stem cell transplants, which are treated as a time-dependent covariate to avoid bias. b. Coefficient of determination R² for leave-one-out predictions of the AMLSG cohort using the time-dependent random effects and multistage models, evaluated at each time point (x-axis). c. Same as b, evaluated on TCGA data.
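
Concordance C, the performance measure used throughout this figure, can be computed for right-censored survival data with Harrell's estimator. The sketch below is a plain O(n²) Python implementation under stated assumptions: the function name `harrell_c` is hypothetical, a higher `risk` value means a worse predicted outcome, and pairs are not truncated at a fixed horizon such as the 3-year evaluation used for the multistage model. It is not the authors' code; `lifelines.utils.concordance_index` is an existing library implementation, although there higher scores denote longer predicted survival.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data (illustrative sketch).

    A pair (i, j) is comparable when patient i had an observed event (event[i] == 1)
    before patient j's follow-up time. The pair is concordant when patient i also
    has the higher predicted risk; tied risks count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(n):
            if time[i] < time[j]:  # i failed while j was still under observation
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: need at least one observed event")
    return concordant / comparable
```

Applied to the predicted risks of the test-set patients in each random training/test split, such a function yields one C value per split; a collection of these values is what a box-and-whisker plot like the one in the top panel summarizes.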

**Figure 2. Multistage modeling of patient fate**
a. Multistage model of patient trajectories. The six colored boxes indicate different stages during treatment, with five possible transitions indicated by solid arrows. Numbers in each box indicate the total number of patients who entered a given stage during follow-up. b. Sediment plot showing the fraction of patients in a given stage at a given time after diagnosis. The thick black line denotes overall survival, which is one minus the sum of the deaths without complete remission (red), non-relapse mortality (blue) and mortality after relapse (green). c. Schematic overview of multistage regression. The model estimates the log-additive effect of each of 231 prognostic variables on the transition rates for all 5 possible time-dependent transitions shown in (a). Rate changes are modeled by Cox proportional hazards models with random effects. d. Concordance, C, indicates the fraction of patient pairs whose survival times at 3 years after diagnosis were correctly ranked by the model. Similarly, at 3 years after diagnosis only 28% of patients were incorrectly predicted to be alive or dead. e. Mosaic plot of predicted 3-year survival across ELN categories. The height of each bar denotes the fraction of patients in each quarter of predicted survival for each ELN group, and the width of each bar is proportional to the percentage of patients in each ELN group. f. Relative importance of risk factors for different transitions. The concordance C is shown as a percentage across the top of the bar chart.
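
The stage fractions of a sediment plot follow from the five transition rates, with each covariate acting log-additively on each rate as described in panel c. The sketch below is a simplified Markov approximation, not the authors' method: it assumes constant baseline rates and fixed coefficients, whereas the paper fits Cox proportional hazards models with random effects and time-varying baseline hazards. The state labels, function names and all numerical rates are hypothetical.

```python
import numpy as np

# Six stages of the model; these names are illustrative labels, not the paper's identifiers.
STATES = ["induction", "CR", "death_without_CR", "NRD", "alive_after_relapse", "PRD"]
# The five allowed transitions as (from_state, to_state) index pairs.
TRANSITIONS = [(0, 1), (0, 2), (1, 3), (1, 4), (4, 5)]
DEATH_STATES = [2, 3, 5]


def transition_rates(base_rates: np.ndarray, betas: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Log-additive covariate effects: rate_k = base_k * exp(beta_k . x).

    base_rates: shape (5,), one constant baseline rate per transition (per year; an assumption).
    betas: shape (5, n_covariates), one coefficient vector per transition.
    x: shape (n_covariates,), the covariates of a single patient.
    """
    return base_rates * np.exp(betas @ x)


def state_occupancy(rates: np.ndarray, t_max: float, dt: float = 1e-3) -> np.ndarray:
    """Probability of being in each stage at t_max, via Euler steps of dp/dt = p Q."""
    q = np.zeros((len(STATES), len(STATES)))
    for (i, j), rate in zip(TRANSITIONS, rates):
        q[i, j] = rate
    # Diagonal entries make every row of the generator Q sum to zero.
    q[np.diag_indices_from(q)] = -q.sum(axis=1)
    p = np.zeros(len(STATES))
    p[0] = 1.0  # every patient starts at diagnosis, i.e. in induction
    for _ in range(int(round(t_max / dt))):
        p = p + dt * (p @ q)
    return p


def overall_survival(p: np.ndarray) -> float:
    """Overall survival is one minus the total probability of the three death states."""
    return 1.0 - float(p[DEATH_STATES].sum())
```

For example, `overall_survival(state_occupancy(transition_rates(base, betas, x), t_max=3.0))` gives a 3-year survival estimate for one hypothetical patient, and evaluating `state_occupancy` on a grid of times produces the stacked curves of a sediment plot. Replacing the constant rates with fitted, time-varying hazards is what a full multistage model would require.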

**Figure 3. Multistage outcome predictions for 1024 patients**
Cross-validated risk predictions and observed statuses for 1024 patients, arranged along a Hilbert curve. This arrangement has the property that patients with similar AML subtypes and risk constellations are grouped together in the 2-dimensional space (compare Supplementary Figure 1 for constellations of risk factors). For each patient, the survival curves predicted by the multistage model are shown, with the competing outcomes colored as in the legend and Figure 2b. The observed outcome is shown as a line across the base of each graph; a filled circle indicates that the patient died, and its color indicates the mode of death. For many patients one color dominates the diagram, indicating that the probability of a particular event is very high. Reassuringly, for such patients the observed outcomes are highly concordant with the cross-validated predictions and occur at frequencies matching the predicted probabilities.
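
A Hilbert curve maps a one-dimensional ordering onto a square grid so that items adjacent in the ordering land in adjacent cells, which is why neighboring patients in a sorted list stay close together in the figure. The sketch below uses the standard iterative index-to-coordinate conversion. How the patients were ordered before the mapping is not described here, so the input `ordered_ids` (for example, patients sorted by subtype and risk) is an assumption, as are both function names. The grid side must be a power of two, and 1024 patients exactly fill a 32-by-32 grid.

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on a side-by-side grid.

    `side` must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current sub-square so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_layout(ordered_ids: list[str], side: int) -> dict[str, tuple[int, int]]:
    """Assign items, in their given 1-D order, to consecutive Hilbert-curve cells."""
    if side < 1 or side & (side - 1) or side * side < len(ordered_ids):
        raise ValueError("side must be a power of two with side * side >= number of items")
    return {item: hilbert_d2xy(side, d) for d, item in enumerate(ordered_ids)}
```

Because consecutive positions on the curve are always one grid step apart, any grouping already present in the one-dimensional order is preserved as contiguous patches on the grid.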

**Figure 4. Individualized risk exemplified for 2 patients**
a. Sediment plot showing predicted multistage probabilities after remission for patient PD11104a under a management strategy of standard chemotherapy in first complete remission (CR1), with an intended allograft after relapse. Predictions are based on models trained with the given patients excluded; the bar at the bottom denotes the observed outcome (as in Figure 3). The patient was alive at the last follow-up, 3.5 years after achieving first complete remission. Numbers at the bottom indicate the probabilities of non-relapse death (NRD), post-relapse death (PRD) and being alive after relapse (AAR) at years 1 to 5 after achieving complete remission. b. Multistage probabilities for PD11104a in the scenario of an allograft in first complete remission. c. Same as a for patient PD8314a. The patient relapsed after 1.2 years and died soon after. d. Same as b for patient PD8314a. Details of these calculations are presented in the Supplementary Note, section 3.5.5.8; additional patients are shown in Supplementary Figure S2.

**Figure 5. Benefit of allograft in CR1 vs after relapse**
a. Predicted three-year absolute mortality reduction by allografts in CR1 over standard chemotherapy in CR1 and allograft after relapse (y-axis). Calculations are based on patients 10%, blue) and low (

**Figure 6. Extrapolations and power calculations**
a. Subsampling the number of patients reveals a steady but saturating increase in prognostic concordance C for a random effects model for overall survival. Error bars show the 95% confidence intervals for the concordance obtained from multiple independent subsamples of the dataset. b. Graph relating the effect size (hazard ratio) of a prognostic variable to the absolute number of patients with the given factor required to reach significance in a random effects model for overall survival (solid line: P < 0.05; dotted line: P < 0.001). c. Average prediction error between simulated and estimated survival for a random effects model for overall survival, as a function of survival time (x-axis) and training cohort size (y-axis).
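
The relationship in panel b can be approximated, for a single binary factor tested on its own, with Schoenfeld's formula for the number of events a log-rank or Cox test needs: D = (z₁₋α/₂ + z_power)² / (p(1 − p)(ln HR)²), where p is the fraction of patients carrying the factor. This is a standard approximation swapped in for illustration, not the random effects calculation behind the figure; the function name, the 80% power, the 5% carrier fraction and the 50% event fraction are all hypothetical defaults.

```python
import math
from statistics import NormalDist


def carriers_needed(hazard_ratio: float, alpha: float = 0.05, power: float = 0.8,
                    carrier_fraction: float = 0.05, event_fraction: float = 0.5) -> int:
    """Patients carrying a binary factor needed to detect `hazard_ratio` (Schoenfeld sketch).

    Total events D = (z_{1-alpha/2} + z_{power})^2 / (p (1 - p) ln(HR)^2) for a
    two-sided test with carrier fraction p. If a fraction `event_fraction` of all
    patients has an event, the cohort size is N = D / event_fraction and the number
    of carriers is p * N. All default parameter values are illustrative assumptions.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard ratio must be positive and different from 1")
    z = NormalDist()
    k = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    p = carrier_fraction
    events = k / (p * (1 - p) * math.log(hazard_ratio) ** 2)
    cohort = events / event_fraction
    return math.ceil(p * cohort)
```

With these defaults, a hazard ratio of 2 requires about 35 carriers at P < 0.05, and setting `alpha=0.001` mirrors the stricter threshold of the dotted line. Since p·N = K / ((1 − p) · e · (ln HR)²), with K the squared z-sum and e the event fraction, the number of carriers depends mainly on the effect size rather than on total cohort size when the factor is rare, which fits the figure's framing in terms of patients with the factor.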

Source: PubMed