A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set

Timothy E Sweeney, Aaditya Shidham, Hector R Wong, Purvesh Khatri

Abstract

Although several dozen studies of gene expression in sepsis have been published, distinguishing sepsis from a sterile systemic inflammatory response syndrome (SIRS) is still largely up to clinical suspicion. We hypothesized that a multicohort analysis of the publicly available sepsis gene expression data sets would yield a robust set of genes for distinguishing patients with sepsis from patients with sterile inflammation. A comprehensive search for gene expression data sets in sepsis identified 27 data sets matching our inclusion criteria. Five data sets (n = 663 samples) compared patients with sterile inflammation (SIRS/trauma) to time-matched patients with infections. We applied our multicohort analysis framework that uses both effect sizes and P values in a leave-one-data set-out fashion to these data sets. We identified 11 genes that were differentially expressed (false discovery rate ≤1%, inter-data set heterogeneity P > 0.01, summary effect size >1.5-fold) across all discovery cohorts with excellent diagnostic power [mean area under the receiver operating characteristic curve (AUC), 0.87; range, 0.7 to 0.98]. We then validated these 11 genes in 15 independent cohorts comparing (i) time-matched infected versus noninfected trauma patients (4 cohorts), (ii) ICU/trauma patients with infections over the clinical time course (3 cohorts), and (iii) healthy subjects versus sepsis patients (8 cohorts). In the discovery Glue Grant cohort, SIRS plus the 11-gene set improved prediction of infection (compared to SIRS alone) with a continuous net reclassification index of 0.90. Overall, multicohort analysis of time-matched cohorts yielded 11 genes that robustly distinguish sterile inflammation from infectious inflammation.
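The continuous net reclassification index quoted above can be illustrated with a short sketch. It uses the standard category-free definition (net proportion of events whose predicted risk rises, plus net proportion of non-events whose predicted risk falls). The function name `continuous_nri` and the toy inputs are hypothetical and are not taken from the study.

```python
def continuous_nri(old_risk, new_risk, outcome):
    """Continuous (category-free) net reclassification index.

    old_risk: predicted probabilities from the baseline model (e.g. SIRS alone)
    new_risk: predicted probabilities from the extended model
    outcome:  1 for an event (infection), 0 for a non-event
    The result lies between -2 and 2.
    """
    if not (len(old_risk) == len(new_risk) == len(outcome)):
        raise ValueError("inputs must have the same length")

    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, y in zip(old_risk, new_risk, outcome):
        if y == 1:
            n_event += 1
            up_event += new > old
            down_event += new < old
        else:
            n_nonevent += 1
            up_nonevent += new > old
            down_nonevent += new < old

    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")

    # Events should move up in risk; non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Toy data (illustrative only): two infected and two noninfected patients.
toy_old = [0.5, 0.5, 0.5, 0.5]
toy_new = [0.8, 0.6, 0.2, 0.6]
toy_outcome = [1, 1, 0, 0]
toy_nri = continuous_nri(toy_old, toy_new, toy_outcome)
```

In the toy data both infected patients move up in predicted risk, while the noninfected patients split one up and one down, so only the event component contributes.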

Copyright © 2015, American Association for the Advancement of Science.

Figures

Fig. 1. Labeled PCA comparing sterile SIRS/trauma versus sepsis patients
(A) Sterile SIRS/trauma and sepsis patients appear to be largely separable in the transcriptomic space, with only a minimal non-separable set. (B) The same labeled PCA is shown, with labels updated to reflect patients in recovery from noninfectious SIRS/trauma and patients with hospital-acquired sepsis; the late group (>48 hours after hospital admission) is much harder to separate. n = 1094 combined from 15 studies.
Fig. 2. Two views of the first three principal components of labeled PCA of time-course data sets
Five peripheral whole-blood gene expression data sets were combined and matched for common genes. The genes with the top 100 orthogonality scores were selected via CUR matrix decomposition, and labeled PCA was performed, broken into classes by day. (A and B) The three-dimensional plots of the first three principal components demonstrate that changes by day explain most variance in the data sets, different data sets show similar changes over time, and the changes over time proceed in a nonlinear fashion. Parts (A) and (B) show two different views of the same data; also see movie S1.
Fig. 3. Effect sizes of the 11-gene set
Forest plots for random-effects model estimates of effect size of the positive genes, comparing SIRS/trauma/ICU to infection/sepsis patients in each of the discovery cohorts.
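As a rough illustration of how a random-effects summary such as those in the forest plots can be computed, the sketch below pools per-cohort effect sizes with the DerSimonian-Laird estimator. The choice of that estimator is an assumption here, since the text above only says "random effects model". The function name `random_effects_summary` and the example numbers are hypothetical.

```python
import math


def random_effects_summary(effects, variances):
    """Pool per-cohort effect sizes with a DerSimonian-Laird random-effects model.

    effects:   one effect size per cohort (e.g. a standardized mean difference)
    variances: the within-cohort variance of each effect size
    Returns (summary effect, standard error, tau^2, Cochran's Q).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size")

    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures heterogeneity between cohorts.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-cohort variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-cohort variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    summary = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return summary, se, tau2, q


# Two hypothetical cohorts with effect sizes 0 and 2 and unit variances.
summary, se, tau2, q = random_effects_summary([0.0, 2.0], [1.0, 1.0])
```

When the cohorts agree closely, tau^2 shrinks to zero and the result reduces to the ordinary inverse-variance (fixed-effect) average.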
Fig. 4. Results of the 11-gene set in the discovery and neutrophils validation data sets
(A) ROC curves shown for separating sterile SIRS/ICU/trauma patients from those with sepsis in the discovery data sets. (B) ROC curves shown for separating trauma patients with infections from time-matched trauma patients without infection in the Glue Grant neutrophils validation data sets. (C and D) Glue Grant buffy coat discovery (C) and neutrophils validation samples (D) after >1 day since injury, showing average infection z score in noninfected patients versus patients within ±24 hours of diagnosis. In both cases, there is a significant effect due to both time and infection status. (E and F) Box plots of infection z score by time since injury for buffy coat discovery (E) and neutrophils validation samples (F): patients never infected are compared to patients >5 days before infection, 5-to-1 days before infection, ±1 day of diagnosis (cases), and 2-to-5 days after infection diagnosis. The Jonckheere-Terpstra (JT) trend test was significant (P < 0.01) for an increasing trend from never infected to ±1 day of infection for each time point after admission.
Fig. 5. No-controls data sets of trauma/ICU patients who develop VAP
These data sets did not include noninfected patients, so they were empiric Bayes co-normalized with time-matched Glue Grant patients. Orange line shows Glue Grant loess curve. (A) EMEXP3001. (B) GSE6377. (C) GSE12838, both neutrophils and whole-blood samples. In all cases, only the first 8 days since admission are shown, and patients are censored >1 day after diagnosis of infection. (D) ROC curves compare patients within ±1 day of diagnosis (blue points in A to C) with time-matched noninfected Glue Grant patients. See Table 5 for further data set details.
Fig. 6. Discrimination of healthy versus sepsis
Eight independent validation data sets that met inclusion criteria (peripheral whole blood or neutrophils, sampled within 48 hours of sepsis diagnosis) were tested with the infection z score. (A) Infection z scores for all patients (n = 446) were combined in a single violin plot; error bars show middle quartiles. P values calculated with Wilcoxon rank-sum test. (B) Separate ROC curves for each of the eight data sets discriminating sepsis patients from healthy controls. Mean ROC AUC = 0.98. See Table 6 for further data set details.
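The ROC AUC and the Wilcoxon rank-sum test mentioned in this caption are closely related: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. A minimal sketch of that calculation follows. The function name `roc_auc` and the scores are hypothetical and do not come from the validation data sets.

```python
def roc_auc(case_scores, control_scores):
    """ROC AUC as the fraction of case/control pairs ranked correctly.

    This equals the Mann-Whitney U statistic divided by the number of pairs.
    Ties between a case and a control count as half a correct pair.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


# Hypothetical scores: every sepsis score exceeds every healthy score.
auc_separable = roc_auc([2.0, 3.0], [0.0, 1.0])
# Hypothetical overlapping scores, including one tie.
auc_overlap = roc_auc([1.0, 2.0], [1.0, 3.0])
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred samples; a rank-based formula would be preferable for much larger data.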
Fig. 7. Cell type enrichment analyses
(A and B) Standardized enrichment scores (z scores, dots) for human immune cell types for both (A) the entire set of 82 genes found to be significant in multicohort analysis and (B) the 11-gene set found after forward search (subset of the 82 genes). Part (B) also shows a box plot of distributions of z scores.

Source: PubMed
