Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer

Jakob Wirbel, Paul Theodor Pyl, Ece Kartal, Konrad Zych, Alireza Kashani, Alessio Milanese, Jonas S Fleck, Anita Y Voigt, Albert Palleja, Ruby Ponnudurai, Shinichi Sunagawa, Luis Pedro Coelho, Petra Schrotz-King, Emily Vogtmann, Nina Habermann, Emma Niméus, Andrew M Thomas, Paolo Manghi, Sara Gandini, Davide Serrano, Sayaka Mizutani, Hirotsugu Shiroma, Satoshi Shiba, Tatsuhiro Shibata, Shinichi Yachida, Takuji Yamada, Levi Waldron, Alessio Naccarati, Nicola Segata, Rashmi Sinha, Cornelia M Ulrich, Hermann Brenner, Manimozhiyan Arumugam, Peer Bork, Georg Zeller, Jakob Wirbel, Paul Theodor Pyl, Ece Kartal, Konrad Zych, Alireza Kashani, Alessio Milanese, Jonas S Fleck, Anita Y Voigt, Albert Palleja, Ruby Ponnudurai, Shinichi Sunagawa, Luis Pedro Coelho, Petra Schrotz-King, Emily Vogtmann, Nina Habermann, Emma Niméus, Andrew M Thomas, Paolo Manghi, Sara Gandini, Davide Serrano, Sayaka Mizutani, Hirotsugu Shiroma, Satoshi Shiba, Tatsuhiro Shibata, Shinichi Yachida, Takuji Yamada, Levi Waldron, Alessio Naccarati, Nicola Segata, Rashmi Sinha, Cornelia M Ulrich, Hermann Brenner, Manimozhiyan Arumugam, Peer Bork, Georg Zeller

Abstract

Association studies have linked microbiome alterations with many human diseases. However, they have not always reported consistent results, thereby necessitating cross-study comparisons. Here, a meta-analysis of eight geographically and technically diverse fecal shotgun metagenomic studies of colorectal cancer (CRC, n = 768), which was controlled for several confounders, identified a core set of 29 species significantly enriched in CRC metagenomes (false discovery rate (FDR) < 1 × 10-5). CRC signatures derived from single studies maintained their accuracy in other studies. By training on multiple studies, we improved detection accuracy and disease specificity for CRC. Functional analysis of CRC metagenomes revealed enriched protein and mucin catabolism genes and depleted carbohydrate degradation genes. Moreover, we inferred elevated production of secondary bile acids from CRC metagenomes, suggesting a metabolic link between cancer-associated gut microbes and a fat- and meat-rich diet. Through extensive validations, this meta-analysis firmly establishes globally generalizable, predictive taxonomic and functional microbiome CRC signatures as a basis for future diagnostics.

Conflict of interest statement

Competing Interest

P. Bork, G. Zeller, A.Y. Voigt, and S. Sunagawa are named inventors on a patent (EP2955232A1: Method for diagnosing colorectal cancer based on analyzing the gut microbiome).

Figures

Extended Data Figure 1:. Potential confounder of…
Extended Data Figure 1:. Potential confounder of individual microbial species associations by patient demographics and technical factors
Variance explained by disease status (CRC vs control) is plotted against variance explained by different putative confounding factors for individual microbial species. Each species is represented by a dot proportional in size to its abundance (see legend and Methods); core microbial markers identified in meta-analysis (tested by two-sided blocked Wilcoxon test, n=574 independent observations) are highlighted in red. For the confounder analysis, factors with continuous values were discretized into quartiles and BMI was split into lean/overweight/obese according to conventional cutoffs. The variance explained by disease status was computed all data; accordingly, the x-values are the same in all panels and also in Fig. 1d. Variance explained by different confounding factors was computed using all samples for which data were available (indicated by insets).
Extended Data Figure 2:. Study shows a…
Extended Data Figure 2:. Study shows a strong influence on alpha and beta diversity
(a) Alpha diversity as measured by the Shannon index was computed for all gut microbial species (n=849), reference mOTUs (n=246), and meta mOTUs (n=603) separately. P-values were computed using two-sided Wilcoxon test, while the overall p-value (on top) was calculated using a two-sided blocked Wilcoxon test (n=575 independent observations, see Methods). Anova F statistics below the panel were computed using the R function aov. (b) Principal coordinate analysis of samples from all five included studies based on Bray-Curtis distance; study is color-coded and disease status (CRC vs control) indicated by filled/unfilled circles. The boxplots on the side and below show samples projected onto the first two principle coordinates broken down by study and disease status, respectively. P-values were computed using a two-sided Wilcoxon test for disease status and a Kruskal-Wallis test for study, (n=575 independent observations). For all boxplots, boxes denote the interquartile ranges (IQR) with the median as thick black line and whiskers extending up to the most extreme points within 1.5-fold IQR.
Extended Data Figure 3:. The generalized fold…
Extended Data Figure 3:. The generalized fold change extends the established (median-based) fold change to provide higher resolution in sparse microbiome data
(a) In the top row, the logarithmic relative abundances for Bacteroides dorei/vulgatus, Parvimonas micra, and Fusobacterium nucleatum subsp. animalis -examples for a highly prevalent and two low-prevalence species- are shown as swarmplot for the control (CTR) and colorectal cancer (CRC) groups. The thick vertical lines indicate the medians in the different groups and the black horizontal line shows the difference between the two medians, which corresponds to the classical (median-based) fold change. Since Fusobacterium nucleatum subsp. animalis is not detectable in more than 50% of the cancer cases, there is no difference between the CTR and CRC median and thus the fold change is 0. The lower row shows the same data, but instead of only the median (or 50th percentile), 9 quantiles ranging from 10% to 90% are shown by thinner vertical lines. The generalized fold change is indicated by the horizontal black line again, computed as mean of the differences between the corresponding quantiles in both groups. In the case of the sparse data (e.g. Fusobacterium), the differences in the 70%, 80% and 90% quantiles cause the generalized fold change to be higher than 0. (b) The median fold change is plotted against the newly developed generalized fold change (gFC) for all microbial species (core set of microbial CRC marker species highlighted in orange). Marginal histograms visualize the distribution for both FC and gFC. (c) Scatter plots showing the relationship between FC and gFC and area under the Receiver Operating Characteristics (AUROC) or shift in prevalence between CRC and CTR, with Spearman correlations added in the top-left corners; gFC provides higher resolution (wider distribution around 0) and better correlation with the nonparametric AUROC effect size measure as well as prevalence shift, which captures the difference in prevalence of a species in CRC metagenomes relative to control metagenomes.
Extended Data Figure 4:. Microbial genera identified…
Extended Data Figure 4:. Microbial genera identified in meta-analysis to be associated with CRC
(a) Meta-analysis significance of microbial genera, computed using univariate two-sided Wilcoxon test blocked for study and colonoscopy (n=574 independent observations) is given by bar height (FDR 0.005). Underneath, significance (FDR-corrected p-value computed from two-sided Wilcoxon test) and generalized fold changes (see Methods) within individual studies are displayed as heatmaps in gray and color, respectively (see keys). Genera are ordered by meta-analysis significance and direction of change. (b) For highly significant genera (meta-analysis FDR 1E-05), association strength is quantified by the area under the Receiver Operating Characteristics (AUROC) across individual studies (color-coded diamonds) and 95% confidence interval are depicted by gray lines.
Extended Data Figure 5:. The core set…
Extended Data Figure 5:. The core set of CRC-enriched microbial species can be stratified into four clusters based on co-occurrence in CRC metagenomes
(a) The heatmap shows the Jaccard index (computed by comparing marker-positive samples, see Methods) for the core set of microbial marker species, compute on CRC cases only. Clustering was performed using the Ward algorithm as implemented in the R function hclust. The inset shows the distribution of Jaccard similarities within each cluster and for the background (all similarities between species not in the same cluster, n=841). Boxes denote the interquartile ranges (IQR) with the median as thick black line and whiskers extending up to the most extreme points within 1.5-fold IQR. (b) Barplots show the fraction of CRC samples that are positive for a marker species clusters (defined as the union of positive marker species) broken down by patient subgroups based on BMI and (c) age (see Fig. 2bcd for other patient subgroups). Significance of the associations between CRC subgroups and marker species clusters were tested using the Cochran-Mantel-Haenszel test blocked for study (but no significant associations were detected). (d) For the core set of microbial species with a genomic reference, the presence (red) or absence (white) of superoxide dismutase, peroxidase, and catalase are shown as heatmap (see Methods).
Extended Data Figure 6:. Coefficients of leave-one-study-out…
Extended Data Figure 6:. Coefficients of leave-one-study-out LASSO logistic regression models compared to models trained on individual studies
(a) Mean coefficients (feature weights) from LASSO cross-validation models traind on single studies (color-coded) are plotted against the single feature AUROC for each species feature. Horizontal lines highlight microbial species that are -for at least one study- selected in more than 50% of the models in cross-validation and account for more than 10% of the absolute model weight in at least 10% of the cross-validation models. Similarly, (b) shows the same for models trained in the leave-one-study-out (LOSO) setting (see Methods). Colors indicate which study has been left out of the the training set (and is used for validation). Since the weights of the LOSO models are spread across more species and thus generally lower, species are highlighted by horizontal lines if their weights explain more than 2.5% of the absolute model in at least 10% of cross-validation models and they have been selected in more than 50% of models in cross-validation. (c) Inset shows the distribution of the number of non-zero coefficients across all cross-validation models. (d) Bar height indicates the number of non-zero coefficients that are shared between the mean models for each study or left-out study, respectively. (e) The study-to-study difference (computed as median of all pairwise differences between model weights for a single species across the mean models) for cross-validation (CV) single-study models are plotted against the same measure for the LOSO models. Species with a study-to-study difference of more than 0.02 in the cross-validation models are highlighted and annotated, showing much larger variability between models trained on single studies compared to LOSO models.
Extended Data Figure 7:. Analysis of leave-one-study-out…
Extended Data Figure 7:. Analysis of leave-one-study-out models for prediction bias
(a) To examine whether species and gene-family-level classification models are confounded, i.e. biased towards certain patient subgroups, prediction scores from leave-one-study-out models are plotted broken down into strata for each clinical parameter (e.g. female and male for sex). Prediction bias for each variable was tested by two-sided Wilcoxon (for sex and BMI) or Kruskal-Wallis (all others) tests while blocking for study as confounder (n=575 independent observations). Boxes denote interquartile ranges (IQR) with the median as horizontal black line and whiskers extending up to the most extreme point within 1.5-fold IQR. A significant difference in prediction score was detected only for CRC stage. This stage-bias is more pronounced for gene-family then for species models. (b) To examine CRC stage bias further the barplots show the true positive rate (TPR) corresponding to an overall 10% false positive rate (see also Fig. 3c) for the different CRC stages displaying slightly higher classification sensitivity for late stage CRC for both species and gene-family models.
Extended Data Figure 8:. Cross-study performance of…
Extended Data Figure 8:. Cross-study performance of statistical models based on KEGG KO abundances, single-gene abundances from the metagenomic gene catalogue (IGC), and the combination of taxonomic and eggNOG abundance profiles
CRC classification accuracy resulting from cross validation within each study (gray boxed along diagonal) and study-to-study model transfer (external validations off diagonal) as measured by AUROC for classification models trained on KEGG KO (a), models based on the gene catalogue (b), and models based on the combination of taxonomic and eggNOG abundance profiles (c) (see Methods for details on statistical modeling workflows). The last column depicts the average AUROC across external validations. The barplots on the right show that the classification accuracy on a held-out study improves if data from all other studies are combined for training (leave-one-study-out, LOSO validation) relative to the mean of models trained on data from a single study (study-to-study transfer, n=4, error bars show standard deviation) consistently across different types of input data.
Extended Data Figure 9:. Identification of bai…
Extended Data Figure 9:. Identification of bai genes in metagenomes
Putative bai genes identified in the metagenomic gene catalogue (IGC) were clustered by co-abundance in metagenomes to infer genomic linkage (see Methods) to be able to infer operon completeness and species of origin. (a) For each resulting cluster of putative bile acid converting genes, the mean relative abundance is plotted against the mean percentage of protein identity derived from global alignment against the know bile acid converting genes from C. scindens and C. hylemonae (see Methods). Completeness, i.e. how many of the 11 different bai gene functions are represented in each cluster, and mean gene-to-gene Pearson correlation of log-relative abundance within each cluster are encoded by dot size and color, respectively (see legend). The four clusters with mean protein identity above 75% to know bai operon containing genomes are included in the subsequent analysis and labeled with the most highly correlated mOTU (see (b)). (b) Pearson correlation between gene cluster abundances and most highly correlated species relative abundance (in logarithmic space) is given by bar height for the four gene clusters identified in (a). The most highly correlating species is highlighted in darker grey (see labeling of gene clusters in (a)). (c) The log-transformed abundances of all bai genes and the four species identified in (b) are shown as boxplots for controls (grey) and CRC cases (red). Assessing the significance of differences between CRC and controls (by a two-sided Wilcoxon test blocked for study and colonoscopy, n=574 independent observations) demonstrates a much more significant CRC enrichment of aggregated metagenomic bai gene abundance than of the individual clostridial species to which these belong. Boxes denote the interquartile ranges (IQR) with the median as thick black line and whiskers extending up to the most extreme points within 1.5-fold IQR. (d) Receiver operating characteristic (ROC) curve for the qPCR quantification of the baiF gene in genomic DNA of a subset of samples in the DE study (n=47, see Methods and Fig. 4e) is shown as black line. Shaded grey area depicts the 95% confidence interval.
Extended Data Figure 10:. Validation of meta-analysis…
Extended Data Figure 10:. Validation of meta-analysis single species associations in three independent cohorts
(a) Heatmap showing for the core set of CRC-associated species (see Fig. 1) the rank of the respective species within the associations of each study (tested by two-sided Wilcoxon test), including the three independent validation cohorts (see Table 1), compared to the rank in the meta-analysis (meta, tested by two-sided blocked Wilcoxon test) on the left. (b) Precision-recall curves for the different independent validation cohorts using the meta-analysis set of associated species at FDR 0.005 (n=94, top) and 1E-05 (n=29, bottom) as “true” set (tested by two-sided blocked Wilcoxon test, see Methods) and the naïve (uncorrected) within-cohort significance (tested by two-sided Wilcoxon test) as predictor (see Supplementary Figure X).
Figure 1.. Despite study differences, meta-analysis identifies…
Figure 1.. Despite study differences, meta-analysis identifies a core set of gut microbes strongly associated with CRC.
(a) Meta-analysis significance of gut microbial species derived from blocked Wilcoxon tests (n=574 independent observations) is given by bar height (false discovery rate, FDR, of 0.05). (b) Underneath, species-level significance as computed by two-sided Wilcoxon test (FDR-corrected P-value) and generalized fold change (Methods) within individual studies are displayed as heatmaps in gray and color, respectively (see color bars and Table 1 for details on studies included). Species are ordered by meta-analysis significance and direction of change. (c) For a core of highly significant species (meta-analysis FDR 1E-5), association strength is quantified by the area under the Receiver Operating Characteristics curve (AUROC) across individual studies (color coded diamonds) and 95% confidence intervals are indicated by gray lines. Family-level taxonomic information is color-coded above species names (numbers in brackets are mOTU species identifiers, see Methods). (d) Variance explained by disease status (CRC vs controls) is plotted against variance explained by study effects for individual microbial species with dot size proportional to abundance (Methods); core microbial markers are highlighted in red. F. nucleatumFusobacterium nucleatum.
Figure 2.. Co-occurrence analysis of CRC-associated gut…
Figure 2.. Co-occurrence analysis of CRC-associated gut microbial species reveals four clusters preferentially linked to specific patient subgroups.
(a) The heatmap shows for all CRC patients (n=285 independent samples) if the respective sample is positive for each of the core set of microbial marker species (see Methods for adjustment of positivity threshold). Samples are ordered according to the sum of positive markers and marker species are clustered based on Jaccard similarity of positive samples, resulting in four clusters (Methods). Barplots in (b), (c), and (d) show the fraction of CRC samples that are positive for marker species clusters (defined as the union of positive marker species) broken down by patient subgroups based on differences in tumor location, sex, or CRC stage, respectively. Statistically significant associations between CRC subgroups and marker species clusters were identified using the Cochran–Mantel–Haenszel test blocked for study effects and are indicated above bars (P < 0.1).
Figure 3.. Both taxonomic and functional metagenomic…
Figure 3.. Both taxonomic and functional metagenomic classification models generalize across studies in particular when trained on data from multiple studies.
CRC classification accuracy resulting from cross validation within each study (gray boxes along diagonal) and study-to-study model transfer (external validations off diagonal) as measured by AUROC for classifiers trained on (a) species and (d) eggNOG gene family abundance profiles. The last column depicts the average AUROC across external validations. Classification accuracy, as evaluated by AUROC on a held-out study, improves if taxonomic (b) or functional (e) data from all other studies are combined for training (leave-one-study-out, LOSO validation) relative to models trained on data from a single study (study-to-study transfer, average and standard deviation shown). Bar height for study-to-study transfer corresponds to the average of four classifiers (error bars indicate standard deviation, n=4). (c) Combining training data across studies substantially improves CRC specificity of the (LOSO) classification models relative to models trained on data from a single study (depicted by bar color, as in (c) and (d)) as assessed by the false positive rate (FPR) on fecal samples from patients with other conditions (see legend). Bar height for study-to-study transfer corresponds to the average FPR across classifiers (n=5) with error bars indicating the standard deviation of FPR values observed.
Figure 4.. Meta-analysis identifies consistent functional changes…
Figure 4.. Meta-analysis identifies consistent functional changes in CRC metagenomes.
(a) Meta-analysis significance of gut metabolic modules derived from blocked Wilcoxon tests (n=574 independent samples) is indicated by bar height (top panel, FDR of 0.01). Underneath, the generalized fold change (Methods) for gut metabolic modules [31] within individual studies is displayed as heatmap (see color key below (b)). Metabolic modules are ordered by significance and direction of change. A higher-level classification of the modules is color-coded below the heatmap for the four most common categories (colors as in (b), white indicating other classes). (b) Normalized log abundances for these selected functional categories is compared between controls (CTR) and colorectal cancer cases (CRC). Abundances are summarized as geometric mean of all modules in the respective category and statistical significance determined using blocked Wilcoxon tests (n=574 independent samples, see Methods). (c) Normalized log abundances for virulence factors and toxins compared between metagenomes of controls (CTR) and colorectal cancer cases (CRC) (significant differences P < 0.05 were determined by blocked Wilcoxon test, n=574 independent samples, see Methods for gene identification and quantification in metagenomes; fadA: gene encoding Fusobacterium nucleatum adhesion protein A, bft: gene encoding Bacteroides fragilis enterotoxin, pks: genomic island in Escherichia coli encoding enzymes for the production of genotoxic colibactin, and bai: bile acid inducible operon present in some Clostridiales species encoding bile acid converting enzymes). (d) Meta-analysis significance (uncorrected P-value) as determined by blocked Wilcoxon tests (n=574 independent samples) and generalized fold change within individual studies are displayed as bars and heatmap, respectively, for the genes contained in the bai operon. Due to high sequence similarity to baiF, baiK was not independently detectable with our approach. (e) Metagenomic quantification of baiF (metag. ab. – normalized relative abundance) is plotted against qPCR quantification in genomic DNA (gDNA) extracted from a subset of DE samples (n=47), with Pearson correlation (r) indicated (see Methods). (f) Expression of baiF determined via qPCR on reverse-transcribed RNA from the same samples in contrast to genomic DNA (as in e). The boxplots on the side of (e), (f) show the difference between cancer (CRC) and control (CTR) samples in the respective qPCR quantification (P-values on top were computed using a one-sided Wilcoxon test). All boxplots show interquartile ranges (IQR) as boxes with the median as a black horizontal line and whiskers extending up to the most extreme points within 1.5-fold IQR.
Figure 5.. Meta-analysis results are validated in…
Figure 5.. Meta-analysis results are validated in three independent study populations
CRC classification accuracy for independent datasets, two from Italy and one from Japan (see Supplementary Table S2), is indicated by bar height for single study (white) and leave-one-study-out (grey) models using either (a) species or (b) eggNOG gene family abundance profiles (cf. Fig. 3). Bar height for single study models corresponds to the average of five classifiers (error bars indicate standard deviation, n=5). (c) Normalized log abundances for virulence factors and toxins (cf. Figure 4c) compared between controls (CTR) and colorectal cancer cases (CRC). P-values were determined by blocked, one-sided Wilcoxon tests (n=193 independent samples). Boxes represent interquartile ranges (IQR) with the median as a black horizontal line and whiskers extending up to the most extreme points within 1.5-fold IQR.

Source: PubMed

3
Subscribe