The heatmap shows the species-level correlations with triglycerides in lipoprotein variables at fasting, postprandial (6 h), and the difference (rise) between the postprandial and fasting concentrations. The 30 species with the highest number of significant associations (FDR ≤ 0.2) are shown. The asterisk indicates a significant correlation between species and metadata variable using a two-sided t-test, corrected with FDR with q
Extended Data Fig. 8. Pairwise partial Spearman correlations between bacterial gene families and pathway abundances with clinical and metabolic risk scores, glycaemic and inflammatory measures, and lipoproteins. a, The heatmap shows gene-family correlations with the set of metadata presented in Fig. 5a–c, reporting the top 2,000 genes, selected among those with at least 20% prevalence by their number of significant correlations (q < 0.2). Gene-family correlations show the same clusters as the species-level correlations in Fig. 5a–c. b, The heatmap shows pathway-abundance correlations with the set of metadata presented in Fig. 5a–c, reporting all the pathways at 20% prevalence (349 in total). Pathway-abundance correlations show the same cluster structure as the species-level correlations in Fig. 5a–c.
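As an illustration of what a partial Spearman correlation computes, the sketch below rank-transforms all variables and then applies the standard recursive partial-correlation formula. It is a minimal sketch, not the authors' pipeline: the function names and the choice of covariates are assumptions made for the example.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def partial_corr(x, y, covariates):
    """Correlation of x and y after removing the linear effect of each covariate."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_spearman(x, y, covariates):
    """Partial Spearman correlation: partial correlation computed on ranks."""
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in covariates])

# Hypothetical toy data: one feature abundance, one marker, two covariates.
abundance = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9]
marker = [1.2, 2.0, 1.1, 3.5, 2.2, 3.0]
age = [30, 45, 50, 38, 61, 27]
bmi = [22, 27, 24, 31, 26, 29]
rho = partial_spearman(abundance, marker, [age, bmi])
```

With an empty covariate list the function reduces to the ordinary Spearman correlation.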
Extended Data Fig. 9. Concordance of Random Forest scores with species-level partial correlations. Volcano plots of the scores assigned to each species by Random Forest and their partial correlation, showing an overall concordance between the two independent approaches. We considered the top 5 metadata variables for each of the six metadata categories: a, Foods: bacon (g) (corr. 0.49), garlic (g) (corr. 0.424), unsalted nuts (g) (corr. 0.422), dairy dessert (g) (corr. 0.421), salted nuts (g) (corr. 0.395). b, Food groups: nuts (corr. 0.468), tea and coffee (corr. 0.436), meat (corr. 0.42), legumes (corr. 0.374), vegetables (corr. 0.371). c, Nutrients: lactose (corr. 0.442), niacin (corr. 0.381), maltose (corr. 0.361), sucrose (corr. 0.344), total carbohydrates (corr. 0.324). d, Nutrients normalized by daily energy intake: magnesium (corr. 0.472), starch (corr. 0.436), total carbohydrates (corr. 0.422), non-starch polysaccharides (NSP) (corr. 0.421), lactose (corr. 0.414). e, Dietary patterns: healthy plant percentage (corr. 0.492), healthy PDI (corr. 0.472), HEI score (corr. 0.47), HFD (corr. 0.408), total plants percentage (corr. 0.388). f, Lipoproteins: M-HDL-L 6 h rise (corr. 0.406), IDL-C 6 h (corr. 0.4), HDL-L 6 h rise (corr. 0.397), XL-HDL-C 0 h (corr. 0.395), Total Cholesterol 4 h rise (corr. 0.391).
Extended Data Fig. 10. Prevotella copri and/or Blastocystis presence are indicators of a more favourable postprandial glucose response to meals. a–c, Differential analysis of visceral fat, HFD and glucose iAUC 2 h after standardised breakfast according to the presence or absence of one or both of P. copri and Blastocystis. The analysis reveals that both these species are indicators of reduced visceral fat, good cholesterol and meal-driven increase of glucose. d,e, Differential analysis of C-peptide and triglycerides at different time points according to the presence or absence of one or both of P. copri and Blastocystis. The distributions of the concentrations for C-peptide and triglycerides were typically lower when one or both are absent. An asterisk between two box plots represents a significant p-value (p < 0.05) according to the two-sided Mann-Whitney U test; all p-values are available in Supplementary Table 8. Box plots show the first and third quartiles (boxes) and the median (middle line); whiskers extend up to 1.5× the interquartile range.
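The two-sided Mann-Whitney U test used for the asterisks can be sketched as follows. This is a simplified illustration that uses the normal approximation with a tie correction and no continuity correction; statistical packages may use exact p-values for small samples, so results can differ slightly. The function name and example values are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p-value) via the normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted(list(a) + list(b))
    rank_of = {}      # mean rank for each distinct value
    tie_term = 0.0    # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_a = sum(rank_of[v] for v in a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / sqrt(var_u)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_a, p_two_sided

# Hypothetical example: a marker in carriers vs. non-carriers of a taxon.
present = [4.1, 4.8, 5.0, 5.2, 5.5]
absent = [5.9, 6.1, 6.4, 7.0, 7.3]
u, p = mann_whitney_u(present, absent)
significant = p < 0.05  # would be drawn as an asterisk between the two boxes
```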
Fig. 1. The PREDICT 1 study associates gut microbiome structure with habitual diet and blood cardiometabolic markers. (A) The PREDICT 1 study assessed the gut microbiome of 1,098 volunteers from the UK and US via metagenomic sequencing of stool samples. Phenotypic data obtained through in-person assessment, blood/biospecimen collection, and the return of validated study questionnaires queried a range of relevant host/environmental factors including (1) personal characteristics, such as age, BMI, and estimated visceral fat; (2) habitual dietary intake using semi-quantitative food frequency questionnaires (FFQs); (3) fasting; and (4) postprandial cardiometabolic blood and inflammatory markers, total lipid and lipoprotein concentrations, lipoprotein particle sizes, apolipoproteins, derived metabolic risk scores, glycaemic-mediated metabolites, and metabolites related to fatty acid metabolism. (B) Overall microbiome alpha diversity, estimated as the total number of confidently identified microbial species in a given sample (richness), was correlated with HDL-D (positive) and estimated hepatic steatosis (negative). The five strongest positive and negative Spearman’s correlations with q<0.05 are reported for each of the four categories. Top species based on Shannon diversity are reported in Extended Data Fig. 1A and all correlations are reported in Supplementary Table 1.
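The two alpha-diversity measures mentioned in panel B can be illustrated with a short sketch. The detection threshold is an assumption for the example; the caption does not state how "confidently identified" is defined.

```python
from math import log

def richness(abundances, threshold=0.0):
    """Number of species whose abundance exceeds a detection threshold (assumed)."""
    return sum(1 for a in abundances if a > threshold)

def shannon(abundances):
    """Shannon diversity H = -sum(p * ln p) over species with non-zero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * log(p) for p in proportions)

# Hypothetical relative abundances for one stool sample.
sample = [0.50, 0.30, 0.15, 0.05, 0.0]
n_species = richness(sample)
h = shannon(sample)
```

Richness counts species regardless of their abundance, whereas Shannon diversity also reflects how evenly the abundances are distributed.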
Fig. 2. Food quality, regardless of source, is linked to overall and feature-level composition of the gut microbiome. (A) Specific components of habitual diet comprising foods, nutrients, and dietary indices are linked to the composition of the gut microbiome with variable strengths as estimated by machine learning regression and classification models. Boxplots report the correlation between the real value of each component and the value predicted by regression models across 100 training/testing folds (Methods). Circles denote median area-under-the-curve (AUC) values across 100 folds for a corresponding binary classifier between the highest and lowest quartiles (Methods). (B) Single Spearman’s correlations adjusted for BMI and age between microbial species and components of habitual diet with asterisks denoting significant associations (FDR q<0.2). The 30 microbial species with the highest number of significant associations across habitual diet categories are reported. All indices of dietary patterns are reported, whereas only food groups and nutrients (energy-adjusted) with at least 7 associations among the top 30 microbial species are reported (NSP: non-starch polysaccharides). Rows and columns are hierarchically clustered (complete linkage, Euclidean distance). Full heatmaps of foods and unadjusted nutrients are reported in Extended Data Fig. 2, and the full set of correlations is available in Supplementary Table 5. (C) Number of significant positive and negative associations (Spearman’s correlation p<0.2) between foods and taxa categorized by more and less healthy plant-based foods and more and less healthy animal-based foods according to the PDI. Taxa shown are the 20 species with the highest total number of significant associations regardless of category. (D) The association between the gut microbiome and coffee consumption in UK participants is dose-dependent, i.e. stronger when assessing heavy (e.g. >4 cups/d) vs. never drinkers, and was validated in the US cohort when applying the UK model. The reported ROC curves represent the performance of the classifier at varying classification thresholds with respect to the True Positive Rate (i.e. recall) and the False Positive Rate (i.e. 1 − specificity). (E-F) Among general dietary patterns and indices, the Healthy Food Diversity index (HFD) and the Alternate Mediterranean Diet score (aMED) were validated in the US cohort, thus showing consistency between the two populations on these two important dietary indices. Other validations of the UK model applied to the US cohort are reported in Extended Data Fig. 3.
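To make the ROC axes concrete, the sketch below sweeps the classification threshold over a set of scores and records the true positive rate, TP / (TP + FN), against the false positive rate, FP / (FP + TN). The AUC is then the trapezoidal area under those points. The labels and scores are hypothetical and are not taken from the study.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold one distinct score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical example: 1 = heavy coffee drinker, 0 = never drinker.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
area = auc(roc_points(labels, scores))
```

An AUC of 0.5 corresponds to a classifier that ranks samples no better than chance, and 1.0 to a perfect ranking.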
Fig. 3. Random forest machine learning models trained on microbial or functional profiles are capable of predicting obesity phenotypic markers, even on independent cohorts. (A) Whole-microbiome machine learning models can assess personal factors with RF regression (boxplots and left-side y-axis) using only taxonomic or functional (i.e. pathway) microbiome features. Classification models (circles and right-side y-axis) exceed AUC 0.65 except for waist-to-hip ratio (WHR) and smoking. (B) We observed the highest correlations between the relative abundance of microbial species and age, BMI, and visceral fat. The link between microbial features and visceral fat was of greater effect and more often significant than with traditional BMI. (C) Using several independent datasets we confirmed correlations between single microbial species and BMI, with blue points denoting significant associations at p<0.05. (D) The machine learning model for BMI trained on PREDICT 1 data is reproducible in several external datasets (Extended Data Fig. 5), achieving correlations with true values exceeding those obtained in cross-validation of a single given dataset in five of seven cases. When the PREDICT 1 microbiome model is expanded to include other datasets (excluding those used for testing, i.e. the leave-one-dataset-out/LODO approach), the performance remains comparable, confirming the generalizability of the PREDICT 1 model on obesity-related indicators.
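The leave-one-dataset-out (LODO) scheme in panel D can be sketched as a simple split generator: each dataset in turn is held out for testing while all remaining datasets are pooled for training. The dataset names and sample identifiers below are placeholders, and the model-fitting step is deliberately omitted.

```python
def leave_one_dataset_out(datasets):
    """Yield (held_out_name, training_samples, test_samples) for each dataset."""
    for held_out in datasets:
        training = [sample
                    for name, samples in datasets.items()
                    if name != held_out
                    for sample in samples]
        yield held_out, training, datasets[held_out]

# Placeholder cohorts; each sample would normally carry features and a BMI value.
datasets = {
    "cohort_A": ["a1", "a2", "a3"],
    "cohort_B": ["b1", "b2"],
    "cohort_C": ["c1", "c2", "c3", "c4"],
}
splits = list(leave_one_dataset_out(datasets))
```

Because no sample from the held-out dataset is seen during training, performance on it reflects how well the model transfers across cohorts rather than within one.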
Fig. 4. Fasting and postprandial cardiometabolic responses to standardized test meals associated with the microbiome. (A) The strongest observed links according to correlation of the predicted versus collected measures between the gut microbiome and fasting metabolic blood markers. For measures of lipid concentration in lipoproteins, we report the five strongest correlations only. Indices are grouped in nine distinct categories, and boxplots report the correlation between the prediction of RF regression models trained on microbial taxa or pathway abundances across 100 training/testing folds and stars report regressor performance when trained on the UK cohort and evaluated on the independent US validation cohort (left-side y-axis). Circles denote AUC values for RF classification (right-side y-axis). (B-F) Performance of our microbiome-based ML-model in estimating postprandial absolute levels and postprandial increases in cardiometabolic markers. Stars denote regression model results in our US validation cohort for postprandial measurements (not rises; Extended Data Fig. 4B-C). (B) RF regression and classification performance in predicting postprandial metabolic responses for clinic Meal 1 (breakfast) measured as iAUC at 6h for triglycerides and iAUC at 2h for glucose, C-peptide, and insulin. (C) Glycaemic-mediated postprandial iAUCs at 2h for the other meals (Supplementary Table 7), and (D) glycaemic-mediated markers absolute levels vs. rise. (E) Postprandial inflammatory measures (concentration and rise). (F) Postprandial lipoproteins measures (6h concentration and rise). (G) Overall agreement between RF regression and classification tasks for UK models applied to the independent US cohort. (H) RF microbiome-based model performance with postprandial changes (concentrations and rise) in lipoprotein concentration, composition, and size. Fasting and postprandial performance indices (correlation of the regressors’ outputs) were more tightly linked to gut community structure than were their corresponding postprandial rises.
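For readers unfamiliar with the incremental area under the curve (iAUC), the sketch below shows one common simplified way to compute it: subtract the fasting baseline, set negative increments to zero, and integrate with the trapezoidal rule. This is an assumption for illustration only; the caption does not specify the exact iAUC definition used in the study, and other conventions handle baseline crossings differently.

```python
def incremental_auc(times, values):
    """Trapezoidal area above the baseline (first value); dips below it count as zero."""
    baseline = values[0]
    increments = [max(v - baseline, 0.0) for v in values]
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += width * (increments[k - 1] + increments[k]) / 2
    return area

# Hypothetical glucose readings (mmol/L) at 0, 30, 60 and 120 minutes.
times = [0, 30, 60, 120]
glucose = [5.0, 7.0, 6.0, 5.0]
iauc_2h = incremental_auc(times, glucose)  # units: mmol/L x min
```

The "rise" reported in several panels is a different quantity: the difference between a postprandial concentration and the fasting concentration at a single time point.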
Fig. 5. Species-level segregation into healthy and unhealthy microbial signatures of fasting and postprandial cardiometabolic markers. (A) Associations (Spearman’s correlation, q<0.2 marked with stars) between single microbial species and fasting clinical risk measures and (B) glycaemic, inflammatory, and lipaemic indices. (C) Correlation between microbial species and the iAUC for glucose and C-peptide estimations based on clinical measurements before and after standardized meals. The 30 species with the highest number of significant correlations with distinct fasting and postprandial indices are shown. Rows are hierarchically clustered (complete linkage, Euclidean distance). (D) Microbe-metabolite correlations are very consistent when evaluated for fasting versus postprandial (6h) conditions (left panel). Associations with postprandial variations (rise) conversely often show opposing relationships, with several species positively correlated with fasting measures being negatively correlated with postprandial variation of the same metabolite (or vice versa, central panel). This was mitigated somewhat when comparing absolute postprandial responses with rise (right panel).
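The q-value threshold used for the stars refers to false discovery rate correction. A minimal sketch of the Benjamini-Hochberg procedure is shown below, assuming that this is the FDR method applied (the caption states only that q-values are used). The p-values are made up for the example.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p-values for four species-marker correlations.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
starred = [q < 0.2 for q in qvals]  # cells that would carry a star
```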
Fig. 6. The panel of 30 species showing the strongest overall correlations with a selection of markers of nutritional and cardiometabolic health. The 30 species with the highest and lowest average ranks with diverse positive and negative cardiometabolic health and healthy diet indicators, respectively, are shown here. The rank of each microbe’s correlation with individual indicators is written within cells when significant (p