Microbiome connections with host metabolism and habitual diet from 1,098 deeply phenotyped individuals

Francesco Asnicar, Sarah E Berry, Ana M Valdes, Long H Nguyen, Gianmarco Piccinno, David A Drew, Emily Leeming, Rachel Gibson, Caroline Le Roy, Haya Al Khatib, Lucy Francis, Mohsen Mazidi, Olatz Mompeo, Mireia Valles-Colomer, Adrian Tett, Francesco Beghini, Léonard Dubois, Davide Bazzani, Andrew Maltez Thomas, Chloe Mirzayi, Asya Khleborodova, Sehyun Oh, Rachel Hine, Christopher Bonnett, Joan Capdevila, Serge Danzanvilliers, Francesca Giordano, Ludwig Geistlinger, Levi Waldron, Richard Davies, George Hadjigeorgiou, Jonathan Wolf, José M Ordovás, Christopher Gardner, Paul W Franks, Andrew T Chan, Curtis Huttenhower, Tim D Spector, Nicola Segata

Abstract

The gut microbiome is shaped by diet and influences host metabolism; however, these links are complex and can be unique to each individual. We performed deep metagenomic sequencing of 1,203 gut microbiomes from 1,098 individuals enrolled in the Personalised Responses to Dietary Composition Trial (PREDICT 1) study, whose detailed long-term diet information, as well as hundreds of fasting and same-meal postprandial cardiometabolic blood marker measurements were available. We found many significant associations between microbes and specific nutrients, foods, food groups and general dietary indices, which were driven especially by the presence and diversity of healthy and plant-based foods. Microbial biomarkers of obesity were reproducible across external publicly available cohorts and in agreement with circulating blood metabolites that are indicators of cardiovascular disease risk. While some microbes, such as Prevotella copri and Blastocystis spp., were indicators of favorable postprandial glucose metabolism, overall microbiome composition was predictive for a large panel of cardiometabolic blood markers including fasting and postprandial glycemic, lipemic and inflammatory indices. The panel of intestinal species associated with healthy dietary habits overlapped with those associated with favorable cardiometabolic and postprandial markers, indicating that our large-scale resource can potentially stratify the gut microbiome into generalizable health levels in individuals without clinically manifest disease.

Conflict of interest statement

TD Spector, SE Berry, AM Valdes, F Asnicar, PW Franks, C Huttenhower, and N Segata are consultants to Zoe Global Ltd (“Zoe”). J Wolf, G Hadjigeorgiou, R Davies, J Capdevila, C Bonnett, R Hine, L Francis, F Giordano, and S Danzanvilliers are or have been employees of Zoe. The other authors have no conflict of interest to declare.

Figures

Extended Data Fig. 1. Alpha diversity linked with personal factors, habitual diet, fasting, and postprandial markers.
a, Microbiome alpha diversity, computed using the Shannon index, correlated with markers from the four categories: personal, habitual diet, fasting, and postprandial. Reported are the five strongest positive and negative Spearman correlations for each category with p < 0.05. All correlations and p-values are available in Supplementary Table 1. b, Inter-sample microbiome distances (beta diversity) were substantially lower (that is, the samples were closer) among samples from the same individual (two weeks apart) than among samples from different individuals. Gut microbial communities in monozygotic twins were slightly more similar than in dizygotic twins (Mann–Whitney U test two-sided p = 0.06), which, in turn, were more similar than unrelated individuals (p < 1e-12), even after adjusting for age (p < 1e-12). c, After excluding twin status (that is, non-twin vs. monozygotic vs. dizygotic twins) from the model, personal factors still accounted for the greatest proportion of variance explained in overall microbial diversity, followed by dietary habits, fasting and postprandial cardiometabolic blood markers (by cumulative stepwise dbRDA). d, Cumulative (left bars) and individual (right bars) contributions for each metadata variable based on Bray-Curtis dissimilarity. Box plots show first and third quartiles (boxes) and the median (middle line); whiskers extend up to 1.5× the interquartile range.
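The two diversity measures named in this caption can be illustrated with a minimal sketch. It assumes the natural logarithm for the Shannon index (the caption does not state the base) and uses made-up abundance vectors; the function names are illustrative and are not taken from the study's pipeline.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance.

    Assumption: natural log; the caption does not specify the base.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no counts")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must list the same taxa")
    denom = sum(u) + sum(v)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical abundance vectors over the same four taxa.
sample_a = [50, 30, 20, 0]
sample_b = [10, 10, 40, 40]
alpha_a = shannon_index(sample_a)          # within-sample (alpha) diversity
beta_ab = bray_curtis(sample_a, sample_b)  # between-sample (beta) distance
```

A lower Bray-Curtis value means two samples are more similar, which is the sense in which same-individual samples are described as "closer" in panel b.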
Extended Data Fig. 2. Species-level correlation with single foods.
The figure shows the species-level correlations (Spearman) with single food quantities as estimated from the food frequency questionnaires. Only foods with at least 5 significant associations (q-value≤0.2) are displayed. Species are sorted by the number of significant associations, and the top 30 are reported in the figure.
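As an illustration of the Spearman correlation used here, the sketch below ranks both variables (averaging ranks for ties) and takes the Pearson correlation of the ranks. The input values are invented, and this is not the authors' code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical species abundance vs. food quantity (g) for five participants.
abundance = [0.1, 0.4, 0.2, 0.9, 0.5]
food_grams = [5, 20, 10, 80, 30]
rho = spearman(abundance, food_grams)
```

Because only ranks matter, the coefficient is insensitive to the skewed scales typical of relative abundances and questionnaire-derived food quantities.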
Extended Data Fig. 3. Top foods, food groups, nutrients, and dietary patterns validated in the PREDICT 1 US cohort.
The application of the RF regression model trained on the PREDICT 1 UK cohort on the PREDICT 1 US participants, validating the associations with food-related variables found in the PREDICT 1 UK.
Extended Data Fig. 4. Performance for Random Forest regression and classification on microbiome functional potential in predicting fasting measurements, total cholesterol and triglycerides in different lipoproteins.
The figure shows the performance of both RF regression and classification tasks trained on microbiome gene family profiles in predicting (a) the fasting measurements presented in Fig. 4a, sorted as in Fig. 4a, and the prediction performance for (b) total cholesterol and (c) triglycerides in lipoproteins of different sizes. For each lipoprotein, we considered its concentration at both fasting and postprandial (6 h) time points, and also the difference (rise) between the postprandial and fasting concentrations. Box plots show the distribution of the Spearman correlations (left axis) between real and predicted values using RF regression. Box plots show first and third quartiles (boxes) and the median (middle line); whiskers extend up to 1.5× the interquartile range. Circles show the median AUC (right axis) of RF classification in predicting the bottom quartile of the distribution vs. the top quartile.
Extended Data Fig. 5. Distributions of BMI in each curatedMetagenomicData dataset.
The figure shows the distributions of BMI values for the datasets available in curatedMetagenomicData. These distributions were used to select datasets whose range of values (interquartile range between 3.5 and 7.5) was comparable to that of the PREDICT 1 UK dataset (IQR of 5.5), to be used as validation datasets for the associations found. Box plots show first and third quartiles (boxes) and the median (middle line); whiskers extend up to 1.5× the interquartile range.
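The IQR-based selection described above can be sketched as follows. The cohort names and BMI values are hypothetical, and two details are assumptions because the caption does not specify them: the quartile convention (`method="inclusive"` here) and whether the 3.5 and 7.5 bounds are inclusive.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1); the quartile method is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def select_comparable(datasets, low=3.5, high=7.5):
    """Keep datasets whose BMI IQR lies within [low, high] (bounds assumed inclusive)."""
    return [name for name, bmi in datasets.items() if low <= iqr(bmi) <= high]


# Hypothetical cohorts; not real curatedMetagenomicData datasets.
cohorts = {
    "cohort_a": [24.0, 24.5, 25.0, 25.5, 26.0],  # IQR 1.0: too narrow
    "cohort_b": [20.0, 22.0, 24.0, 26.0, 28.0],  # IQR 4.0: comparable
    "cohort_c": [18.0, 24.0, 30.0, 36.0, 42.0],  # IQR 12.0: too wide
}
validation_sets = select_comparable(cohorts)
```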
Extended Data Fig. 6. Pairwise partial Spearman correlations between bacterial species and total lipids and cholesterol in lipoproteins.
a, The heatmap shows the species-level correlations with total lipids in lipoprotein variables at fasting, postprandial (6 h), and the difference (rise) between the postprandial and fasting concentrations. The 30 species with the highest number of significant associations (FDR ≤ 0.2) are shown. The asterisk indicates a significant correlation between species and metadata variable using a two-sided t-test, FDR-corrected (q < 0.2). b, The heatmap shows the species-level correlations with total cholesterol in lipoprotein variables at fasting, postprandial (6 h), and the difference (rise) between the postprandial and fasting concentrations. The 30 species with the highest number of significant associations (FDR ≤ 0.2) are shown. The asterisk indicates a significant correlation between species and metadata variable using a two-sided t-test, FDR-corrected (q < 0.2). All correlations, p-values, and q-values are available in Supplementary Table 6.
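The caption reports FDR-corrected q-values without naming the procedure. A minimal sketch of one common choice, the Benjamini-Hochberg step-up adjustment, is shown below as an assumption; the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Assumption: the caption only says "FDR"; BH is used here for illustration.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for four species-marker correlations.
p = [0.01, 0.04, 0.03, 0.5]
q = benjamini_hochberg(p)
significant = [qi < 0.2 for qi in q]  # threshold used in the caption
```

With these inputs the first three tests pass the q < 0.2 threshold and would be marked with an asterisk in a heatmap of this kind.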
Extended Data Fig. 7. Species-level correlations with triglycerides in lipoproteins.
The heatmap shows the species-level correlations with triglycerides in lipoprotein variables at fasting, postprandial (6 h), and the difference (rise) between the postprandial and fasting concentrations. The 30 species with the highest number of significant associations (FDR ≤ 0.2) are shown. The asterisk indicates a significant correlation between species and metadata variable using a two-sided t-test, corrected with FDR with q

Extended Data Fig. 8. Pairwise partial Spearman correlations between bacterial gene families and pathway abundances with clinical and metabolic risk scores, glycaemic and inflammatory measures, and lipoproteins.
a, The heatmap shows gene families correlations with the set of metadata presented in Fig. 5a–c reporting the top 2,000 genes selected among those with at least 20% prevalence on their number of significant correlations (q < 0.2). Gene families’ correlations are showing the same clusters as the species-level correlations in Fig. 5a–c. b, The heatmap shows pathway abundances correlations with the set of metadata presented in Fig. 5a–c reporting all the pathways at 20% prevalence (349 in total). Pathway abundances correlations are showing the same cluster structure as the species-level correlations in Fig. 5a–c.

Extended Data Fig. 9. Concordance of Random Forest scores with species-level partial correlations.
Volcano plots of the scores assigned to each species by Random Forest and their partial correlation, showing an overall concordance between the two independent approaches. We considered the top 5 metadata variables for the six metadata categories: a, Foods, bacon (g) (corr. 0.49), garlic (g) (corr. 0.424), unsalted nuts (g) (0.422), dairy dessert (g) (corr. 0.421), salted nuts (g) (corr. 0.395). b, Food groups, nuts (corr. 0.468), tea and coffee (corr. 0.436), meat (corr. 0.42), legumes (corr. 0.374), vegetables (corr. 0.371). c, Nutrients, lactose (corr. 0.442), niacin (corr. 0.381), maltose (corr. 0.361), sucrose (corr. 0.344), total carbohydrates (corr. 0.324). d, Nutrients normalized by daily energy intake, magnesium (corr. 0.472), starch (corr. 0.436), total carbohydrates (corr. 0.422), non-starch polysaccharides (NSP) (corr. 0.421), lactose (corr. 0.414). e, Dietary patterns, healthy plant percentage (corr. 0.492), healthy PDI (corr. 0.472), hei score (corr. 0.47), HFD (corr. 0.408), total plants percentage (0.388). f, Lipoproteins, M-HDL-L 6 h rise (corr. 0.406), IDL-C 6 h (corr. 0.4), HDL-L 6 h rise (corr. 0.397), XL-HDL-C 0 h (corr. 0.395), Total Cholesterol 4 h rise (corr. 0.391).

Extended Data Fig. 10. Prevotella copri and/or Blastocystis presence are indicators of a more favourable postprandial glucose response to meals.
a–c, Differential analysis of visceral fat, HFD and glucose iAUC 2 h after standardised breakfast according to presence-absence of one and both of P. copri and Blastocystis. The analysis reveals that both these species are indicators of reduced visceral fat, good cholesterol and meal-driven increase of glucose. d,e, Differential analysis of C-peptide and triglycerides at different time points according to presence-absence of one and both of P. copri and Blastocystis. The distributions of the concentrations for C-peptide and triglycerides were typically lower when one or both are absent. An asterisk between two box plots represents a significant p-value (p < 0.05) according to the Mann–Whitney U test (two-sided). Box plots show first and third quartiles (boxes) and the median (middle line); whiskers extend up to 1.5× the interquartile range. P-values are available in Supplementary Table 8.

Fig. 1. The PREDICT 1 study associates gut microbiome structure with habitual diet and blood cardiometabolic markers.
(A) The PREDICT 1 study assessed the gut microbiome of 1,098 volunteers from the UK and US via metagenomic sequencing of stool samples. Phenotypic data obtained through in-person assessment, blood/biospecimen collection, and the return of validated study questionnaires queried a range of relevant host/environmental factors including (1) personal characteristics, such as age, BMI, and estimated visceral fat; (2) habitual dietary intake using semi-quantitative food frequency questionnaires (FFQs); (3) fasting; and (4) postprandial cardiometabolic blood and inflammatory markers, total lipid and lipoprotein concentrations, lipoprotein particle sizes, apolipoproteins, derived metabolic risk scores, glycaemic-mediated metabolites, and metabolites related to fatty acid metabolism. (B) Overall microbiome alpha diversity, estimated as the total number of confidently identified microbial species in a given sample (richness), was correlated with HDL-D (positive) and estimated hepatic steatosis (negative). The five strongest positive and negative Spearman’s correlations with q<0.05 are reported for each of the four categories. Top species based on Shannon diversity are reported in Extended Data Fig. 1A and all correlations are reported in Supplementary Table 1.

Fig. 2. Food quality, regardless of source, is linked to overall and feature-level composition of the gut microbiome.
(A) Specific components of habitual diet comprising foods, nutrients, and dietary indices are linked to the composition of the gut microbiome with variable strengths as estimated by machine learning regression and classification models. Boxplots report the correlation between the real value of each component and the value predicted by regression models across 100 training/testing folds (Methods). Circles denote median area-under-the-curve (AUC) values across 100 folds for a corresponding binary classifier between the highest and lowest quartiles (Methods). (B) Single Spearman’s correlations adjusted for BMI and age between microbial species and components of habitual diet with asterisks denoting significant associations (FDR q<0.2). The 30 microbial species with the highest number of significant associations across habitual diet categories are reported. All indices of dietary patterns are reported, whereas only food groups and nutrients (energy-adjusted) with at least 7 associations among the top 30 microbial species are reported (NSP: non‐starch polysaccharides). Rows and columns are hierarchically clustered (complete linkage, Euclidean distance). Full heatmaps of foods and unadjusted nutrients are reported in Extended Data Fig. 2, and the full set of correlations is available in Supplementary Table 5. (C) Number of significant positive and negative associations (Spearman’s correlation p<0.2) between foods and taxa categorized by more and less healthy plant-based foods and more and less healthy animal-based foods according to the PDI. Taxa shown are the 20 species with the highest total number of significant associations regardless of category. (D) The association between the gut microbiome and coffee consumption in UK participants is dose-dependent, i.e. stronger when assessing heavy (e.g. >4 cups/d) vs. never drinkers, and was validated in the US cohort when applying the UK model. 
The reported ROC curves represent the performance of the classifier at varying classification thresholds with respect to the True Positive Rate (i.e. recall) and the False Positive Rate (i.e. 1 − specificity). (E-F) Among general dietary patterns and indices, the Healthy Food Diversity index (HFD) and the Alternate Mediterranean Diet score (aMED) were validated in the US cohort, thus showing consistency between the two populations on these two important dietary indices. Other validations of the UK model applied to the US cohort are reported in Extended Data Fig. 3.
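For reference, the two ROC axes follow the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below computes one ROC point at a single threshold; the labels and scores are invented (for example, 1 = heavy coffee drinker, 0 = never drinker), and the function name is illustrative.

```python
def roc_point(labels, scores, threshold):
    """Return (TPR, FPR) when predicting positive for scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive and not y:
            fp += 1
        elif not predicted_positive and y:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall / sensitivity
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # 1 - specificity
    return tpr, fpr


# Hypothetical classifier outputs for six participants.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.4, 0.1]
tpr, fpr = roc_point(labels, scores, threshold=0.5)
```

Sweeping the threshold from high to low traces the full curve from (0, 0) to (1, 1); the AUC is the area under that curve.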

Fig. 3. Random forest machine learning models trained on microbial or functional profiles are capable of predicting obesity phenotypic markers, even on independent cohorts.
(A) Whole-microbiome machine learning models can assess personal factors with RF regression (boxplots and left-side y-axis) using only taxonomic or functional (i.e. pathway) microbiome features. Classification models (circles and right-side y-axis) exceed AUC 0.65 except for waist-to-hip ratio (WHR) and smoking. (B) We observed the highest correlations between the relative abundance of microbial species and age, BMI, and visceral fat. The link between microbial features and visceral fat was of greater effect and more often significant than with traditional BMI. (C) Using several independent datasets we confirmed correlations between single microbial species and BMI with blue points denoting significant associations at p<0.05. (D) The machine learning model for BMI trained on PREDICT 1 data is reproducible in several external datasets (Extended Data Fig. 5), achieving correlations with true values exceeding those obtained in cross-validation of a single given dataset in five of seven cases. When the PREDICT 1 microbiome model is expanded to include other datasets (excluding those ones used for testing, i.e. leave-one-dataset-out/LODO approach) the performance remains comparable, confirming the generalizability of the PREDICT 1 model on obesity-related indicators.
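The leave-one-dataset-out (LODO) scheme in panel D can be sketched as a generator of train/test splits: each external dataset is held out once for testing, and the model is trained on the base cohort plus the remaining external datasets. Only the splitting logic is shown; the dataset names are placeholders, and model fitting is omitted.

```python
def lodo_splits(base_dataset, external_datasets):
    """Yield (training_datasets, held_out_dataset) pairs for LODO evaluation.

    The base cohort is always in the training set; the held-out dataset never is.
    """
    for held_out in external_datasets:
        training = [base_dataset] + [d for d in external_datasets if d != held_out]
        yield training, held_out


# Placeholder names, not actual dataset identifiers.
splits = list(lodo_splits("predict1", ["ext_a", "ext_b", "ext_c"]))
for training, held_out in splits:
    print(f"train on {training}, test on {held_out}")
```

Keeping the held-out dataset entirely out of training is what lets the resulting score be read as cross-cohort generalization rather than within-cohort fit.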

Fig. 4. Fasting and postprandial cardiometabolic responses to standardized test meals associated with the microbiome.
(A) The strongest observed links according to correlation of the predicted versus collected measures between the gut microbiome and fasting metabolic blood markers. For measures of lipid concentration in lipoproteins, we report the five strongest correlations only. Indices are grouped in nine distinct categories, and boxplots report the correlation between the prediction of RF regression models trained on microbial taxa or pathway abundances across 100 training/testing folds and stars report regressor performance when trained on the UK cohort and evaluated on the independent US validation cohort (left-side y-axis). Circles denote AUC values for RF classification (right-side y-axis) (B-F) Performance of our microbiome-based ML-model in estimating postprandial absolute levels and postprandial increases in cardiometabolic markers. Stars denote regression model results in our US validation cohort for postprandial measurements (not rises; Extended Data Fig. 4B-C). (B) RF regression and classification performance in predicting postprandial metabolic responses for clinic Meal 1 (breakfast) measured as iAUC at 6h for triglycerides and iAUC at 2h for glucose, C-peptide, and insulin. (C) Glycaemic-mediated postprandial iAUCs at 2h for the other meals (Supplementary Table 7), and (D) glycaemic-mediated markers absolute levels vs. rise. (E) Postprandial inflammatory measures (concentration and rise). (F) Postprandial lipoproteins measures (6h concentration and rise). (G) Overall agreement between RF regression and classification tasks for UK models applied to the independent US cohort. (H) RF microbiome-based model performance with postprandial changes (concentrations and rise) in lipoprotein concentration, composition, and size. Fasting and postprandial performance indices (correlation of the regressors’ outputs) were more tightly linked to gut community structure than were their corresponding postprandial rises.

Fig. 5. Species-level segregation into healthy and unhealthy microbial signatures of fasting and postprandial cardiometabolic markers.
(A) Associations (Spearman’s correlation, q<0.2 marked with stars) between single microbial species and fasting clinical risk measures and (B) glycaemic, inflammatory, and lipaemic indices. (C) Correlation between microbial species and the iAUC for glucose and C-peptide estimations based on clinical measurements before and after standardized meals. The 30 species with the highest number of significant correlations with distinct fasting and postprandial indices are shown. Rows are hierarchically clustered (complete linkage, Euclidean distance). (D) Microbe-metabolite correlations are very consistent when evaluated for fasting versus postprandial (6h) conditions (left panel). Associations with postprandial variations (rise) conversely often show opposing relationships, with several species positively correlated with fasting measures being negatively correlated with postprandial variation of the same metabolite (or vice versa, central panel). This was mitigated somewhat when comparing absolute postprandial responses with rise (right panel).

Fig. 6. The panel of 30 species showing the strongest overall correlations with a selection of markers of nutritional and cardiometabolic health.
The 30 species with the highest and lowest average ranks with diverse positive and negative cardiometabolic health and healthy diet indicators, respectively, are shown here. The rank of each microbe’s correlation with individual indicators is written within cells when significant (p

Source: PubMed
