VALENCIA: a nearest centroid classification method for vaginal microbial communities based on composition

Michael T France, Bing Ma, Pawel Gajer, Sarah Brown, Michael S Humphrys, Johanna B Holm, L Elaine Waetjen, Rebecca M Brotman, Jacques Ravel

Abstract

Background: Taxonomic profiles of vaginal microbial communities can be sorted into a discrete number of categories termed community state types (CSTs). This approach is advantageous because collapsing a hyper-dimensional taxonomic profile into a single categorical variable enables efforts such as data exploration, epidemiological studies, and statistical modeling. Vaginal communities are typically assigned to CSTs based on the results of hierarchical clustering of the pairwise distances between samples. However, this approach is problematic because it complicates between-study comparisons and because the results are entirely dependent on the particular set of samples that were analyzed. We sought to standardize and advance the assignment of samples to CSTs.

Results: We developed VALENCIA (VAginaL community state typE Nearest CentroId clAssifier), a nearest centroid-based tool which classifies samples based on their similarity to a set of reference centroids. The references were defined using a comprehensive set of 13,160 taxonomic profiles from 1975 women in the USA. This large dataset allowed us to comprehensively identify, define, and characterize vaginal CSTs common to reproductive age women and expand upon the CSTs that had been defined in previous studies. We validated the broad applicability of VALENCIA for the classification of vaginal microbial communities by using it to classify three test datasets which included reproductive age eastern and southern African women, adolescent girls, and a racially/ethnically and geographically diverse sample of postmenopausal women. VALENCIA performed well on all three datasets despite the substantial variations in sequencing strategies and bioinformatics pipelines, indicating its broad application to vaginal microbiota. We further describe the relationships between community characteristics (vaginal pH, Nugent score) and participant demographics (race, age) and the CSTs defined by VALENCIA.

Conclusion: VALENCIA provides a much-needed solution for the robust and reproducible assignment of vaginal community state types. This will allow unbiased analysis of both small and large vaginal microbiota datasets, comparisons between datasets and meta-analyses that combine multiple datasets.
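
The nearest-centroid assignment described in the Results can be illustrated with a minimal Python sketch. It assumes that a sample is compared with each centroid using the Yue-Clayton similarity (the measure cited in the reference list). The three centroids, their labels, the taxon order, and the function names `yue_clayton` and `classify` are hypothetical placeholders, not the published VALENCIA reference centroids or code.

```python
from typing import Dict, Sequence, Tuple


def yue_clayton(p: Sequence[float], q: Sequence[float]) -> float:
    """Yue-Clayton similarity of two relative-abundance vectors (1.0 = identical)."""
    pq = sum(a * b for a, b in zip(p, q))
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    denom = pp + qq - pq
    return pq / denom if denom > 0 else 0.0


def classify(
    counts: Sequence[float], centroids: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the label of the most similar centroid and that similarity."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    # Convert raw counts to relative abundances before comparing.
    rel = [c / total for c in counts]
    scores = {label: yue_clayton(rel, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical centroids; taxon order: L. crispatus, L. iners, G. vaginalis.
TOY_CENTROIDS = {
    "I": [0.90, 0.05, 0.05],
    "III": [0.05, 0.90, 0.05],
    "IV": [0.10, 0.20, 0.70],
}

label, score = classify([10, 850, 140], TOY_CENTROIDS)
print(label, round(score, 3))
```

In this toy case the L. iners-rich sample is assigned to the centroid labelled "III". Because the centroids are fixed in advance, the assignment of one sample does not depend on which other samples are in the dataset, which is the property the abstract contrasts with hierarchical clustering.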

Conflict of interest statement

JR is a co-founder of LUCA Biologics, a biotechnology company focusing on translating microbiome research into live biotherapeutic drugs for women’s health. All other authors declare that they have no competing interests.

Figures

Fig. 1
Heatmap displaying the taxonomic composition of 13,160 vaginal microbial communities using the 25 most abundant phylotypes across all samples. Hierarchical clustering was performed using Bray-Curtis dissimilarity with Ward linkage. Seven community state types were defined, four of which were dominated by a single species of Lactobacillus and three of which were not. This dataset was used to train VALENCIA
Fig. 2
Average relative abundance of twelve key taxa across all of the samples used to define each of the thirteen sub-CSTs. Error bars represent the standard error of the mean as defined using 100 bootstraps of ten percent of the training dataset. These “average” communities define the reference centroids used by VALENCIA to assign new samples to sub-CSTs. Sub-CST IV-C0 is not dominated by any one species. CST V has 20% relative abundance of L. iners in addition to L. jensenii, indicating these two species can co-occur. This relationship is maintained over extended periods of time in some longitudinal profiles
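
As a rough illustration of how such "average" communities and their error bars might be computed, the sketch below takes the centroid as the per-taxon mean of relative abundances and estimates a standard error from 100 resamples of ten percent of the profiles. It assumes that each profile already sums to 1 and that resampling is done with replacement; the caption does not state the latter. The function names and toy data are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def centroid(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Per-taxon mean relative abundance across a set of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def bootstrap_se(
    profiles: Sequence[Sequence[float]],
    n_boot: int = 100,
    fraction: float = 0.10,
    seed: int = 0,
) -> List[float]:
    """Standard deviation of bootstrap centroids, one value per taxon."""
    rng = random.Random(seed)
    k = max(1, round(len(profiles) * fraction))
    # Assumption: each bootstrap draws k profiles with replacement.
    boot_means = [centroid(rng.choices(profiles, k=k)) for _ in range(n_boot)]
    return [statistics.stdev(column) for column in zip(*boot_means)]


# Toy data: 20 two-taxon profiles alternating between two compositions.
toy_profiles = [[0.8, 0.2], [0.6, 0.4]] * 10
print(centroid(toy_profiles))
print(bootstrap_se(toy_profiles))
```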
Fig. 3
Taxonomic composition of all samples (n = 13,160) in the training dataset categorized by sub-CST assignment according to VALENCIA (a). Distribution of Shannon diversity index values by sub-CST assignment (b). Shannon diversity was calculated using log base 2
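
For reference, the Shannon index with log base 2 is H = -Σ p_i log2(p_i), where p_i is the relative abundance of taxon i. A minimal sketch follows; the function name `shannon_log2` is a hypothetical helper, and taxa with zero abundance are skipped because they contribute nothing to the sum.

```python
import math
from typing import Sequence


def shannon_log2(abundances: Sequence[float]) -> float:
    """Shannon diversity in bits from counts or relative abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("profile has no reads")
    proportions = [x / total for x in abundances if x > 0]
    return -sum(p * math.log2(p) for p in proportions)


# Four equally abundant taxa give log2(4) = 2 bits;
# a community made up of a single taxon gives 0 bits.
print(shannon_log2([25, 25, 25, 25]))
print(shannon_log2([100, 0, 0, 0]))
```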
Fig. 4
Validation of VALENCIA using three test datasets of vaginal taxonomic profiles derived from sequencing of the 16S rRNA gene. For each dataset, the similarity of each sample to its assigned sub-CST is plotted as a normalized histogram (left, a red, b blue, c green) versus that for the training dataset (dark grey). The taxonomic composition of each sample in the dataset is also provided (right). Test dataset 1 (a) was published by Hickey et al., contained 245 samples, was derived from sequencing of the V1V3 region, and contained samples from adolescent girls. Test dataset 2 (b) contained 1380 samples from menopausal women and was derived from sequencing of the V3V4 region. Test dataset 3 (c) was published by McClelland et al., contained 110 samples from eastern and southern African women, and was derived from sequencing of the V4 region
Fig. 5
The relationship between each VALENCIA-assigned sub-CST and Nugent score (a) and vaginal pH (b). Nugent score was separated into high (score 8–10), intermediate (score 4–7), and low (score 0–3) categories. Vaginal pH was split into four categories: less than or equal to 4.5, between 4.5 and 5.0, between 5.0 and 5.5, and greater than or equal to 5.5
Fig. 6
The relationship between the prevalence of each VALENCIA-assigned sub-CST and a woman’s self-identified race (a). Each bar represents the proportion of samples assigned to each CST in women whose race is Asian (n = 95), Black (n = 1,343), Hispanic (n = 110), White (n = 403), or Other (n = 17). For subjects who contributed multiple samples, the within subject relative prevalence of each CST was used in the calculation instead of their individual CST counts. We also examined relationships between the prevalence of each CST and a woman’s age (b). Only the prevalence of CST III was found to have a relationship with age among reproductive-age women. Bars represent the age distribution of subjects whose samples were (orange) or were not (grey) assigned to CST III. Older reproductive age women were less likely to have communities assigned to CST III than younger reproductive age women
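
One plausible reading of the within-subject weighting in panel (a) is sketched below: each subject contributes the fraction of her samples assigned to each CST, and those fractions are averaged over the subjects in a group, so that a subject with many samples does not outweigh a subject with one. The record layout and the function name `subject_weighted_prevalence` are hypothetical, and the published analysis may differ in detail.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def subject_weighted_prevalence(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, float]]:
    """records are (subject_id, group, cst); returns group -> cst -> prevalence."""
    per_subject = defaultdict(Counter)
    subject_group = {}
    for subject_id, group, cst in records:
        per_subject[subject_id][cst] += 1
        subject_group[subject_id] = group

    totals = defaultdict(lambda: defaultdict(float))
    n_subjects = Counter()
    for subject_id, counts in per_subject.items():
        group = subject_group[subject_id]
        n_samples = sum(counts.values())
        n_subjects[group] += 1
        for cst, count in counts.items():
            # Each subject adds fractions that sum to 1 across CSTs.
            totals[group][cst] += count / n_samples

    return {
        group: {cst: value / n_subjects[group] for cst, value in csts.items()}
        for group, csts in totals.items()
    }


# Subject A has four samples (three CST I, one CST III); subject B has one.
toy_records = [
    ("A", "X", "I"), ("A", "X", "I"), ("A", "X", "I"), ("A", "X", "III"),
    ("B", "X", "III"),
]
prevalence = subject_weighted_prevalence(toy_records)
print(prevalence)
```

With raw sample counts, CST I would account for 3 of 5 samples (0.6) in this toy group; with subject weighting it accounts for 0.375, because subject A's four samples count as a single subject.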

References

    1. Plato. 1925. Statesman. Philebus. Ion.trans. Harold N. Fowler, W. R. M. Lamb.
    1. Human Microbiome Project C Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–214. doi: 10.1038/nature11234.
    1. Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell. 2019;176:649–662.e20.
    1. Meadow JF, Altrichter AE, Bateman AC, Stenson J, Brown GZ, Green JL, et al. Humans differ in their personal microbial cloud. PeerJ. 2015;3:e1258. doi: 10.7717/peerj.1258.
    1. Costea PI, Hildebrand F, Arumugam M, Backhed F, Blaser MJ, Bushman FD, et al. Enterotypes in the landscape of gut microbial community composition. Nat Microbiol. 2018;3:8–16. doi: 10.1038/s41564-017-0072-8.
    1. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. Enterotypes of the human gut microbiome. Nature. 2011;473:174–180. doi: 10.1038/nature09944.
    1. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, Mcculle SL, Karlebach S, Gorle R, Russell J, Tacket CO, Brotman RM, Davis CC, Ault K, Peralta L, Forney LJ. 2011. Vaginal microbiome of reproductive-age women., vol 108, p 4680-4687.
    1. Fettweis JM, Serrano MG, Brooks JP, Edwards DJ, Girerd PH, Parikh HI, et al. The vaginal microbiome and preterm birth. Nat Med. 2019;25:1012–1021. doi: 10.1038/s41591-019-0450-2.
    1. Oh J, Byrd AL, Park M, Program NCS, Kong HH, Segre JA. Temporal stability of the human skin microbiome. Cell. 2016;165:854–866. doi: 10.1016/j.cell.2016.04.008.
    1. Segal LN, Clemente JC, Tsay J-CJ, Koralov SB, Keller BC, Wu BG, Li Y, Shen N, Ghedin E, Morris A, Diaz P, Huang L, Wikoff WR, Ubeda C, Artacho A, Rom WN, Sterman DH, Collman RG, Blaser MJ, Weiden MD. 2016. Enrichment of the lung microbiome with oral taxa is associated with lung inflammation of a Th17 phenotype. Nature Microbiology 1.
    1. Belstrom D, Holmstrup P, Bardow A, Kokaras A, Fiehn NE, Paster BJ. Temporal stability of the salivary microbiota in oral health. PLoS One. 2016;11:e0147472. doi: 10.1371/journal.pone.0147472.
    1. Team NIHHMPA A review of 10 years of human microbiome research activities at the US National Institutes of Health, fiscal years 2007-2016. Microbiome. 2019;7:31. doi: 10.1186/s40168-019-0620-y.
    1. Ben-Hur A, Guyon I. Detecting stable clusters using principal component analysis, p 159-182. Totowa, NJ: Humana Press Inc.; 2003.
    1. Levner I. Feature selection and nearest centroid classification for protein mass spectrometry. BMC Bioinformatics. 2005;6:68. doi: 10.1186/1471-2105-6-68.
    1. Dabney AR. Classification of microarrays to nearest centroids. Bioinformatics. 2005;21:4148–4154. doi: 10.1093/bioinformatics/bti681.
    1. Zhou X, Brown C, Abdo Z, Davis C, Hansmann M, Joyce P, et al. Disparity in the vaginal microbial community composition of healthy Caucasian and black women. ISME J. 2007;1:121–133. doi: 10.1038/ismej.2007.12.
    1. Muzny CA, Blanchard E, Taylor CM, Aaron KJ, Talluri R, Griswold ME, et al. Identification of key bacteria involved in the induction of incident bacterial vaginosis: a prospective study. J Infect Dis. 2018;218:966–978.
    1. Tamarelle J, de Barbeyrac B, Le Hen I, Thiebaut A, Bebear C, Ravel J, et al. Vaginal microbiota composition and association with prevalent chlamydia trachomatis infection: a cross-sectional study of young women attending a STI clinic in France. Sex Transm Infect. 2018;94:616–618. doi: 10.1136/sextrans-2017-053346.
    1. Serrano MG, Parikh HI, Brooks JP, Edwards DJ, Arodz TJ, Edupuganti L, Huang B, Girerd PH, Bokhari YA, Bradley SP, Brooks JL, Dickinson MR, Drake JI, Duckworth RA, 3rd, Fong SS, Glascock AL, Jean S, Jimenez NR, Khoury J, Koparde VN, Lara AM, Lee V, Matveyev AV, Milton SH, Mistry SD, Rozycki SK, Sheth NU, Smirnova E, Vivadelli SC, Wijesooriya NR, Xu J, Xu P, Chaffin DO, Sexton AL, Gravett MG, Rubens CE, Hendricks-Munoz KD, Jefferson KK, Strauss JF, 3rd, Fettweis JM, Buck GA. 2019. Racioethnic diversity in the dynamics of the vaginal microbiome during pregnancy. Nat Med 25:1001-1011.
    1. MacIntyre DA, Chandiramani M, Lee YS, Kindinger L, Smith A, Angelopoulos N, et al. The vaginal microbiome during pregnancy and the postpartum period in a European population. Sci Rep. 2015;5:8988. doi: 10.1038/srep08988.
    1. Gosmann C, Anahtar MN, Handley SA, Huttenhower C, Farcasanu M, Abu-Ali G, Bowen BP, Padavattan N, Desai C, Droit L, Moodley A, Dong M, Chen Y, Ismail N, Ndung'u T, Ghebremichael MS, Wesemann DR, Mitchell CM, Dong KL, Huttenhower C, Walker BD, Virgin HW, Kwon DS. 2017. Lactobacillus-deficient cervicovaginal bacterial communities are associated with increased HIV acquisition in young South African women, vol 46, p 29-37.
    1. Gajer P, Brotman RM, Bai G, Sakamoto J, Schutte UME, Zhong X, et al. Temporal dynamics of the human vaginal microbiota. Sci Transl Med. 2012;4:132ra52. doi: 10.1126/scitranslmed.3003605.
    1. Brotman RM, Ravel J, Cone RA, Zenilman JM. Rapid fluctuation of the vaginal microbiota measured by gram stain analysis. Sex Transm Infect. 2010;86:297–302. doi: 10.1136/sti.2009.040592.
    1. Srinivasan S, Liu C, Mitchell CM, Fiedler TL, Thomas KK, Agnew KJ, et al. Temporal variability of human vaginal bacteria and relationship with bacterial vaginosis. PLoS One. 2010;5:e10197. doi: 10.1371/journal.pone.0010197.
    1. Anahtar MN, Byrne EH, Doherty KE, Bowman BA, Yamamoto HS, Soumillon M, et al. Cervicovaginal bacteria are a major modulator of host inflammatory responses in the female genital tract. Immunity. 2015;42:965–976. doi: 10.1016/j.immuni.2015.04.019.
    1. Brotman RM, Bradford LL, Conrad M, Gajer P, Ault K, Peralta L, et al. Association between Trichomonas vaginalis and vaginal bacterial community composition among reproductive-age women. Sex Transm Dis. 2012;39:807–812. doi: 10.1097/OLQ.0b013e3182631c79.
    1. van Houdt R, Ma B, Bruisten SM, Speksnijder A, Ravel J, de Vries HJC. Lactobacillus iners-dominated vaginal microbiota is associated with increased susceptibility to chlamydia trachomatis infection in Dutch women: a case-control study. Sex Transm Infect. 2018;94:117–123. doi: 10.1136/sextrans-2017-053133.
    1. Brown SE, Schwartz J, Robinson C, O'Hanlon ED, Bradford LL, Xin H, et al. The vaginal microbiota and behavioral factors associated with genital Candida albicans detection in reproductive-age women. Sex Transm Dis. 2019;46:753–758. doi: 10.1097/OLQ.0000000000001066.
    1. Brotman RM, Shardell MD, Gajer P, Fadrosh D, Chang K, Silver MI, et al. Association between the vaginal microbiota, menopause status, and signs of vulvovaginal atrophy. Menopause. 2014;21:450–458. doi: 10.1097/GME.0b013e3182a4690b.
    1. Richard DX, Brown G, Julian DXX, Lee S, Ann DXX, Denise DX, Holly DXX, Lindsay DXX, Tom DXX, Phillip DX, X DX. 2019. Establishment of vaginal microbiota composition in early pregnancy and its association with subsequent preterm prelabor rupture of the fetal membranes. Translational Research doi:10.1016/j.trsl.2018.12.005:in press.
    1. Hickey RJ, Zhou X, Settles ML, Erb J, Malone K, Hansmann MA, Shew ML, Pol BVD, Fortenberry JD, Forney LJ, Forney J. 2015. Vaginal microbiota of adolescent girls prior to the onset of menarche resemble those of reproductive-age women. mBio 6:e00097-15.
    1. McClelland RS, Lingappa JR, Srinivasan S, Kinuthia J, John-Stewart GC, Jaoko W, et al. Evaluation of the association between the concentrations of key vaginal bacteria and the increased risk of HIV acquisition in African women from five cohorts: a nested case-control study. Lancet Infect Dis. 2018;18:554–564. doi: 10.1016/S1473-3099(18)30058-6.
    1. Holm JB, France MT, Ma B, McComb E, Robinson CK, Mehta A, et al. Comparative metagenome-assembled genome analysis of "Candidatus Lachnocurva vaginae", formerly known as bacterial vaginosis-associated Bacterium-1 (BVAB1) Front Cell Infect Microbiol. 2020;10:117. doi: 10.3389/fcimb.2020.00117.
    1. Yue JC, Clayton MK. A similarity measure based on species proportions. Communications in Statistics - Theory and Methods. 2005;34:2123–2131. doi: 10.1080/STA-200066418.
    1. Nugent RP, Krohn MA, Hillier SL. Reliability of diagnosing bacterial vaginosis is improved by standardized method of gram stain interpretation. J Clin Microbiol. 1991;29:297–301. doi: 10.1128/JCM.29.2.297-301.1991.
    1. Srinivasan S, Morgan MT, Liu C, Matsen FA, Hoffman NG, Fiedler TL, et al. More than meets the eye: associations of vaginal bacteria with gram stain morphotypes using molecular phylogenetic analysis. PLoS One. 2013;8:e78633. doi: 10.1371/journal.pone.0078633.
    1. Muzny C, Sunesara IR, Griswold ME, Kumar R, Lefkowitz EJ, Mena LA, et al. Association between BVAB1 and high Nugent scores among women with bacterial vaginosis. Diag Microbiol Infect Dis. 2014;80:321–323. doi: 10.1016/j.diagmicrobio.2014.09.008.
    1. Smith SB, Ravel J. The vaginal microbiota, host defence and reproductive physiology. J Physiol. 2017;595:451–463. doi: 10.1113/JP271694.
    1. O'Hanlon DE, Moench TR, Cone RA. Vaginal pH and microbicidal lactic acid when lactobacilli dominate the microbiota. PLoS One. 2013;8:e80074. doi: 10.1371/journal.pone.0080074.
    1. Boskey ER, Ra C, Whaley KJ, Moench TR. Origins of vaginal acidity: high D/L lactate ratio is consistent with bacteria being the primary source. Human reproduction (Oxford, England) 16:1809-1813. 2001.
    1. Fettweis JM, Brooks JP, Serrano MG, Sheth NU, Girerd PH, Edwards DJ, et al. Differences in vaginal microbiome in African American women versus women of European ancestry. Microbiology. 2014;160:2272–2282. doi: 10.1099/mic.0.081034-0.
    1. Bennett PR, Lee YS, Holmes E, Teoh TG, Kindinger LM, Marchesi JR, et al. The interaction between vaginal microbiota, cervical length, and vaginal progesterone treatment for preterm birth risk. Microbiome. 2017;5.
    1. Elovitz MA, Gajer P, Riis V, Brown AG, Humphrys MS, Holm JB, et al. Cervicovaginal microbiota and local immune response modulate the risk of spontaneous preterm delivery. Nat Commun. 2019;10:1305. doi: 10.1038/s41467-019-09285-9.
    1. Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome datasets are compositional: and this is not optional. Front Microbiol. 2017;8:1–6. doi: 10.3389/fmicb.2017.02224.
    1. Mandal S, Van Treuren W, White RA, Eggesbo M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015;26:27663.
    1. Callahan BJ, DiGiulio DB, Aliaga Goltsman DS, Sun CL, Costello EK, Jeganathan P, Biggio JR, Wong RJ, Druzin ML, Shaw GM, Stevenson DK, Holmes SP, Relman DA. 2017. Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women. 114.
    1. van de Wijgert J, Verwijs MC, Gill AC, Borgdorff H, van der Veer C, Mayaud P. Pathobionts in the vaginal microbiota: individual participant data meta-analysis of three sequencing studies. Front Cell Infect Microbiol. 2020;10:129. doi: 10.3389/fcimb.2020.00129.
    1. Nizet V. 2018. Group B Streptococcal maternal colonization and neonatal disease: molecular mechanisms and preventative approaches. 6:1-17.
    1. Smith PA, Sherman JM. The lactic acid fermentation of streptococci. J Bacteriol. 1942;43:725–731. doi: 10.1128/JB.43.6.725-731.1942.
    1. Callaghan AO, Sinderen DV. 2016. Bifidobacteria and their role as members of the human gut microbiota. 7.
    1. Freitas AC, Hill JE. Quantification, isolation and characterization of Bifidobacterium from the vaginal microbiomes of reproductive aged women. Anaerobe. 2017;47:145–156. doi: 10.1016/j.anaerobe.2017.05.012.
    1. Gilbert JA, Blaser MJ, Caporaso JG, Jansson JK, Lynch SV, Knight R. Current understanding of the human microbiome. Nat Med. 2018;24:392–400. doi: 10.1038/nm.4517.
    1. Fierer N. Embracing the unknown: disentangling the complexities of the soil microbiome. Nat Rev Microbiol. 2017;15:579–590. doi: 10.1038/nrmicro.2017.87.
    1. Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-castillo FM, Costea PI, Cruaud C, Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C. 2015. Structure and function of the global ocean microbiome. 348:1-10.
    1. Brotman RM, Klebanoff MA, Nansel TR, Yu KF, Andrews WW, Zhang J, et al. Bacterial vaginosis assessed by gram stain and diminished colonization resistance to incident gonococcal, chlamydial, and trichomonal genital infection. J Infect Dis. 2010;202:1907–1915. doi: 10.1086/657320.
    1. Ravel J, Brotman RM, Gajer P, Ma B, Nandy M, Fadrosh DW, Sakamoto J, Koenig SSK, Fu L, Zhou X, Hickey RJ, Schwebke JR, Forney LJ. 2013. Daily temporal dynamics of vaginal microbiota before, during and after episodes of bacterial vaginosis.1-6.
    1. Tamarelle J, Ma B, Gajer P, Humphrys MS, Terplan M, Mark KS, et al. Nonoptimal vaginal microbiota after azithromycin treatment for chlamydia trachomatis infection. J Infect Dis. 2020;221:627–635. doi: 10.1093/infdis/jiz499.
    1. Tuddenham S, Ghanem KG, Caulfield LE, Rovner AJ, Robinson C, Shivakoti R, et al. Associations between dietary micronutrient intake and molecular-bacterial vaginosis. Reprod Health. 2019;16:151. doi: 10.1186/s12978-019-0814-6.
    1. Brotman RM, Klebanoff MA, Nansel TR, Andrews WW, Schwebke JR, Zhang J, et al. A longitudinal study of vaginal douching and bacterial vaginosis--a marginal structural modeling analysis. Am J Epidemiol. 2008;168:188–196. doi: 10.1093/aje/kwn103.
    1. Holm JB, Humphrys MS, Robinson CK, Settles ML, Ott S, Fu L, Yang H, Gajer P, He X, McComb E, Gravitt PE, Ghanem KG, Brotman RM, Ravel J. 2019. Ultrahigh-throughput multiplexing and sequencing of >500-base-pair amplicon regions on the Illumina HiSeq 2500 platform. mSystems 4.
    1. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13:581–583. doi: 10.1038/nmeth.3869.
    1. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261–5267. doi: 10.1128/AEM.00062-07.
    1. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–D596. doi: 10.1093/nar/gks1219.
    1. McKinney W. Data structures for statistical computing in python. 2010.
    1. Greathouse KL, Sinha R, Vogtmann E. DNA extraction for human microbiome studies: the issue of standardization. Genome Biol. 2019;20:212. doi: 10.1186/s13059-019-1843-8.
    1. McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. Elife. 2019;8.
    1. Bates D, Maechler M. Package 'lme4'. 2010.
    1. Team RC. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
    1. Bolker B. 2019. Package “broom.mixed”.

Source: PubMed
