A comparative genomic analysis of diverse clonal types of enterotoxigenic Escherichia coli reveals pathovar-specific conservation

Jason W Sahl, Hans Steinsland, Julia C Redman, Samuel V Angiuoli, James P Nataro, Halvor Sommerfelt, David A Rasko

Abstract

Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal illness in children less than 5 years of age in low- and middle-income nations, whereas it is an emerging enteric pathogen in industrialized nations. Despite being an important cause of diarrhea, little is known about the genomic composition of ETEC. To address this, we sequenced the genomes of five ETEC isolates obtained from children in Guinea-Bissau with diarrhea. These five isolates represent distinct and globally dominant ETEC clonal groups. Comparative genomic analyses utilizing a gene-independent whole-genome alignment method demonstrated that sequenced ETEC strains share approximately 2.7 million bases of genomic sequence. Phylogenetic analysis of this "core genome" confirmed the diverse history of the ETEC pathovar and provides a finer resolution of the E. coli relationships than multilocus sequence typing. No identified genomic regions were conserved exclusively in all ETEC genomes; however, we identified more genomic content conserved among ETEC genomes than among non-ETEC E. coli genomes, suggesting that ETEC isolates share a genomic core. Comparisons of known virulence and of surface-exposed and colonization factor genes across all sequenced ETEC genomes not only identified variability but also indicated that some antigens are restricted to the ETEC pathovar. Overall, the generation of these five genome sequences, in addition to the two previously generated ETEC genomes, highlights the genomic diversity of ETEC. These studies increase our understanding of ETEC evolution, as well as provide insight into virulence factors and conserved proteins, which may be targets for vaccine development.

Figures

FIG. 1.
Comparison of the phylogenetic trees using either the seven-gene PubMLST system (A) or a whole-genome alignment (B) of >2.7 Mb of sequence information as determined by Mugsy (3) to be the core genome of E. coli. The pathotype of each E. coli strain is depicted with a symbol described in the legend. The letters on the right of each tree indicate the phylogenetic group. Bootstrap values are greater than 75% (A) or 95% (B), unless stated otherwise. The PubMLST tree (A) was inferred by using the maximum-likelihood method, while the whole-genome tree (B) was inferred with a combination of neighbor-joining and maximum-likelihood methods. Nodes highlighted with an “!” are different in the whole-genome analysis (B) than they are in the PubMLST analysis (A). The whole-genome analysis consolidates a number of the other phylogenetic types that were previously separated on the MLST tree (groups A, B1, and E, as well as the Shigella group). EAEC, enteroaggregative E. coli; EHEC, enterohemorrhagic E. coli; EIEC, enteroinvasive E. coli; EPEC, enteropathogenic E. coli; ETEC, enterotoxigenic E. coli; ExPEC, extraintestinal pathogenic E. coli.
FIG. 2.
Diversity of the E. coli genomes calculated on the conserved core in a gene-independent calculation. A box-and-whisker plot of the percent relatedness of genomes from defined phylogenetic groups to the chromosome of ETEC E24377A is shown. The percent relatedness was calculated by the amount of shared sequence in a whole-genome alignment divided by the genome sequence length of E24377A.
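The percent-relatedness calculation described in the FIG. 2 caption reduces to a simple ratio. A minimal sketch follows, assuming the shared sequence is counted in bases on the reference chromosome; the function name, parameter names, and example numbers are hypothetical and are not taken from the study.

```python
def percent_relatedness(shared_bases: int, reference_length: int) -> float:
    """Return shared aligned sequence as a percentage of the reference length.

    Assumption: shared_bases is the number of reference bases that fall in
    aligned blocks shared with the query genome (hypothetical input).
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    if not 0 <= shared_bases <= reference_length:
        raise ValueError("shared_bases must lie between 0 and reference_length")
    return 100.0 * shared_bases / reference_length


# Illustrative numbers only; these are not values reported in the paper.
example = percent_relatedness(shared_bases=4_000_000, reference_length=5_000_000)
print(f"{example:.1f}%")
```

Dividing by the reference length, rather than by the query length, keeps every genome on the same scale relative to E24377A, which is what allows the phylogenetic groups to be compared in one plot.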
FIG. 3.
Conservation and variation of the EtpA protein in ETEC. The visualization of a region of the EtpA global alignment performed by Muscle identifies conserved domains in the amino terminus. Numbers at the top indicate the relative peptide position of the alignment. Sequence blocks in black are identical among all five genomes.

Source: PubMed
