Hepatitis C virus transmission bottlenecks analyzed by deep sequencing

Gary P Wang, Scott A Sherrill-Mix, Kyong-Mi Chang, Chris Quince, Frederic D Bushman

Abstract

Hepatitis C virus (HCV) replication in infected patients produces large and diverse viral populations, which give rise to drug-resistant and immune escape variants. Here, we analyzed HCV populations during transmission and diversification in longitudinal and cross-sectional samples using 454/Roche pyrosequencing, in total analyzing 174,185 sequence reads. To sample diversity, four locations in the HCV genome were analyzed, ranging from high diversity (the envelope hypervariable region 1 [HVR1]) to almost no diversity (the 5' untranslated region [UTR]). For three longitudinal samples for which early time points were available, we found that only 1 to 4 viral variants were present, suggesting that productive infection was initiated by a very small number of HCV particles. Sequence diversity accumulated subsequently, with the 5' UTR showing almost no diversification while the envelope HVR1 showed >100 variants in some subjects. Calculation of the transmission probability for only a single variant, taking into account the measured population structure within patients, confirmed initial infection by one or a few viral particles. These findings provide the most detailed sequence-based analysis of HCV transmission bottlenecks to date. The analytical methods described here are broadly applicable to studies of viral diversity using deep sequencing.

Figures

FIG. 1.
HCV genome and characteristics of three subjects studied longitudinally during acute HCV infection. (A) The HCV genome and the positions of amplicons studied. The amplicons are numbered 1 to 4 from left to right, and a letter is used to indicate the direction of sequence determination. For the E1E2 HVR1 (3) and E2 (4) amplicons, two slightly different primers were used in each direction in an effort to maximize the diversity of recovered sequence variants, and these are indicated by the two bars. (B) HCV load and ALT levels for patients 1 to 3 during acute HCV infection. The x axis shows the number of weeks after clinical presentation, which for these patients was close in time to initial infection. Further patient characteristics were as follows: patient 1, injecting drug user, anti-HCV negative on 18 June 2001, first ALT flare (ALT, 677) on 6 July 2001, anti-HCV positive on 11 October 2001; patient 2, possible medical exposure, anti-HCV negative on 16 May 2001, initial ALT flare (ALT, 467) on 9 January 2004, anti-HCV positive on 22 April 2004; patient 3, injecting drug user, anti-HCV negative on 31 January 2006 (slightly abnormal ALT, 73), initial ALT flare (ALT, 640) on 10 April 2006, anti-HCV positive on 11 April 2006.
FIG. 2.
Methods for pyrosequence acquisition: DNA barcoding and controlling error using PyroNoise. (A) The DNA barcoding strategy. For each primer pair, an 8-base sequence (barcode) that indexed the patient and time point was included. (B) Controlling 454/Roche pyrosequencing error using PyroNoise. Sequences from a control plasmid amplicon (5′ UTR 1A amplicon) were aligned using the left-hand primer sequence. The x axis shows the base number in the read, and the y axis shows the cumulative number of sequence reads. The top sequence block shows the unprocessed raw pyrosequence reads, which contain considerable homopolymer error. The bottom block shows the improved alignments after application of the PyroNoise algorithm. (C) Comparison of the Shannon Diversity Index for OTUs from plasmid controls made by PyroNoise (x axis) versus DNADist and Dotur (y axis). The correct answer for all amplicons is 0. Default parameters were used for each program. (D) Comparison of the Shannon Diversity Index for OTUs from HCV samples made by PyroNoise (x axis) versus DNADist and Dotur (y axis). The correct answer for all amplicons varies from 0 (ultraconserved HCV regions) to higher values (hypervariable HCV regions). Default parameters were used for each program. Dotur OTUs were generated at 97% identity.
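The barcoding step in panel A can be illustrated with a minimal sketch. It assumes the 8-base barcode sits at the very start of each read and must match exactly; the barcode sequences, sample labels, and the function name `demultiplex` are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical barcode table: 8-base barcode -> (patient, time point).
# These sequences and labels are illustrative only.
BARCODES = {
    "ACGTACGT": ("patient1", "week0"),
    "TGCATGCA": ("patient1", "week8"),
}
BARCODE_LEN = 8


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by their leading 8-base barcode.

    Assumes an exact match at the start of the read. Reads whose
    prefix is not in the table are returned separately, untrimmed.
    """
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        if tag in barcodes:
            bins[barcodes[tag]].append(insert)
        else:
            unassigned.append(read)
    return dict(bins), unassigned


reads = ["ACGTACGTGGCCTTAA", "TGCATGCAGGCCTTAG", "NNNNNNNNGGCC"]
bins, unassigned = demultiplex(reads)
```

Real pipelines usually also tolerate sequencing errors within the barcode; this sketch omits that for brevity.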
FIG. 3.
Sequence variation across the HCV genome. Shown is a summary of sequence data. The x axis shows the position on the HCV genome, and the y axis indicates the cumulative numbers of sequences analyzed. Each DNA base is colored as indicated at the bottom right. Amplicons are numbered as in Fig. 1.
FIG. 4.
Longitudinal trends in sequence diversity analyzed using the Shannon Index. Data for each amplicon and read direction are shown separately. Each patient is indicated by a color. Times after diagnosis for the samples are shown on the x axis, and Shannon Diversity Index values are shown on the y axis. The data are shown in tabular form in Table S1 in the supplemental material. Data are shown for PyroNoise-processed amplicons (OTUs formed from one or two sequence reads were retained in this analysis). Patient 1, yellow; patient 2, purple; patient 3, green. All pairwise comparisons between pooled data from each amplicon achieved P values of <0.02 (t test). Analysis using a generalized linear model, comparing the Shannon diversity to amplicon and time data while controlling for the patient, showed that diversity significantly increased with time.
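For readers unfamiliar with the Shannon Diversity Index, the following sketch computes it from per-OTU read counts as H = -sum(p_i ln p_i). The use of the natural logarithm and the function name `shannon_index` are assumptions; the caption does not state which log base was used.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTU read counts.

    Written as sum(p_i * ln(1 / p_i)), which is the same quantity.
    The natural log is an assumption made for this sketch.
    """
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in otu_counts if c > 0)


# A single OTU (for example, a clonal plasmid control) has zero diversity.
h_single = shannon_index([100])
# Two equally abundant OTUs give ln(2).
h_even = shannon_index([50, 50])
```

A sample dominated by one variant scores near 0, and the index rises as reads spread more evenly over more OTUs.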
FIG. 5.
Analysis of HCV sequence data on phylogenetic trees. (A) Diversification of 5′ UTR amplicons (1A) during HCV replication. The relative abundance of a type is shown by the black shading, where the extent of the black region from left to right within the gray bar indicates the proportion in the population. Trees were generated using neighbor joining. (B) Diversification of HVR1 sequences (3B) during HCV replication. The plasmid control is shown for comparison on each tree.
FIG. 6.
Comparison of HCV diversity in samples from HCV-monoinfected and HIV-HCV-coinfected subjects. Shannon Index values were calculated for each subject, and then mean Shannon Index values were compared between monoinfected and coinfected subjects. The error bars show the 95% confidence interval of the difference (so that a lack of overlap of the error bars indicates a statistically significant difference in the means). The asterisk denotes a significant difference (P = 0.02616, Mann-Whitney comparison of means) between the two groups indicated by the horizontal line.
FIG. 7.
Collector's curve (rarefaction) analysis of HCV sequences from each subject. Repeated sampling of sequence subsets was used to investigate whether additional sampling would likely yield additional OTUs. The numbers of sequences in sequence subsets are shown on the x axis, and the numbers of OTUs in the subset are shown on the y axis. OTUs supported by only one or two sequence reads were removed prior to rarefaction analysis. The amplicons studied were as follows: UTR (1A) (A), core (2A) (B), E1E2HVR1 (3A) (C), and E2 (4A) (D).
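The repeated-subsampling idea behind a collector's curve can be sketched as follows. This assumes each read already carries an OTU label and that subsets are drawn without replacement; the function name, repeat count, and example data are hypothetical.

```python
import random


def rarefaction_curve(otu_labels, subset_sizes, n_repeats=100, seed=0):
    """Mean number of distinct OTUs seen in random read subsets.

    otu_labels: list with one OTU label per read.
    subset_sizes: subset sizes to evaluate (each <= len(otu_labels)).
    Subsets are drawn without replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    curve = []
    for size in subset_sizes:
        total = 0
        for _ in range(n_repeats):
            total += len(set(rng.sample(otu_labels, size)))
        curve.append((size, total / n_repeats))
    return curve


# Hypothetical sample: 90 reads of OTU "A" and 10 reads of OTU "B".
labels = ["A"] * 90 + ["B"] * 10
curve = rarefaction_curve(labels, [1, 10, 50, 100])
```

A curve that flattens as the subset size grows suggests that further sequencing would recover few additional OTUs.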
FIG. 8.
Calculation of the probability of transmission of a single viral variant given the measured HCV population structures. The x axis shows the number of viruses modeled to be involved in functional HCV transmission. The y axis shows the probability of detecting only one type of viral particle in the recipient in the donor-recipient transmission pair. Each of the subjects studied here was modeled individually as a potential donor and is shown as a single curve. The E1E2 HVR1 amplicon was used for this analysis. For subjects for whom multiple time points were available, the last time point was analyzed. Probabilities were calculated as described in the text. The dashed line indicates a P value of 0.05.
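One simple way to model this quantity, offered as an illustration and not necessarily the calculation used in the study, is to assume that n particles are drawn independently from the donor population, so that the chance they are all the same variant is the sum over variants of p_i^n. The function names, the 0.05 default, and the example counts below are hypothetical.

```python
def prob_single_variant(variant_counts, n_particles):
    """P(all n transmitted particles are the same variant).

    Assumes independent draws from the donor's variant frequencies,
    so the probability is sum(p_i ** n). This is a simplified model.
    """
    total = sum(variant_counts)
    return sum((c / total) ** n_particles for c in variant_counts)


def max_particles_consistent(variant_counts, alpha=0.05, n_max=100):
    """Largest n for which the single-variant probability is >= alpha.

    The probability never increases with n, so the scan can stop at
    the first n that falls below alpha.
    """
    largest = 0
    for n in range(1, n_max + 1):
        if prob_single_variant(variant_counts, n) >= alpha:
            largest = n
        else:
            break
    return largest


# Hypothetical donor with two equally abundant variants.
donor = [50, 50]
p_two = prob_single_variant(donor, 2)
n_limit = max_particles_consistent(donor)
```

Under this model, a diverse donor population makes a single-variant outcome improbable unless very few particles found the infection, which is the logic behind the 0.05 threshold shown as the dashed line.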

Source: PubMed
