Deciphering human immunodeficiency virus type 1 transmission and early envelope diversification by single-genome amplification and sequencing

Jesus F Salazar-Gonzalez, Elizabeth Bailes, Kimmy T Pham, Maria G Salazar, M Brad Guffey, Brandon F Keele, Cynthia A Derdeyn, Paul Farmer, Eric Hunter, Susan Allen, Olivier Manigart, Joseph Mulenga, Jeffrey A Anderson, Ronald Swanstrom, Barton F Haynes, Gayathri S Athreya, Bette T M Korber, Paul M Sharp, George M Shaw, Beatrice H Hahn

Abstract

Accurate identification of the transmitted virus and sequences evolving from it could be instrumental in elucidating the transmission of human immunodeficiency virus type 1 (HIV-1) and in developing vaccines, drugs, or microbicides to prevent infection. Here we describe an experimental approach to analyze HIV-1 env genes as intact genetic units amplified from plasma virion RNA by single-genome amplification (SGA), followed by direct sequencing of uncloned DNA amplicons. We show that this strategy precludes in vitro artifacts caused by Taq-induced nucleotide substitutions and template switching, provides an accurate representation of the env quasispecies in vivo, and has an overall error rate (including nucleotide misincorporation, insertion, and deletion) of less than 8 × 10^-5. Applying this method to the analysis of virus in plasma from 12 Zambian subjects from whom samples were obtained within 3 months of seroconversion, we show that transmitted or early founder viruses can be identified and that molecular pathways and rates of early env diversification can be defined. Specifically, we show that 8 of the 12 subjects were each infected by a single virus, while 4 others acquired more than one virus; that the rate of virus evolution in one subject during an 80-day period spanning seroconversion was 1.7 × 10^-5 substitutions per site per day; and that evidence of strong immunologic selection can be seen in Env and overlapping Rev sequences based on nonrandom accumulation of nonsynonymous mutations. We also compared the results of the SGA approach with those of more-conventional bulk PCR amplification methods performed on the same patient samples and found that the latter is associated with excessive rates of Taq-induced recombination, nucleotide misincorporation, template resampling, and cloning bias.
These findings indicate that HIV-1 env genes, other viral genes, and even full-length viral genomes responsible for productive clinical infection can be identified by SGA analysis of plasma virus sampled at intervals typical in large-scale vaccine trials and that pathways of viral diversification and immune escape can be determined accurately.
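
To put the reported evolution rate in perspective, the short calculation below converts it into expected divergence over the 80-day interval. It is only a back-of-the-envelope sketch: the rate and the interval come from the abstract, while the env length of 2,600 nt and the reading of the SGA error rate as a per-nucleotide figure are assumptions made here for illustration.

```python
# Values taken from the abstract.
RATE_PER_SITE_PER_DAY = 1.7e-5  # substitutions per site per day
INTERVAL_DAYS = 80              # sampling interval spanning seroconversion
SGA_ERROR_BOUND = 8e-5          # upper bound on the overall SGA error rate

# ASSUMPTION: approximate env length in nucleotides (not given in the abstract).
ENV_LENGTH_NT = 2600


def expected_divergence(rate_per_site_per_day, days):
    """Expected substitutions per site accumulated over the interval."""
    return rate_per_site_per_day * days


def expected_substitutions(rate_per_site_per_day, days, length_nt):
    """Expected substitutions per sequence of the given length."""
    return expected_divergence(rate_per_site_per_day, days) * length_nt


divergence = expected_divergence(RATE_PER_SITE_PER_DAY, INTERVAL_DAYS)
per_env = expected_substitutions(RATE_PER_SITE_PER_DAY, INTERVAL_DAYS, ENV_LENGTH_NT)

print(f"divergence over {INTERVAL_DAYS} days: {divergence:.2e} substitutions per site")
print(f"expected substitutions per env (assumed {ENV_LENGTH_NT} nt): {per_env:.1f}")
# ASSUMPTION: the error bound is expressed per nucleotide sequenced.
print(f"divergence relative to SGA error bound: {divergence / SGA_ERROR_BOUND:.0f}x")
```

Under these assumptions, the divergence accumulated over 80 days is about 1.4 × 10^-3 substitutions per site, which corresponds to a few substitutions per env gene and sits well above the stated error bound of the method.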

Figures

FIG. 1.
Laboratory staging of acute and early HIV-1 infections. (A) Temporal appearance of HIV-1-specific laboratory markers following HIV-1 infection according to the classification system of Fiebig et al. (8). The eclipse phase is defined by the interval between transmission and first detection of vRNA in the plasma and generally lasts about 10 days, with a range of approximately 7 to 21 days (3, 10, 17-19, 40). The mean durations of Fiebig stages I (7 days), II (5 days), III (3 days), IV (6 days), and V/VI (70+ days) are indicated. (B) Time points (x axis) at which plasma samples were obtained for each of the 12 study subjects (y axis). Because subjects were studied at intervals of ∼3 months, the symbols are positioned to represent the maximum possible number of days from transmission.
FIG. 2.
Env quasispecies complexity in 12 primary infection subjects from Zambia. A neighbor-joining tree of SGA-derived full-length env sequences is shown. Brackets encompass sequences from each study subject, as indicated. Asterisks at nodes indicate 90% or higher bootstrap values (shown only on branches whose length exceeds 0.003 substitutions per site). The scale bar represents 0.01 nucleotide substitutions per site. 98ZA502, DU151, DU422, TV012, TV002, SK144B1, and BWMC168 represent subtype C reference sequences. ZM231F falls outside all known group M subtypes and thus remains unclassified.
FIG. 3.
Identification of transmitted or early founder env genes in subjects with acute HIV-1 infection. SGA-derived env sequences from pre- and postseroconversion plasma samples from subjects ZM249M (A), ZM247F (B), and ZM246F (C) were examined by phylogenetic tree construction (left panels) and Highlighter analysis (right panels). Sequences from the earlier time points are in bold face. Trees are midpoint rooted. The scale bars represent 1 (A and C) or 10 (B) nt substitutions per site. The corresponding Highlighter diagrams denote the locations of nucleotide sequence substitutions in each env sequence in comparison to a reference sequence listed at the top; the positions of these substitutions within the env sequence are indicated at the bottom. Nucleotide substitutions and gaps are color coded. For each subject, sets of identical and nearly identical env sequences form consensus sequences that coalesce to transmitted or early founder viral env sequences. Subjects ZM249M and ZM246F were infected by a single virus and ZM247F by two viruses, identified as variant 1 and variant 2 in the tree (panel B). The tick in parentheses in sequence 080503_A5 (panel A) indicates a single-nucleotide insertion. The boxed region in panel C indicates clustered mutations in V3. Interestingly, these changes were limited to position 8 (Ile to Thr, Arg, or Lys) and position 25 (Asp to Asn, Glu, or Gly).
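
The idea behind a Highlighter-style comparison, marking where each aligned sequence differs from a reference, can be sketched in a few lines. This is a minimal illustration and not the Highlighter tool itself; the function name `highlight_differences` and the toy sequences are hypothetical, and the input is assumed to be already aligned.

```python
def highlight_differences(reference, sequence):
    """Return (1-based position, reference base, sequence base) for every
    position where an aligned sequence differs from the reference.
    Gap characters ('-') are reported like any other difference."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, seq_base)
        for pos, (ref_base, seq_base) in enumerate(zip(reference, sequence), start=1)
        if ref_base != seq_base
    ]


# Hypothetical toy alignment; real env alignments are thousands of nt long.
reference = "ATGAGAGTGA"
aligned = {
    "seq_1": "ATGAGAGTGA",  # identical to the reference
    "seq_2": "ATGAGGGTGA",  # one substitution
    "seq_3": "ATG-GAGTGA",  # one gap
}

for name, seq in aligned.items():
    print(name, highlight_differences(reference, seq))
```

Sequences that return an empty list are identical to the reference; in a low-diversity sample, many such sequences with only scattered differences are what one would expect from a single founder lineage.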
FIG. 4.
Evidence of immune selection in two subjects sampled at later Fiebig stages. (A) Highlighter analysis of SGA-derived env sequences (left panel) from subject ZM180M (Fiebig stage VI), and corresponding nucleotide (middle panel) and amino acid (right panel) sequence alignments from the overlapping rev gene. Boxes indicate mutations that cluster within a 27-nt region that corresponds to a 9-mer in the Rev protein sequence. (B) Highlighter analysis of SGA-derived env sequences (left panel) from subject ZM206F (Fiebig stage VI), and corresponding nucleotide (middle panel) and amino acid (right panel) sequence alignments of the V1 region. Boxes indicate mutations that cluster within a 27-nt region that corresponds to a 9-mer of the V1 loop. In Highlighter analyses, nucleotide substitutions and gaps are color coded. The consensus sequence is at the top of each alignment; lowercase letters show residues where mutations occurred. Dashes in the alignments indicate sequence identity to the consensus; dots indicate deletions.
FIG. 5.
In vivo recombination in multiply-infected subjects. (A) Neighbor-joining tree of SGA-derived env sequences from subject ZM229M depicting two major transmitted variants (orange and purple), as well as five in vivo-generated recombinants (black). Asterisks at nodes indicate 90% or higher bootstrap values. The scale bars represent 0.005 nucleotide substitutions per site. (B-C) Diversity plots of two representative recombinants identified in panel A. The sequence distances of ZM229M_C15 (panel B) and ZM229M_D1 (panel C) are compared to those of representatives of the two parental lineages (orange and purple, respectively). The two recombinants contain between one and seven crossovers; schematic representations of their putative mosaic structures are shown below. (D) Neighbor-joining tree of SGA-derived env sequences from subject ZM215F depicting two major transmitted variants (orange and purple), as well as 19 different recombinants (black). Asterisks and scale bar are as described for panel A. (E-F) Diversity plots of two ZM215F recombinants containing “extraneous” sequences. The sequence distances of ZM215F_D4 (panel E) and ZM215F_F17 (panel F) are compared to those of representatives of the two parental lineages (orange and purple, respectively). Shaded areas indicate regions where ZM215F_D4 and ZM215F_F17 are equidistant from the two parental lineages, suggesting recombination with additional variants. Schematic representations of their putative mosaic structures are shown below.
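
A diversity plot of this kind compares a candidate recombinant to each parental lineage in a window that slides along the alignment; a crossover shows up where the closer parent switches. The sketch below illustrates that general idea with a simple proportion-of-differences distance. It is not necessarily the procedure used for the figure; the function names, window size, step, and toy sequences are all hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def diversity_profile(query, parent_a, parent_b, window=4, step=2):
    """Slide a window along the alignment and return, for each window,
    (1-based start, distance to parent A, distance to parent B)."""
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        profile.append((
            start + 1,
            p_distance(query[start:end], parent_a[start:end]),
            p_distance(query[start:end], parent_b[start:end]),
        ))
    return profile


# Hypothetical toy data: the query matches parent A in its first half
# and parent B in its second half, i.e. a single crossover.
parent_a = "AAAAAAAAAAAA"
parent_b = "CCCCCCCCCCCC"
query = "AAAAAACCCCCC"

for start, dist_a, dist_b in diversity_profile(query, parent_a, parent_b):
    closer = "A" if dist_a < dist_b else "B" if dist_b < dist_a else "tie"
    print(f"window at {start}: dA={dist_a:.2f} dB={dist_b:.2f} closer={closer}")
```

In this toy case the closer parent changes from A to B around the middle of the sequence, which is the signature of a single crossover. Windows in which the query is equally distant from both parents are the analogue of the shaded regions described in the caption.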
FIG. 6.
Bulk PCR-induced in vitro recombination in a multiply-infected individual. (A) Neighbor-joining tree of SGA-derived env sequences derived from subject ZM247F (sample obtained 1 November 2003) depicting two major transmitted variants (green and blue; sequences of the more-abundant variant are in green) and no recombinants. (B) Neighbor-joining tree of bulk PCR-derived env sequences amplified from the same specimen depicting two transmitted variants (green and blue; sequences of the more-abundant variant are in green) and eight recombinants (red). Asterisks at nodes indicate 90% or higher bootstrap values. The scale bars show 0.002 substitutions per site. (C and D) Diversity plots of two representative recombinants in panel B. The two recombinants contain one or more crossovers, the approximate positions of which are indicated by nucleotide position (x axis) and shown schematically below the panels.
FIG. 7.
Bulk PCR-induced in vitro recombination in a mixture of two plasma samples. (A) Neighbor-joining tree of SGA-derived env sequences derived from a mixture of plasma from two infected subjects, ZM246F (green) and ZM246M (blue). Subject ZM246M was chronically infected with at least two major viral lineages (dark and light blue) that differed in their env sequences by approximately 6%. ZM246F was acutely infected with a virus from an unrelated individual which differed from the ZM246M env sequences by approximately 10%. (B) Neighbor-joining tree of bulk PCR-amplified env sequences from the same mixed-plasma specimen. In addition to viral lineages representing ZM246F (green) and ZM246M (blue), two additional recombinants (ZM246F/M_BULK_41 and ZM246F/M_BULK_60) are apparent (red). Asterisks at nodes indicate 90% or higher bootstrap values. The scale bars show 0.01 nucleotide substitutions per site. (C and D) Diversity plots of the two recombinants in panel B. The approximate position of recombination crossovers is indicated by nucleotide position (x axis) and schematically shown below the panels.

Source: PubMed
