Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing

Peter J Campbell, Philip J Stephens, Erin D Pleasance, Sarah O'Meara, Heng Li, Thomas Santarius, Lucy A Stebbings, Catherine Leroy, Sarah Edkins, Claire Hardy, Jon W Teague, Andrew Menzies, Ian Goodhead, Daniel J Turner, Christopher M Clee, Michael A Quail, Antony Cox, Clive Brown, Richard Durbin, Matthew E Hurles, Paul A W Edwards, Graham R Bignell, Michael R Stratton, P Andrew Futreal

Abstract

Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.
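The core idea in the abstract, flagging read pairs whose two ends do not align as expected relative to each other, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `ReadPair` type, the `classify_pair` function, the category labels, and the `min_span`/`max_span` thresholds are all hypothetical, and the sketch assumes a library in which correctly mapped pairs face inward (left end on `+`, right end on `-`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Mapped positions of the two ends of one sequenced fragment (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_span: int = 100, max_span: int = 600) -> str:
    """Label a read pair by how its ends sit relative to each other.

    min_span and max_span are illustrative bounds on the expected fragment
    size, not values taken from the paper.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the two ends by genomic coordinate so orientation checks are symmetric.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    span = right_pos - left_pos

    if left_strand == right_strand:
        return "inversion_like"   # both ends on the same strand
    if left_strand == "-" and right_strand == "+":
        return "everted"          # ends face outward, as expected across a tandem duplication
    if span > max_span:
        return "deletion_like"    # ends too far apart on the reference
    if span < min_span:
        return "insertion_like"   # ends too close together on the reference
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr4", 5_000, "-", "chr4", 705_000, "+")
    print(classify_pair(example))
```

Only pairs labeled something other than `concordant` would be carried forward as candidate rearrangements; in practice each candidate still needs confirmation, for example by PCR and sequencing across the breakpoint as described in the figure legends.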

Figures

Figure 1
Experimental protocol and outcome of sequencing. (a) Genomic DNA was sheared by nebulization and primers were ligated to either end of the DNA fragments (i). These were size selected on an agarose gel (ii) and sequenced from either end on a Genome Analyzer instrument (iii). We mapped the short sequence reads to the genome and determined their spacing and whether or not the two ends aligned in the correct orientation (iv). (b) The final disposition of reads for NCI-H2171. (c) Distribution of fragment size for correctly mapping reads from NCI-H2171.
Figure 2
Genome-wide acquired rearrangements. (a) NCI-H2171. (b) NCI-H1770. The outer ring shows a representation of the normal karyotype (red indicates centromeres). The blue line in the middle ring indicates copy number as determined by short read data. The inner circle shows the two endpoints of each somatic rearrangement identified, joined by green lines. Very small rearrangements appear as single lines.
Figure 3
Rearrangements in NCI-H2171. (a) Mapping to the base-pair level of an acquired insertion of sequence from a homogeneously staining region on chromosome 12p13 into chromosome 2q22. The insertion was evident on spectral karyotype, and a paired-end read spanned the breakpoint. Confirmatory PCR and sequencing showed a small 127-bp fragment (shard) from chromosome 2 inverted at the breakpoint. FISH using a probe (RP11-444J21) from the chromosome 12p13 amplicon adjacent to the breakpoint (green) and a probe (RP11-58C7) from the chromosome 2q22 region (red) generated a fusion signal (yellow), confirming that the breakpoint identified corresponded to that seen on the spectral karyotype. (b) A CACNA2D4-WDR43 fusion gene. The 5′ portion of the CACNA2D4 gene is amplified, and the paired-end reads showed a rearrangement that breaks the gene in exon 36, fusing it into intron 3 of WDR43. The sequence at the breakpoint creates an almost perfect splice donor site, resulting in the production of a fusion transcript with a shortened exon 36 from CACNA2D4. (c) An acquired tandem duplication resulting in an aberrant mRNA transcript. A 700-kb region of increased copy number on chromosome 4 was identified by copy number analysis. A single paired-end read mapped with one end at the 5′ border of the amplification and the other end at the 3′ border. The breakpoints fell within introns 2 and 10 of GRID2 and would be expected to give rise to a partial tandem duplication of exons 3–10 of the gene, which was confirmed by RT-PCR. (d) An inverted duplication of a gene-rich region of chromosome 17 was identified by a localized increase in copy number together with two paired-end reads spanning both inverted breakpoints.
Figure 4
Copy number. Comparison of copy number plots for chromosome 11 of NCI-H2171 between massively parallel paired-end sequencing (upper panel) and Affymetrix SNP6 genomic array data (lower panel).
Figure 5
Amplicons in NCI-H2171 and NCI-H1770. (a) The MYC amplicon of NCI-H2171 involves two regions of chromosome 8q, both of which show extensive variation in copy number. Breakpoints between and within the two regions are shown as blue lines, with the thickness of each line proportional to the number of paired-end reads spanning the same breakpoint. The locations of the breakpoints frequently correspond to changes in copy number. (b) A PVT1-CHD7 fusion gene is created by the most commonly seen breakpoint in the MYC amplicon. Extensive splicing of the PVT1 moiety was seen on RT-PCR, but the three most common transcripts accounted for 23 of the 30 colonies sequenced. (c) The NMYC amplicon of NCI-H1770 showed up to 85-fold amplification of a 2-Mb region. The most common breakpoint observed demarcated the 5′ and 3′ borders of the amplicon and suggested tandem insertion of the 2-Mb region, but several other rearrangements were seen within the amplicon, both inverted (arcs above the line) and noninverted (arcs below the line).

Source: PubMed
