Genomic landscape of non-small cell lung cancer in smokers and never-smokers

Ramaswamy Govindan, Li Ding, Malachi Griffith, Janakiraman Subramanian, Nathan D Dees, Krishna L Kanchi, Christopher A Maher, Robert Fulton, Lucinda Fulton, John Wallis, Ken Chen, Jason Walker, Sandra McDonald, Ron Bose, David Ornitz, Donghai Xiong, Ming You, David J Dooling, Mark Watson, Elaine R Mardis, Richard K Wilson

Abstract

We report the results of whole-genome and transcriptome sequencing of tumor and adjacent normal tissue samples from 17 patients with non-small cell lung carcinoma (NSCLC). We identified 3,726 point mutations and more than 90 indels in the coding sequence, with an average mutation frequency more than 10-fold higher in smokers than in never-smokers. Novel alterations in genes involved in chromatin modification and DNA repair pathways were identified, along with DACH1, CFTR, RELN, ABCB5, and HGF. Deep digital sequencing revealed diverse clonality patterns in both never-smokers and smokers. All validated EGFR and KRAS mutations were present in the founder clones, suggesting possible roles in cancer initiation. Analysis revealed 14 fusions, including ROS1 and ALK, as well as novel metabolic enzymes. Cell-cycle and JAK-STAT pathways are significantly altered in lung cancer, along with perturbations in 54 genes that are potentially targetable with currently available drugs.

Copyright © 2012 Elsevier Inc. All rights reserved.

Figures

Figure 1
Mutation landscape in lung cancer. A heatmap of significant genetic events in 17 NSCLC samples is provided for both (A) genes previously implicated in lung cancer and (B) novel genes found to be recurrently altered in the present study. Events, including point mutations, truncation mutations, copy number gains and losses, and larger structural variations, are color coded according to the legend provided. (C) Clinical characteristics of the 17 NSCLC patients. (D) A stacked bar graph representing the total number of tier 1 mutations in each patient, color-proportioned by the number of synonymous versus non-synonymous mutations. (E) A stacked bar graph representing the frequency of each type of base substitution for all tier 1 point mutations in 17 NSCLC genomes. See also Supplemental Figure S1, Supplemental Data S2 & S3 and Supplemental Tables S1, S2, S3, S4, S5, S6, S8, S9, S10, S12, S13, S16 & S17.
Figure 2
Tumor clonality analysis in lung cancer. (A) Schematic depiction of a monoclonal tumor sample with high tumor purity (i.e. few normal cells). (B) Schematic depiction of a biclonal tumor sample consisting of a small number of contaminating normal cells, a primary or ‘founder’ clone (pink tumor cells) and a secondary clone (purple tumor cells). The cells of the secondary clone contain the majority of mutations present in the founder clone but have acquired a distinct set of new mutations not shared with the founder. (C) Tumor clonality plot of a monoclonal tumor from a never-smoker (LUC11). (D) Tumor clonality plot of a biclonal tumor from a never-smoker (LUC15) with an EGFR mutation in the founder clone. (E) Tumor clonality plot from a tobacco smoker (LUC9) with two distinct clones. The founder clone has a mean tumor variant allele frequency of 41.1% and the subclone has a mean tumor variant allele frequency of 20.4%. (F) Tumor clonality plot of a biclonal tumor from a tobacco smoker (LUC10) with a KRAS mutation in the founder clone. See also Supplemental Figure S1, Supplemental Data S3 and Supplemental Tables S11, S14 & S15.
Figure 3
Mutant-biased expression of KRAS and TP53 somatic variants. (A) A line diagram depicting variant expression categories for heterozygous mutations from a diploid genome. A maternal (a) and paternal (b) allele of chromosome 3 is depicted with four example genes enlarged. Each gene contains a heterozygous mutation on the b allele depicted as a red line. Each gene example illustrates a distinct variant expression pattern by displaying differing numbers of transcripts being generated from each allele at each locus. The ‘FPKM’ is represented as differing numbers of transcripts generated from each locus and the variant allele frequency (VAF) is calculated as the proportion of these transcripts deriving from the mutant allele and containing the variant base. (B) The proportion of variants corresponding to each of the four variant expression categories is summarized for all 17 lung cancers (‘other’ refers to cases where the classification was ambiguous due to marginal sequencing coverage). The total number of variants (‘n’) is provided for each patient and the cases are grouped by smoker status. (C) Box plots display the distribution of FPKM expression values for all detected genes in all 17 cases. The expression levels of KRAS and TP53 are displayed by colored triangles and circles respectively, and patients with a mutation in these genes are indicated in red. (D) The correlation between VAF calculated from WGS and RNA-seq read counts is depicted as a scatter plot for a single patient. The FPKM expression level of the gene harboring each variant is indicated by a yellow-to-red color scale where yellow indicates low gene expression and red indicates high gene expression. KRAS is highlighted as an example of a variant with a VAF that is higher in the RNA than in the WGS data for this patient. (E) The amino acid position of each KRAS and TP53 mutation is depicted relative to the open reading frame of the gene along with the position of known protein domains. See also Supplemental Tables S7, S17, S18 & S19.
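The VAF definition in the Figure 3 caption can be illustrated with a minimal sketch. The function name and the read counts below are hypothetical and are not taken from the study; the sketch only assumes that VAF is the fraction of reads at a site that carry the variant base, computed separately for WGS and RNA-seq reads.

```python
def variant_allele_frequency(variant_reads: int, reference_reads: int) -> float:
    """Fraction of reads covering a site that carry the variant base."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no read coverage at this site")
    return variant_reads / total


# Hypothetical read counts for one heterozygous variant (illustrative only).
wgs_vaf = variant_allele_frequency(variant_reads=20, reference_reads=30)
rna_vaf = variant_allele_frequency(variant_reads=45, reference_reads=15)

# A variant whose RNA VAF exceeds its WGS VAF is a candidate for
# mutant-biased expression, as panel (D) highlights for KRAS.
mutant_biased = rna_vaf > wgs_vaf
```

In practice a comparison like this would also need a coverage threshold and a statistical test, which the caption does not specify and which are therefore left out here.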
Figure 4
Analysis of transcription-coupled repair across the genome. (A) The mutation rate is assessed independently in each of tiers 1–3 for never-smokers and (B) smokers. Smokers LUC1 and LUC9 are omitted from (B) due to their extremely low and extremely high mutation rates, respectively. Both (A) and (B) clearly show that the coding space in these tumor genomes incurs fewer mutations than other regions in the genomes. (C) Genes were binned based on the FPKM values derived from RNA expression analysis of the tumor samples, and then the mutation rate (validated somatic mutations per adequately covered Mbp) was calculated for each expression level bin. The graph shows that the lowest mutation rates occur in the most highly expressed genes. See also Supplemental Data S2 and Supplemental Tables S7, S8 & S9.
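The binning described for panel (C) might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the bin edges, and the per-gene numbers are all hypothetical, and the only rule taken from the caption is that the rate is mutations per adequately covered Mbp within each expression bin.

```python
from collections import defaultdict


def mutation_rate_by_expression_bin(genes, bin_edges):
    """Return {bin_index: mutations per covered Mbp}.

    genes: iterable of (fpkm, n_mutations, covered_bp) tuples.
    bin_edges: ascending FPKM thresholds; bin i holds genes whose FPKM
    is >= exactly i of the thresholds (bin 0 is the lowest expression).
    """
    mutations = defaultdict(int)
    covered = defaultdict(int)
    for fpkm, n_mutations, covered_bp in genes:
        bin_index = sum(fpkm >= edge for edge in bin_edges)
        mutations[bin_index] += n_mutations
        covered[bin_index] += covered_bp
    return {
        i: mutations[i] / (covered[i] / 1e6)
        for i in sorted(covered)
        if covered[i] > 0
    }


# Hypothetical genes: (FPKM, validated mutations, adequately covered bp).
example_genes = [
    (0.5, 3, 1_000_000),
    (0.8, 1, 1_000_000),
    (5.0, 2, 2_000_000),
    (50.0, 1, 4_000_000),
]
rates = mutation_rate_by_expression_bin(example_genes, bin_edges=[1.0, 10.0])
```

Pooling mutations and covered bases per bin before dividing, rather than averaging per-gene rates, keeps short genes from dominating the estimate; whether the study pooled this way is not stated in the caption.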
Figure 5
Alterations in JAK/STAT pathway and integration of somatic alterations and high RNA expression in significant KEGG pathways. (A) Heat map of significantly overrepresented gene pathways in lung cancer (p < 0.05). The number of gene members of each KEGG cancer pathway (“KEGG pathways in cancer”, or “hsa05200”) altered by one of four alteration types in at least one patient is summarized as a heat map. The KEGG pathway name is listed on the y-axis at the left and the total number of genes comprising that pathway is provided (labeled as n). The number within each box represents the number of genes altered in at least one patient for each alteration type. The percentage of all gene members of the KEGG pathway altered in at least one patient by at least one alteration type is provided on the right side. The heat map is sorted by this percentage. (B) Molecular alterations in the JAK-STAT pathway in patients with non-small cell lung cancer. Genes that were found to be altered in the 17 lung cancer samples are labeled with the type of molecular change (E – over-expression, C – copy number alteration, M – mutation and S – structural variation) and the frequency. See also Supplemental Data S3 and Supplemental Tables S3, S9, S11, S18 & S20.
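The per-pathway summary in panel (A) amounts to counting pathway members per alteration type and then taking the union across types. A minimal sketch follows; the function name, the four-gene "pathway", and the altered gene sets are hypothetical placeholders and do not reproduce any result from the study.

```python
def summarize_pathway(pathway_genes, altered_by_type):
    """Return (per-type counts, fraction of pathway genes altered by any type).

    pathway_genes: set of gene symbols belonging to the pathway.
    altered_by_type: {alteration_type: set of genes altered in >= 1 patient}.
    """
    if not pathway_genes:
        raise ValueError("pathway has no genes")
    counts = {
        kind: len(genes & pathway_genes)
        for kind, genes in altered_by_type.items()
    }
    altered_any = set().union(*altered_by_type.values()) & pathway_genes
    return counts, len(altered_any) / len(pathway_genes)


# Illustrative inputs only; membership and alterations are made up.
toy_pathway = {"JAK1", "JAK2", "STAT3", "SOCS1"}
toy_alterations = {
    "mutation": {"JAK2"},
    "copy_number": {"JAK2", "STAT3"},
    "expression": {"MYC"},
    "structural": set(),
}
counts, fraction_altered = summarize_pathway(toy_pathway, toy_alterations)
```

Because a gene altered by two types is counted once in the union, the fraction can be smaller than the sum of the per-type counts divided by n.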
Figure 6
Potential therapeutic targets in non-small cell lung cancer. (A) Graphical representation of the various therapeutic targets in each patient sample. Patients are listed on the x axis. Target genes identified as altered in one or more patients and the drugs that target these genes are listed on the y axis (gene symbols on the left side and corresponding drug names on the right side). Where display of all drug names was not practical, the list was abbreviated. The numbers in parentheses indicate the total number of drugs currently available for each gene target. A box representing each gene-drug combination for each patient is colored according to the class of gene alteration: red for SNVs, orange for indels, purple for CNV amplifications and green for RNA over-expression (Supplementary Methods). Gene targets are grouped and labeled on the left side of the plot according to the therapeutic class of their targeted agents. See also Supplemental Tables S3, S4, S9, S11, S15, S16 & S17.

Source: PubMed
