Association analysis identifies 65 new breast cancer risk loci

Abstract

Breast cancer risk is influenced by rare coding variants in susceptibility genes, such as BRCA1, and many common, mostly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. Here we report the results of a genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry. We identified 65 new loci that are associated with overall breast cancer risk at P < 5 × 10⁻⁸. The majority of credible risk single-nucleotide polymorphisms in these loci fall in distal regulatory elements, and by integrating in silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2–5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the use of genetic risk scores for individualized screening and prevention.

Conflict of interest statement

The authors confirm that they have no competing financial interests.

Figures

Extended data Figure 1. Global mapping of biofeatures across novel loci associated with overall breast cancer risk.
The overlaps between potential genomic predictors in relevant breast cell lines and candidate causal risk variants (CRVs) within each locus. On the x-axis, each column represents a CRV (see Online Methods). The most significant SNPs are identified in each region. On the y-axis, biofeatures are grouped into five functional categories: genomic structure (red), enhancer marks (dark green), histone marks (blue), open chromatin marks (dark blue) and transcription factor binding sites (dark violet). Colored elements indicate SNPs for which the feature is present. For data sources, see Online Methods (“In-Silico Analysis of CRVs”).
Extended data Figure 2. Pathway enrichment map for susceptibility loci based on summary association statistics.
Each circle (node) represents a pathway (gene set), coloured by enrichment score (ES) where redder nodes indicate lower FDRs. Larger nodes indicate pathways with more genes. Green lines connect pathways with overlapping genes (minimum overlap 0.55). Pathways are grouped by similarity and organized into major themes (large labelled circles).
Extended data Figure 3. Heatmap showing patterns of cell type-specific enrichments for breast tissue across three histone marks (H3K4me1, H3K4me3 and H3K9ac) for breast cancer overall, ER-positive breast cancer and ER-negative breast cancer as well as 16 other traits.
Extended data Figure 4. Heatmap showing patterns of cell type-specific enrichments for histone mark H3K27ac in breast cancer overall, ER+ and ER- breast cancer as well as 16 other traits.
Extended data Figure 5. Heatmap showing patterns of cell type-specific enrichments for histone mark H3K4me1 in breast cancer overall, ER+ and ER- breast cancer as well as 16 other traits.
Extended data Figure 6. Heatmap showing patterns of cell type-specific enrichments for histone mark H3K4me3 in breast cancer overall, ER+ and ER- breast cancer as well as 16 other traits.
Extended data Figure 7. Heatmap showing patterns of cell type-specific enrichments for histone mark H3K9ac in breast cancer overall, ER-positive and ER-negative breast cancer as well as 16 other traits.
Extended data Figure 8. Functional assessment of regulatory variants at 1p36, 11p15 and 1p34 risk loci.
a, The KLHDC7A or b, PIDD1 promoter regions containing the reference (prom-Ref) or risk alleles (prom-Hap) were cloned upstream of the pGL3 luciferase reporter gene. MCF7 or Bre-80 cells were transfected with constructs and assayed for luciferase activity after 24 h. Error bars denote 95% CI (n=3). P-values were determined by two-way ANOVA followed by Dunnett’s multiple comparisons test (*P<0.05, **P<0.01, ***P<0.001). c, 3C assays. A physical map of the region interrogated by 3C is shown first. Grey boxes depict the putative regulatory elements (PREs), blue vertical lines indicate the risk-associated SNPs and the black dotted line represents chromatin looping. The graphs represent three independent 3C interaction profiles. 3C libraries were generated with EcoRI; grey vertical boxes indicate the interacting restriction fragment (containing PRE1 and PRE2). Error bars denote SD. d, PRE1 or PRE2 containing the reference (PRE-ref) or risk (PRE-Hap) haplotypes were cloned downstream of a CITED4 promoter-driven luciferase construct (CITED4 prom). MCF7 or Bre-80 cells were transfected with constructs and assayed for luciferase activity after 24 h. Error bars denote 95% CI (n=3). P-values were determined by two-way ANOVA followed by Dunnett’s multiple comparisons test (**P<0.01, ***P<0.001).
Extended data Figure 9. Functional assessment of regulatory variants at the 7q22 risk locus.
a–e, 3C assays. A physical map of the region interrogated by 3C is shown first. Grey horizontal boxes depict the putative regulatory elements (PREs), blue vertical lines indicate the risk-associated SNPs and the black dotted line represents chromatin looping. The graphs represent three independent 3C interaction profiles between the a, CUX1, b, d, PRKRIP1 or c, e, RASA4 promoter regions and PREs. 3C libraries were generated with EcoRI; grey vertical boxes indicate the interacting restriction fragment (containing PRE1 and/or PRE2). Error bars denote SD. f, g, Allele-specific 3C. 3C followed by Sanger sequencing for the f, PRKRIP1-PRE2 or g, RASA4-PRE1 or -PRE2 in heterozygous MDA-MB-231 breast cancer cells.
Figure 1
(a) Manhattan plot showing −log10 P-values for SNP associations with overall breast cancer. (b) Manhattan plot after excluding previously identified associated regions. The red line denotes “genome-wide” significance (P < 5 × 10⁻⁸); the blue line denotes P < 10⁻⁵.
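As a rough illustration of how the two reference lines in Figure 1 relate to the plotted values, the sketch below converts P-values to the −log10 scale of a Manhattan plot and labels them against the two thresholds given in the caption. The function names, the label "suggestive" for the blue line, and the example P-values are assumptions made for illustration; they are not taken from the study.

```python
import math

# Thresholds stated in the Figure 1 caption.
GENOME_WIDE = 5e-8   # red line
SUGGESTIVE = 1e-5    # blue line (the label "suggestive" is an assumption)


def neg_log10(p):
    """Return -log10(p), the y-axis value used in a Manhattan plot."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p)


def classify(p):
    """Label a P-value relative to the two reference lines."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "below threshold"


# Hypothetical P-values, for illustration only (not results from the study).
examples = {"snp_a": 3e-12, "snp_b": 4e-7, "snp_c": 0.02}
for name, p in examples.items():
    print(name, round(neg_log10(p), 2), classify(p))
```

On this scale the red line sits at about 7.3 and the blue line at 5, so a smaller P-value appears higher on the plot.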

Source: PubMed
