Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations

Juncheng Dai, Jun Lv, Meng Zhu, Yuzhuo Wang, Na Qin, Hongxia Ma, Yong-Qiao He, Ruoxin Zhang, Wen Tan, Jingyi Fan, Tianpei Wang, Hong Zheng, Qi Sun, Lijuan Wang, Mingtao Huang, Zijun Ge, Canqing Yu, Yu Guo, Tong-Min Wang, Jie Wang, Lin Xu, Weibing Wu, Liang Chen, Zheng Bian, Robin Walters, Iona Y Millwood, Xi-Zhao Li, Xin Wang, Rayjean J Hung, David C Christiani, Haiquan Chen, Mengyun Wang, Cheng Wang, Yue Jiang, Kexin Chen, Zhengming Chen, Guangfu Jin, Tangchun Wu, Dongxin Lin, Zhibin Hu, Christopher I Amos, Chen Wu, Qingyi Wei, Wei-Hua Jia, Liming Li, Hongbing Shen

Abstract

Background: Genetic variation has an important role in the development of non-small-cell lung cancer (NSCLC). However, genetic factors for lung cancer have not been fully identified, especially in Chinese populations, which limits the use of existing polygenic risk scores (PRS) to identify subpopulations at high risk of lung cancer for prevention. We therefore aimed to identify novel loci associated with NSCLC risk, and generate a PRS and evaluate its utility and effectiveness in the prediction of lung cancer risk in Chinese populations.

Methods: To systematically identify genetic variants for NSCLC risk, we newly genotyped 19 546 samples from Chinese NSCLC cases and controls from the Nanjing Medical University Global Screening Array Project and did a meta-analysis of genome-wide association studies (GWASs) of 27 120 individuals with NSCLC and 27 355 without NSCLC (13 327 cases and 13 328 controls of Chinese descent as well as 13 793 cases and 14 027 controls of European descent). We then built a PRS for Chinese populations from all single-nucleotide polymorphisms reported to be associated with lung cancer risk at the genome-wide significance level. We evaluated the utility and effectiveness of the generated PRS in predicting subpopulations at high risk of lung cancer in an independent prospective cohort of 95 408 individuals from the China Kadoorie Biobank (CKB) with more than 10 years' follow-up.
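
The abstract does not state how the PRS was weighted. As an illustration only, a common construction is a weighted sum of risk-allele counts, with each variant weighted by the natural log of its per-allele odds ratio. The sketch below assumes that construction; the function name, odds ratios, and dosages are hypothetical and are not taken from the study.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Return the weighted sum of risk-allele dosages.

    Assumption: each variant is weighted by log(odds ratio), a common
    convention that the abstract does not confirm for this study.
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("one dosage per variant is required")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical per-allele odds ratios for three variants (not study data).
example_odds_ratios = [1.10, 1.25, 1.05]
# Hypothetical risk-allele counts (0, 1, or 2) for one participant.
example_dosages = [2, 0, 1]

score = polygenic_risk_score(example_dosages, example_odds_ratios)
print(round(score, 4))
```

Under this convention a participant carrying no risk alleles scores 0, and each additional risk allele adds that variant's log odds ratio to the score.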

Findings: We identified 19 susceptibility loci significantly associated with NSCLC risk at p≤5·0 × 10⁻⁸, including six novel loci. When applied to the CKB cohort, the PRS of the risk loci successfully predicted incident lung cancer in a dose-response manner: participants at high genetic risk (top 10%) had a higher risk than those at low genetic risk (bottom 10%; adjusted hazard ratio 1·96, 95% CI 1·53-2·51; p for trend=2·02 × 10⁻⁹). Specifically, we observed consistently separated curves of lung cancer events in individuals at low, intermediate, and high genetic risk, and the PRS was an independent and effective risk stratification indicator beyond age and smoking pack-years.

Interpretation: We have shown for the first time that GWAS-derived PRS can be effectively used in discriminating subpopulations at high risk of lung cancer, who might benefit from a practically feasible PRS-based lung cancer screening programme for precision prevention in Chinese populations.

Funding: National Natural Science Foundation of China, the Priority Academic Program for the Development of Jiangsu Higher Education Institutions, National Key R&D Program of China, Science Foundation for Distinguished Young Scholars of Jiangsu, and China's Thousand Talents Program.

Conflict of interest statement

The authors declare no competing interests.

Copyright © 2019 Elsevier Ltd. All rights reserved.

Figures

Figure 1. Manhattan plot showing −log10(P) values for the trans-ancestral fixed-effects meta-analysis of lung cancer risk.
(A) Non-small cell lung cancer. (B) Lung adenocarcinoma. (C) Lung squamous cell carcinoma. Each locus is annotated by its cytogenetic band. Black: previously reported loci; red: novel risk loci. The red horizontal line denotes the genome-wide significance threshold (P = 5 × 10⁻⁸).
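
Figure 1 refers to a fixed-effects meta-analysis across ancestries. The source does not describe the estimator, so the following is only a minimal sketch of the standard inverse-variance form, assuming per-study log odds ratios and standard errors are available. The function name and input values are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance fixed-effects pooling of per-study effect sizes.

    Returns the pooled effect, its standard error, and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical log odds ratios and standard errors for one variant in
# two ancestry-specific GWASs (illustrative values, not study results).
pooled_beta, pooled_se, p_value = fixed_effects_meta([0.10, 0.10], [0.02, 0.02])
print(pooled_beta, pooled_se, p_value < 5e-8)
```

Pooling two equally precise studies leaves the effect estimate unchanged but shrinks its standard error by a factor of √2, which is how a variant can reach genome-wide significance only in the combined analysis.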
Figure 2. Forest plot of lead variants in lung cancer susceptibility loci.
Effect sizes and allele frequencies of the effect (risk-increasing) alleles in Chinese and European populations are shown for the lead variants identified for (A) non-small cell lung cancer, (B) lung adenocarcinoma, and (C) lung squamous cell carcinoma. Bold text denotes newly identified loci.
Figure 3. Association of PRS and standardized lung cancer event rates in the CKB prospective cohort.
(A) Participants in the CKB cohort were divided into ten equal groups according to PRS; the hazard ratio of each group relative to the lowest tenth is shown. (B) Participants were classified as high, intermediate, or low genetic risk according to the top 5%, 5%-95%, and bottom 5% of PRS; the standardized rates of lung cancer events in the three groups are shown. Hazard ratios and 95% confidence intervals, derived from a Cox regression model adjusted for age, sex, region, and smoking status, are provided in the legend. (C) Cumulative effects of genetic risk and smoking. The relative risks of participants at different genetic risk among light smokers and heavy smokers were calculated relative to never smokers in the CKB cohort. Hazard ratios and 95% confidence intervals, derived from a Cox regression model adjusted for age, sex, and region, are provided in the legend. (D) Standardized cumulative lung cancer events by degree of smoking in the three defined genetic risk groups. The I bars represent 95% confidence intervals.
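
The grouping in panel B (bottom 5%, 5%-95%, top 5% of PRS) can be sketched as a percentile cut. This is an illustration under stated assumptions, not the authors' procedure: the function name and example scores are hypothetical, and the percentile interpolation is whatever Python's `statistics.quantiles` applies by default.

```python
import statistics


def genetic_risk_groups(scores, low_pct=5, high_pct=95):
    """Label each score as low, intermediate, or high genetic risk.

    Cutoffs are the low_pct-th and high_pct-th percentiles of the scores,
    taken from the 99 cut points returned by statistics.quantiles.
    """
    cuts = statistics.quantiles(scores, n=100)
    low_cut, high_cut = cuts[low_pct - 1], cuts[high_pct - 1]
    labels = []
    for s in scores:
        if s < low_cut:
            labels.append("low")
        elif s > high_cut:
            labels.append("high")
        else:
            labels.append("intermediate")
    return labels


# Hypothetical PRS values for 100 participants (not study data).
example_scores = [float(i) for i in range(1, 101)]
example_labels = genetic_risk_groups(example_scores)
print({g: example_labels.count(g) for g in ("low", "intermediate", "high")})
```

Calling `genetic_risk_groups(scores, low_pct=10, high_pct=90)` would give the top 10% versus bottom 10% comparison described in the Findings.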

Source: PubMed
