Sensitive detection of stage I lung adenocarcinoma using plasma cell-free DNA breakpoint motif profiling

Wei Guo, Xin Chen, Rui Liu, Naixin Liang, Qianli Ma, Hua Bao, Xiuxiu Xu, Xue Wu, Shanshan Yang, Yang Shao, Fengwei Tan, Qi Xue, Shugeng Gao, Jie He

Abstract

Background: Early diagnosis improves survival for lung cancer patients, but most patients are diagnosed after metastasis. Although cell-free DNA (cfDNA) analysis holds promise, its sensitivity for detecting early-stage lung cancer is unsatisfactory. We leveraged cfDNA fragmentomics to develop a predictive model for invasive stage I lung adenocarcinoma (LUAD).

Methods: We included 292 stage I LUAD patients from three medical centers and 230 healthy controls, and profiled their plasma cfDNA samples by whole-genome sequencing (WGS). Multiple cfDNA fragmentomic motif features and machine learning models were compared in the training cohort to select the best model. Model performance was assessed in the internal and external validation cohorts and in an additional dataset.

Findings: A logistic regression model using the 6bp-breakpoint-motif feature was selected. It yielded 98·0% sensitivity and 94·7% specificity in the internal validation cohort [Area Under the Curve (AUC): 0·985], and 92·5% sensitivity and 90·0% specificity in the external validation cohort (AUC: 0·954). The model remained sensitive for early-stage tumors (100% sensitivity for minimally invasive adenocarcinoma, MIA) and for tumors <1 cm (92·9%–97·7% sensitivity). Predictive power remained high when sequencing depth was reduced to 0·5× (AUC: 0·977 and 0·931 for the internal and external cohorts, respectively).

Interpretation: Here we have established a cfDNA breakpoint motif-based model for detecting early-stage LUAD, including MIA and very small-size tumors, shedding light on early cancer diagnosis in clinical practice.

Funding: National Key R&D Program of China; National Natural Science Foundation of China; CAMS Initiative for Innovative Medicine; Special Research Fund for Central Universities, Peking Union Medical College; Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences; Beijing Hope Run Special Fund of Cancer Foundation of China.

Keywords: Cell-free DNA; Early detection; Fragmentomics; Lung cancer; Whole-genome sequencing.

Conflict of interest statement

Declaration of interests: XC, RL, HB, XX, XW, SY, and YS are employees of Nanjing Geneseeq Technology Inc., China. All other authors have declared no conflicts of interest.

Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.

Figures

Figure 1
Schematic illustration of the study. (a) Study design. A total of 522 participants (cancer 292, healthy 230) were included in this study. Whole-genome sequencing of plasma cfDNA was performed, and the cfDNA breakpoint motifs were profiled. 265 participants (cancer 150, healthy 115) were allocated to training for building the logistic regression-based machine learning model. 177 participants (cancer 102, healthy 75) were allocated to internal validation for confirming the model performance and determining the cutoff score. 80 participants (cancer 40, healthy 40) were allocated to external validation for evaluating model performance. (b) Schematic diagram of cancer probability score determination. cfDNA was extracted from the participant's plasma sample and subjected to whole-genome sequencing. The sequencing reads that mapped to a human reference genome were used to determine the 6-nucleotide sequence (i.e., a 6bp breakpoint motif) at each 5′ fragment end (Watson and Crick strands) of plasma cfDNA relative to the genome. The genome-wide breakpoint motif profile was then fed to the logistic regression algorithm to calculate the participant's cancer probability score.
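The motif profiling step described in panel (b) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: fragments are assumed to be given as 0-based, half-open (start, end) intervals on a reference string, and a 6bp breakpoint motif is assumed to consist of the 3 reference bases on each side of a 5′ breakpoint (the `flank` parameter). The caption does not specify either convention, and the function names `breakpoint_motifs` and `motif_profile` are hypothetical.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def breakpoint_motifs(reference, start, end, flank=3):
    """Return the 5'-end motifs of a fragment covering reference[start:end].

    Assumption (not stated in the source): a motif is the `flank` reference
    bases before plus the `flank` bases after the breakpoint.
    Ends too close to the reference boundary are skipped.
    """
    motifs = []
    # Watson-strand 5' end: breakpoint sits just before position `start`.
    if start - flank >= 0 and start + flank <= len(reference):
        motifs.append(reference[start - flank:start + flank])
    # Crick-strand 5' end: breakpoint sits just after position `end - 1`,
    # read in the reverse-complement orientation.
    if end - flank >= 0 and end + flank <= len(reference):
        motifs.append(reverse_complement(reference[end - flank:end + flank]))
    return motifs


def motif_profile(reference, fragments, flank=3):
    """Return the frequency of every possible motif across all fragment ends."""
    counts = Counter()
    for start, end in fragments:
        for motif in breakpoint_motifs(reference, start, end, flank):
            if set(motif) <= set("ACGT"):  # drop motifs containing N, etc.
                counts[motif] += 1
    total = sum(counts.values())
    all_motifs = ("".join(p) for p in product("ACGT", repeat=2 * flank))
    return {m: (counts[m] / total if total else 0.0) for m in all_motifs}
```

With `flank=3` the profile has 4^6 = 4,096 entries, one per possible 6-mer. A vector of this kind is what a classifier such as logistic regression would take as input.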
Figure 2
Identification and evaluation of the 6bp breakpoint motif logistic regression model for cancer prediction. (a) ROC curve evaluating the performance of predictive models built on different cfDNA motif features and machine learning algorithms in distinguishing early lung cancer from healthy subjects for the combined validation cohorts (DL: deep learning; XG: XGBoost; RL: logistic regression); (b) the sensitivities and specificities of different predictive models from (a) in the combined validation cohorts; (c) Heat map analyzing frequencies of the 65 breakpoint motifs in the training model with non-zero coefficients between healthy and cancer subjects in the validation cohorts. The data are row-normalized; (d) Box plot showing frequencies between healthy and cancer subjects in the validation cohorts for the four representative motifs contributing most significantly to the model (*: 0·01 < p < 0·05; **: 0·001 < p < 0·01; ***: p < 0·001, Wilcoxon rank-sum test).
Figure 3
Development and evaluation of the predictive model in internal and external validation cohorts. (a) ROC curves evaluating the overall performance of the predictive model, using 5× coverage WGS data, in distinguishing early lung cancer from healthy subjects for the internal and external validation cohorts; (b) Box plot showing the distribution of cancer scores based on the 5× WGS model in the patient and control groups of the validation cohorts. The 95% specificity cutoff score for the internal validation set is 0·3725 (*: 0·01 < p < 0·05; **: 0·001 < p < 0·01; ***: p < 0·001, t-test); (c) and (d) ROC curves evaluating the 5× WGS-based model performance using low-coverage (4×-0·5×) WGS data in internal and external validation cohorts.
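A cutoff at a target specificity, such as the 95% specificity cutoff mentioned in panel (b), can be derived from control scores along the following lines. This is a sketch under assumptions the caption does not state: a sample is called positive when its score is greater than or equal to the cutoff, and the cutoff is the smallest observed control score that reaches the target specificity. The function names are hypothetical.

```python
import math


def specificity(control_scores, cutoff):
    """Fraction of controls scoring below the cutoff (true negatives)."""
    return sum(s < cutoff for s in control_scores) / len(control_scores)


def sensitivity(case_scores, cutoff):
    """Fraction of cases scoring at or above the cutoff (true positives)."""
    return sum(s >= cutoff for s in case_scores) / len(case_scores)


def cutoff_at_specificity(control_scores, target=0.95):
    """Smallest candidate cutoff whose specificity reaches `target`.

    Candidates are the distinct control scores plus +inf, so a cutoff is
    always found (at +inf no control is called positive).
    """
    candidates = sorted(set(control_scores)) + [math.inf]
    for cutoff in candidates:
        if specificity(control_scores, cutoff) >= target:
            return cutoff
```

Fixing the cutoff on one cohort and then applying it unchanged to another keeps the second cohort's sensitivity and specificity from being tuned to its own data.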
Figure 4
The model's diagnostic sensitivities in different subgroups of the combined validation cohorts at 95% specificity. The sensitivities (%) were calculated with 95% confidence interval as indicated by the bars for subgroups of (a) histology, (b) stage, (c) differentiation level, (d) tumor size, (e) age, (f) sex, (g) smoking, (h) drinking, (i) predominant histologic pattern, (j) tumor location and (k) focality. The numbers in the parentheses represent the true positive and total cases in each subgroup category.
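A subgroup sensitivity with a 95% confidence interval can be computed from the true positive and total case counts given in the parentheses. The caption does not say which interval method was used, so the sketch below uses the Wilson score interval as an assumption; the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    `z` defaults to the two-sided 95% normal quantile. Returns
    (lower, upper), clipped to [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)
    ) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

For example, `wilson_interval(37, 40)` gives an interval around a sensitivity of 37/40 = 92.5%. Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains informative when a subgroup has 100% sensitivity.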

Source: PubMed
