Multi-dimensional fragmentomic assay for ultrasensitive early detection of colorectal advanced adenoma and adenocarcinoma

Xiaoji Ma, Yikuan Chen, Wanxiangfu Tang, Hua Bao, Shaobo Mo, Rui Liu, Shuyu Wu, Hairong Bao, Yaqi Li, Long Zhang, Xue Wu, Sanjun Cai, Yang Shao, Fangqi Liu, Junjie Peng

Abstract

Previous studies on liquid biopsy-based early detection of advanced colorectal adenoma (advCRA) or colorectal adenocarcinoma (CRC) were limited by low sensitivity. We performed a prospective study to establish an integrated model that uses fragmentomic profiles of plasma cell-free DNA (cfDNA) to detect early-stage CRC and advCRA accurately and cost-effectively. The training cohort enrolled 310 participants: 149 early-stage CRC patients, 46 advCRA patients and 115 healthy controls. Plasma cfDNA samples were subjected to whole-genome sequencing (WGS). Using the training cohort, we trained an ensemble stacked model that differentiates healthy controls from advCRA/early-stage CRC patients, built from five machine learning algorithms and five cfDNA fragmentomic feature types. The model was then validated in an independent test cohort (N = 311: 149 early-stage CRC, 46 advCRA and 116 healthy controls). In the test cohort, the model achieved an area under the curve (AUC) of 0.988 for differentiating advCRA/early-stage CRC patients from healthy individuals. It performed slightly better for identifying early-stage CRC (AUC 0.990) than advCRA (AUC 0.982). At 94.8% specificity, the sensitivities for detecting advCRA and early-stage CRC reached 95.7% and 98.0%, respectively (stage 0: 94.1%; stage I: 98.5%). Notably, detection sensitivity reached 100% and 97.6% in early-stage CRC patients with negative fecal occult blood test or CEA blood test results, respectively. Finally, the model maintained strong performance (AUC 0.982; 94.4% sensitivity at 94.8% specificity) even when sequencing depth was down-sampled to 1X. Our integrated predictive model demonstrated an unprecedented detection sensitivity for advCRA and early-stage CRC, pointing toward more accurate noninvasive CRC screening in clinical practice.
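
The abstract refers to five cfDNA fragmentomic feature types; two of them, Fragment Size Ratio (FSR) and End Motif (EDM), can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the short/long size windows (100-150 bp and 151-220 bp) and the use of 4-mer motifs at fragment 5' ends are borrowed conventions from the wider fragmentomics literature, not parameters stated in this summary, and the function names are hypothetical. In practice such features are often computed per genomic bin from aligned reads rather than from a flat list.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: illustrative size windows (short 100-150 bp, long 151-220 bp);
# the study's actual bins are not given in this summary.
SHORT_BP = (100, 150)
LONG_BP = (151, 220)

# All 256 possible 4-mer end motifs over A/C/G/T.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]


def fragment_size_ratio(lengths):
    """Short-to-long fragment count ratio for one region (or genome-wide)."""
    short = sum(SHORT_BP[0] <= n <= SHORT_BP[1] for n in lengths)
    long_ = sum(LONG_BP[0] <= n <= LONG_BP[1] for n in lengths)
    return short / long_ if long_ else float("nan")


def end_motif_frequencies(end_4mers):
    """Relative frequency of each 5' 4-mer end motif.

    Motifs that are not a clean A/C/G/T 4-mer (e.g. containing N) are ignored.
    """
    counts = Counter(m.upper() for m in end_4mers)
    kept = {m: counts.get(m, 0) for m in ALL_4MERS}
    total = sum(kept.values())
    return {m: (c / total if total else 0.0) for m, c in kept.items()}


if __name__ == "__main__":
    lengths = [120, 145, 167, 170, 180, 310]  # toy fragment lengths in bp
    print(fragment_size_ratio(lengths))  # 2 short / 3 long fragments
    freqs = end_motif_frequencies(["CCCA", "CCCA", "AAAG", "ACGN"])
    print(freqs["CCCA"], len(freqs))
```

Fragments outside both windows (the 310 bp fragment above) simply do not contribute to the ratio, and motifs containing ambiguous bases are dropped so that the 256 frequencies sum to one.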

Conflict of interest statement

Wanxiangfu Tang, Hua Bao, Rui Liu, Shuyu Wu, Hairong Bao, Xue Wu and Yang Shao are employees of Nanjing Geneseeq Technology Inc., Nanjing, Jiangsu, China. The remaining authors have nothing to declare.

© 2021. The Author(s).

Figures

Fig. 1
Schematic illustration of study design. Plasma samples were collected from patients with advanced colorectal adenoma (advCRA) or early-stage (stage 0/I) colorectal adenocarcinoma (CRC), as well as from healthy controls. cfDNA was extracted from each participant's plasma sample and subjected to whole-genome sequencing. Five feature types (Fragment Size Ratio (FSR), Fragment Size Distribution (FSD), End Motif (EDM), BreakPoint Motif (BPM) and Copy Number Variation (CNV)) were calculated from the mapped sequencing reads. For each feature type, a base model was constructed by ensemble learning over five algorithms: GLM, GBM, random forest, deep learning and XGBoost. The base model predictions were then combined into a single matrix, which a GLM algorithm used to train the final ensemble stacked model.
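
The stacking scheme in Fig. 1 can be sketched as follows. This is a simplified, pure-Python illustration under stated assumptions: each feature type gets a single L2-regularised logistic regression base learner (standing in for the study's five-algorithm ensemble of GLM, GBM, random forest, deep learning and XGBoost), out-of-fold base predictions form the level-one matrix, and a logistic regression (a GLM with logit link) serves as the meta-learner. The function names (fit_stack, predict_stack and so on) are hypothetical, and the k-fold out-of-fold construction is a common stacking convention rather than a detail stated in the source.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    """Predicted probability of the positive (advCRA/CRC) class for each row."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def out_of_fold(X, y, k=5, seed=0):
    """k-fold out-of-fold scores, so the meta-learner never sees in-sample fits."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for f in range(k):
        held = idx[f::k]
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, predict(model, [X[i] for i in held])):
            scores[i] = p
    return scores


def fit_stack(feature_sets, y, k=5):
    """feature_sets maps a feature-type name (e.g. "FSR") to per-sample vectors."""
    names = sorted(feature_sets)
    # Base models refitted on all training data, used at prediction time.
    base = {n: fit_logistic(feature_sets[n], y) for n in names}
    # Level-one matrix: one column of out-of-fold scores per feature type.
    oof_cols = [out_of_fold(feature_sets[n], y, k) for n in names]
    level_one = [list(row) for row in zip(*oof_cols)]
    meta = fit_logistic(level_one, y)
    return names, base, meta


def predict_stack(stack, feature_sets):
    """Final score from the meta-learner applied to the base-model scores."""
    names, base, meta = stack
    cols = [predict(base[n], feature_sets[n]) for n in names]
    return predict(meta, [list(row) for row in zip(*cols)])


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(40)]  # toy labels: 1 = advCRA/CRC, 0 = healthy
    feats = {
        "FSR": [[rng.gauss(1.0 * yi, 1.0), rng.gauss(0.0, 1.0)] for yi in y],
        "EDM": [[rng.gauss(0.8 * yi, 1.0) for _ in range(3)] for yi in y],
    }
    scores = predict_stack(fit_stack(feats, y), feats)
    pos = [s for s, t in zip(scores, y) if t]
    neg = [s for s, t in zip(scores, y) if not t]
    print(sum(pos) / len(pos), sum(neg) / len(neg))
```

Training the meta-learner on out-of-fold rather than in-sample base predictions keeps it from simply rewarding whichever base model overfits most; the toy demo scores its own training data and says nothing about real-world performance.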
Fig. 2
Evaluation of the ensemble stacked machine learning model. A Graphical representation of dataset composition. The training cohort (N = 310), comprising 149 early-stage CRC patients, 46 advCRA patients and 115 healthy controls, was used to train the stacked ensemble model. The test cohort (N = 311), comprising 149 early-stage CRC patients, 46 advCRA patients and 116 healthy controls, was used independently to evaluate model performance. B ROC curves evaluating the overall performance of the predictive model, constructed using 4X-coverage WGS data, in distinguishing advCRA/early-stage CRC patients from healthy controls in the test cohort. C Table evaluating model performance in the test dataset. D Boxplots illustrating cancer score distributions in the healthy, advCRA and early-stage CRC groups of the test cohort, based on the 4X-coverage model. The dotted line marks the 95% specificity cutoff for the cancer score (0.62).
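
Panels B and D rely on two standard quantities: the ROC AUC and the sensitivity obtained at a cancer-score cutoff chosen for about 95% specificity. A minimal sketch of both follows, assuming that a sample is called positive when its score is strictly above the cutoff; the exact procedure used to pick 0.62 is not described here, and the function names are hypothetical.

```python
import math


def cutoff_at_specificity(control_scores, target=0.95):
    """Lowest control score t such that at least `target` of controls score <= t.

    Convention (assumed): a sample is called positive when its score > t.
    """
    s = sorted(control_scores)
    k = math.ceil(target * len(s) - 1e-9)  # controls that must fall at or below t
    return s[max(k, 1) - 1]


def specificity(control_scores, t):
    """Fraction of healthy controls correctly called negative."""
    return sum(x <= t for x in control_scores) / len(control_scores)


def sensitivity(case_scores, t):
    """Fraction of advCRA/CRC cases correctly called positive."""
    return sum(x > t for x in case_scores) / len(case_scores)


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of ROC AUC: P(case > control), ties count 1/2."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            wins += 1.0 if c > h else (0.5 if c == h else 0.0)
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    controls = [i / 20 for i in range(20)]  # toy healthy-control scores 0.00..0.95
    cases = [0.5, 0.92, 0.97, 0.99]         # toy advCRA/CRC scores
    t = cutoff_at_specificity(controls)
    print(t, specificity(controls, t), sensitivity(cases, t), auc(cases, controls))
```

On these toy scores the cutoff is 0.9, which leaves 19 of 20 controls at or below it (95% specificity), detects 3 of 4 cases (75% sensitivity) and gives an AUC of 69.5/80 = 0.86875. The pairwise AUC loop is quadratic, which is fine at cohort sizes of a few hundred.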

Source: PubMed
