Genome-wide cell-free DNA fragmentation in patients with cancer

Stephen Cristiano, Alessandro Leal, Jillian Phallen, Jacob Fiksel, Vilmos Adleff, Daniel C Bruhm, Sarah Østrup Jensen, Jamie E Medina, Carolyn Hruban, James R White, Doreen N Palsgrove, Noushin Niknafs, Valsamo Anagnostou, Patrick Forde, Jarushka Naidoo, Kristen Marrone, Julie Brahmer, Brian D Woodward, Hatim Husain, Karlijn L van Rooijen, Mai-Britt Worm Ørntoft, Anders Husted Madsen, Cornelis J H van de Velde, Marcel Verheij, Annemieke Cats, Cornelis J A Punt, Geraldine R Vink, Nicole C T van Grieken, Miriam Koopman, Remond J A Fijneman, Julia S Johansen, Hans Jørgen Nielsen, Gerrit A Meijer, Claus Lindbjerg Andersen, Robert B Scharpf, Victor E Velculescu

Abstract

Cell-free DNA in the blood provides a non-invasive diagnostic avenue for patients with cancer¹. However, characteristics of the origins and molecular features of cell-free DNA are poorly understood. Here we developed an approach to evaluate fragmentation patterns of cell-free DNA across the genome, and found that profiles of healthy individuals reflected nucleosomal patterns of white blood cells, whereas patients with cancer had altered fragmentation profiles. We used this method to analyse the fragmentation profiles of 236 patients with breast, colorectal, lung, ovarian, pancreatic, gastric or bile duct cancer and 245 healthy individuals. A machine learning model that incorporated genome-wide fragmentation features had sensitivities of detection ranging from 57% to more than 99% among the seven cancer types at 98% specificity, with an overall area under the curve value of 0.94. Fragmentation profiles could be used to identify the tissue of origin of the cancers to a limited number of sites in 75% of cases. Combining our approach with mutation-based cell-free DNA analyses detected 91% of patients with cancer. The results of these analyses highlight important properties of cell-free DNA and provide a proof-of-principle approach for the screening, early detection and monitoring of human cancer.

Conflict of interest statement

Competing interests: S.C., A.L., J.P., J.F., V. Adleff, R.B.S., and V.E.V. are inventors on patent applications (62/673,516 and 62/795,900) submitted by Johns Hopkins University related to cell-free DNA for cancer detection. V.E.V. is a founder of Delfi Diagnostics and Personal Genome Diagnostics, a member of their Scientific Advisory Boards and Boards of Directors, and owns Delfi Diagnostics and Personal Genome Diagnostics stock, which are subject to certain restrictions under university policy. Within the last five years, V.E.V. has been an advisor to Daiichi Sankyo, Janssen Diagnostics, Ignyta, and Takeda Pharmaceuticals. The terms of these arrangements are managed by Johns Hopkins University in accordance with its conflict of interest policies.

Figures

Extended Data Fig. 1. Simulations of noninvasive cancer detection based on number of alterations analyzed and tumor-derived cfDNA fragment distributions.
a, Monte Carlo simulations were performed using different numbers of tumor-specific alterations to evaluate the probability of detecting cancer alterations in cfDNA at the indicated fraction of tumor-derived molecules. The simulations were performed assuming an average of 2000 genome equivalents of cfDNA and the requirement of five or more observations of any alteration. These analyses indicate that increasing the number of tumor-specific alterations improves the sensitivity of detection of circulating tumor DNA. b, Cumulative density functions of cfDNA fragment lengths of 42 loci containing tumor-specific alterations from 30 patients with breast, colorectal, lung, or ovarian cancer are shown with 95% confidence bands (orange). Lengths of mutant cfDNA fragments were significantly different in size compared to wild-type cfDNA fragments (blue) at these loci. c, GC content was similar for mutated and non-mutated fragments. d, GC content was not correlated to fragment length.
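The relationship described in panel a can be illustrated with a closed-form calculation rather than a simulation. The sketch below is not the authors' code. It assumes that each alteration is sampled independently, that the number of mutant fragments seen at one locus is binomial with 2000 trials, that the tumor-derived fraction equals the per-locus mutant fragment fraction, and that "five or more observations of any alteration" means at least one alteration reaches five mutant fragments. The function names `p_single` and `p_detect` are hypothetical.

```python
from math import comb


def p_single(tumor_fraction, genome_equivalents=2000, min_obs=5):
    """Probability that one locus yields at least `min_obs` mutant fragments.

    Assumes a binomial draw: each of the `genome_equivalents` fragments
    covering the locus is mutant with probability `tumor_fraction`.
    """
    p_below = sum(
        comb(genome_equivalents, k)
        * tumor_fraction ** k
        * (1 - tumor_fraction) ** (genome_equivalents - k)
        for k in range(min_obs)
    )
    return 1 - p_below


def p_detect(n_alterations, tumor_fraction, genome_equivalents=2000, min_obs=5):
    """Probability that at least one of `n_alterations` independent loci is detected."""
    miss_one = 1 - p_single(tumor_fraction, genome_equivalents, min_obs)
    return 1 - miss_one ** n_alterations
```

Under these assumptions, calling `p_detect` with a fixed tumor fraction and an increasing number of alterations gives a non-decreasing detection probability, which is the trend the caption reports.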
Extended Data Fig. 2. Germline and hematopoietic cfDNA fragment distributions.
a, Cumulative density functions of fragment lengths at 44 loci containing germline alterations (non-tumor derived) from 38 patients with breast, colorectal, lung, or ovarian cancer are shown with 95% confidence bands. Fragments with germline mutations (orange) were comparable in length to wild-type cfDNA fragment lengths (blue). b, Cumulative density functions of fragment lengths at 41 loci containing hematopoietic alterations (non-tumor derived) from 28 patients with breast, colorectal, lung, or ovarian cancer are shown with 95% confidence bands. After correction for multiple testing, there were no significant differences (α=0.05) in the size distributions of mutated hematopoietic cfDNA fragments (orange) and wild-type cfDNA fragments (blue).
Extended Data Fig. 3. cfDNA fragmentation in healthy individuals and patients with lung cancer.
a, cfDNA fragment lengths are shown for healthy individuals (n=30, gray) and patients with lung cancer (n=8, blue). b–d, cfDNA fragmentation profiles from healthy individuals (n=30) had high correlations while patients with lung cancer (n=8) had lower correlations to median fragmentation profiles of b, lymphocytes, c, lymphocyte nucleosome distances, and d, healthy cfDNA. Pearson correlations are shown with box plots depicting minimum, 25th percentile, median, 75th percentile, and maximum values. e, High coverage (9x) whole-genome sequencing data were subsampled to 2x, 1x, 0.5x, 0.2x, and 0.1x coverage. Mean-centered genome-wide fragmentation profiles in 5 Mb bins for 30 healthy individuals and 8 patients with lung cancer are depicted for each subsampled coverage with median profiles shown in blue. f, Pearson correlation of subsampled profiles to the initial profile at 9x coverage for healthy individuals and patients with lung cancer.
Extended Data Fig. 4. cfDNA fragmentation profiles and sequence alterations during therapy.
Detection and monitoring of cancer in serial blood draws from NSCLC patients (n=19) undergoing treatment with targeted tyrosine kinase inhibitors (black arrows) was performed using targeted sequencing (top) as previously reported and genome-wide fragmentation profiles (bottom). For each case, the vertical axis of the lower panel displays −1 times the Pearson correlation of each sample to the median healthy cfDNA fragmentation profile. Error bars depict confidence intervals from binomial tests for mutant allele fractions and confidence intervals calculated using Fisher transformation for genome-wide fragmentation profiles. Although the approaches analyze different aspects of cfDNA (whole genome compared to specific alterations) the targeted sequencing and fragmentation profiles were similar for patients responding to therapy as well as those with stable or progressive disease. As fragmentation profiles reflect both genomic and epigenomic alterations, while mutant allele fractions only reflect individual mutations, mutant allele fractions alone may not reflect the absolute level of correlation of fragmentation profiles to healthy individuals.
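The quantity plotted in the lower panels, a Pearson correlation with a Fisher-transformation confidence interval, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the textbook standard error 1/sqrt(n − 3) for the transformed correlation, where n is the number of genomic bins compared, and the helper names `pearson` and `fisher_ci` are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, fmean


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation r estimated from n paired bins.

    Works on the Fisher z scale (atanh), where the estimate is roughly
    normal with standard error 1/sqrt(n - 3), then maps back with tanh.
    Requires -1 < r < 1 and n > 3.
    """
    z = atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)
```

To mirror the caption's vertical axis, one would plot `-pearson(sample_profile, healthy_median_profile)` and negate and swap the two interval bounds accordingly.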
Extended Data Fig. 5. Profiles of cfDNA fragment lengths in copy neutral regions in healthy individuals and one patient with colorectal cancer.
a, The fragmentation profile in 211 copy neutral windows in chromosomes 1–6 is shown for 25 randomly selected healthy individuals (gray). For a patient with colorectal cancer (CGCRC291) with an estimated mutant allele fraction of 20%, we diluted the cancer fragment length profile to an approximate 10% tumor contribution (blue). a, b, While the marginal densities of the fragment profiles for the healthy samples and cancer patient show substantial overlap (a, right), the fragmentation profiles are different, as can be seen by visualization of the fragmentation profiles (a, left) and by the separation of the colorectal cancer patient from the healthy samples (n=25) in a principal component analysis (b).
Extended Data Fig. 6. Genome-wide GC correction of cfDNA fragments.
To estimate and control for the effects of GC content on sequencing coverage, we calculated coverage in non-overlapping 100 kb genomic windows across the autosomes. For each window, we calculated the average GC of the aligned fragments. a, Loess smoothing of raw coverage (top row) for two randomly selected healthy subjects (CGPLH189 and CGPLH380) and two cancer patients (CGPLLU161 and CGPLBR24) with undetectable aneuploidy (PA score < 2.35). After subtracting the average coverage predicted by the loess model, the residuals were rescaled to the median autosomal coverage (bottom row). As fragment length may also result in coverage biases, we performed this GC correction procedure separately for short (≤ 150 bp) and long (> 150 bp) fragments. While the 100 kb bins on chromosome 19 (blue points) consistently have less coverage than predicted by the loess model, we did not implement a chromosome-specific correction as such an approach would remove the effects of chromosomal copy number on coverage. b, Overall, we found a limited correlation between short or long fragment coverage and GC content after correction among healthy subjects (n=211, inter-quartile range: −0.03–0.03) and cancer patients (n=128, inter-quartile range: −0.06–0.02) with a PA score < 3. Box plots depict minimum, 25th percentile, median, 75th percentile, and maximum values.
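The subtract-and-rescale step in this caption can be sketched in a few lines. This is an illustration under a loud assumption: instead of a loess fit, the predicted coverage for a window is the mean coverage of all windows in the same GC bucket, which is a crude stand-in and not the model the caption describes. The function name `gc_correct` and the `n_buckets` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, median


def gc_correct(coverage, gc, n_buckets=20):
    """Remove the GC-dependent component of per-window coverage.

    coverage: per-window coverage values
    gc:       per-window GC fraction in [0, 1], same order as coverage

    Prediction is the mean coverage of windows in the same GC bucket
    (assumed stand-in for loess). The residual is then re-centred on the
    median coverage across all windows.
    """
    def bucket(g):
        return min(int(g * n_buckets), n_buckets - 1)

    groups = defaultdict(list)
    for c, g in zip(coverage, gc):
        groups[bucket(g)].append(c)
    predicted = {b: fmean(values) for b, values in groups.items()}

    centre = median(coverage)
    return [c - predicted[bucket(g)] + centre for c, g in zip(coverage, gc)]
```

Following the caption, such a function would be applied once to short-fragment coverage and once to long-fragment coverage, and not per chromosome, so that copy number effects remain in the corrected values.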
Extended Data Fig. 7. Machine learning model.
a, We used gradient tree boosting machine learning to examine whether cfDNA can be categorized as having characteristics of a cancer patient or healthy individual. The machine learning model included fragmentation size and coverage characteristics in windows throughout the genome, as well as chromosomal arm and mitochondrial DNA copy numbers. We employed a 10-fold cross-validation approach in which each sample is randomly assigned to a fold and 9 of the folds (90% of the data) are used for training and one fold (10% of the data) is used for testing. The prediction accuracy from a single cross-validation is an average over the 10 possible combinations of test and training sets. As this prediction accuracy can reflect bias from the initial randomization of patients, we repeat the entire procedure, including the randomization of patients to folds, 10 times. For all cases, feature selection and model estimation were performed on training data and were validated on test data and the test data were never used for feature selection. Ultimately, we obtained a DELFI score that could be used to classify individuals as likely healthy or having cancer. b, Distribution of AUCs across the repeated 10-fold cross-validation. The 25th, 50th, and 75th percentiles of the 100 AUCs for the cohort of 215 healthy individuals and 208 patients with cancer are indicated by dashed lines.
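The resampling scheme in panel a (10 folds, repeated 10 times with a fresh randomization each time, giving 100 train/test splits) can be sketched as an index generator. This is a minimal sketch that covers only the splitting. The gradient boosting model and feature selection are not shown, and the name `repeated_kfold` and its parameters are hypothetical.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV.

    Each repeat reshuffles the samples before assigning them to folds, so
    the fold membership differs between repeats. Any feature selection
    should be fitted on train_indices only and never on test_indices.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled samples into k folds of near-equal size.
        folds = [order[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield rep, held_out, train, test
```

With the 423 samples mentioned in panel b (215 healthy individuals and 208 patients), the generator produces 100 splits, matching the 100 AUCs summarized there.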
Extended Data Fig. 8. Whole-genome analyses of chromosomal arm copy number changes and mitochondrial genome representation.
a, Z scores for each autosome arm are depicted for healthy individuals (n=215) and patients with cancer (n=208). The vertical axis depicts normal copy at zero with positive and negative values indicating arm gains and losses, respectively. Z scores greater than 50 or less than −50 are thresholded at the indicated values. b, The fraction of reads mapping to the mitochondrial genome is depicted for healthy individuals (n=215) and patients with cancer (n=208). Box plots depict the minimum, 25th percentile, median, 75th percentile, and maximum values.
Extended Data Fig. 9. DELFI detection of cancer and tissue of origin prediction.
a, Analyses of individual cancer types using the DELFI-combined approach had AUCs ranging from 0.86 to >0.99. b, Receiver operator characteristics for detection of cancer using cfDNA fragmentation profiles and other genome-wide features in a machine learning approach are depicted for a cohort of 215 healthy individuals and each stage of 208 patients with cancer with ≥ 95% specificity shaded in blue. c, Receiver operator characteristics for DELFI tissue prediction of bile duct, breast, colorectal, gastric, lung, ovarian, or pancreatic cancer are depicted. In order to increase sample sizes within cancer type classes, we included cases detected with a 90% specificity, and the lung cancer cohort was supplemented with the addition of baseline cfDNA data from 18 lung cancer patients with prior treatment. d, DELFI tissue of origin prediction.
Extended Data Fig. 10. Detection of cancer using DELFI and mutation-based cfDNA approaches.
DELFI (green) and targeted sequencing for mutation identification (blue) were performed independently in a cohort of 126 patients with breast, bile duct, colorectal, gastric, lung, or ovarian cancer. The numbers of individuals detected by each approach and in combination are indicated for DELFI detection with a specificity of 98%, targeted sequencing specificity at >99%, and a combined specificity of 98%. ND indicates not detected.
Fig. 1. Schematic of DELFI approach.
Blood is collected from healthy individuals and patients with cancer. cfDNA is extracted from plasma, processed into sequencing libraries, examined through WGS, mapped to the genome, and analyzed to determine cfDNA fragmentation profiles across the genome. Machine learning is used to categorize whether individuals have cancer and identify tumor tissue of origin.
Fig. 2. Aberrant cfDNA fragmentation profiles in patients with cancer.
a, Genome-wide cfDNA fragmentation profiles (defined as the ratio of short to long fragments) from ~9x WGS are shown in 5 Mb bins for 30 healthy individuals (top) and 8 lung cancer patients (bottom). b, Analyses of healthy cfDNA (top), lung cancer cfDNA (middle), and healthy lymphocyte (bottom) fragmentation profiles from chromosome 1 at 1 Mb resolution. Healthy lymphocyte profiles were scaled with a standard deviation equal to that of the median healthy cfDNA profiles. c, Smoothed median distances between adjacent nucleosomes, centered at zero, using 100 kb bins from healthy cfDNA (top) and nuclease-digested healthy lymphocytes (middle) are depicted together with the first eigenvector for the genome contact matrix from Hi-C analyses of lymphoblastoid cells (bottom).
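The definition in panel a, a ratio of short to long fragments per genomic bin, can be sketched as below. This is not the authors' pipeline. It assumes fragments arrive as `(chromosome, start, length)` tuples, bins fragments by their start coordinate, and uses the 150 bp short/long split stated for GC correction in Extended Data Fig. 6, with no further length limits. The function name `fragmentation_profile` is hypothetical.

```python
from collections import Counter


def fragmentation_profile(fragments, bin_size=5_000_000, cutoff=150):
    """Short-to-long fragment ratio per genomic bin.

    fragments: iterable of (chromosome, start, length) tuples
    Returns a dict mapping (chromosome, bin_index) to the ratio of
    fragments with length <= cutoff over fragments with length > cutoff.
    Bins without any long fragments are omitted to avoid division by zero.
    """
    short, long_ = Counter(), Counter()
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        if length <= cutoff:
            short[key] += 1
        else:
            long_[key] += 1
    return {key: short[key] / n_long for key, n_long in long_.items() if n_long > 0}
```

A profile computed this way could then be mean-centred across bins before comparing individuals, as is done for the subsampled profiles in Extended Data Fig. 3.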
Fig. 3. cfDNA fragmentation profiles in healthy individuals and patients with cancer.
a, Fragmentation profiles (bottom) in the context of tumor copy number changes (top) in a colorectal cancer patient. The distribution of segment means and integer copy numbers are shown at top right. b, GC adjusted fragmentation profiles from 1–2x WGS for healthy individuals and patients with cancer are depicted per cancer type using 5 Mb windows. The median healthy profile is indicated in black and the 98% confidence band is shown in gray. For patients with cancer, individual profiles are colored based on their Pearson correlation to the healthy median. c, Windows are indicated in orange if more than 10% of the cancer samples had a fragment ratio more than three standard deviations from the median healthy fragment ratio.
Fig. 4. Detection of cancer using DELFI.
Receiver operator characteristics for detection of cancer using cfDNA fragmentation profiles and other genome-wide features in a machine learning approach are depicted for a cohort of 215 healthy individuals and 208 patients with cancer (DELFI, AUC = 0.94), with ≥ 95% specificity shaded in blue. Machine learning analyses of chromosomal arm copy number (Chr copy number (ML)), and mitochondrial genome copy number analyses (mtDNA), are shown in the indicated colors.

References

    1. Wan JCM et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat Rev Cancer 17, 223–238, doi:10.1038/nrc.2017.7 (2017).
    2. Bray F et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68, 394–424, doi:10.3322/caac.21492 (2018).
    3. World Health Organization. Guide to Cancer Early Diagnosis (2017).
    4. National Comprehensive Cancer Network (NCCN) clinical practice guidelines in oncology. Accessed 16 April 2019.
    5. Phallen J et al. Direct detection of early-stage cancers using circulating tumor DNA. Sci Transl Med 9, doi:10.1126/scitranslmed.aan2415 (2017).
    6. Cohen JD et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930, doi:10.1126/science.aar3247 (2018).
    7. Newman AM et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med 20, 548–554, doi:10.1038/nm.3519 (2014).
    8. Bettegowda C et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci Transl Med 6, 224ra224, doi:10.1126/scitranslmed.3007094 (2014).
    9. Leary RJ et al. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med 2, 20ra14, doi:10.1126/scitranslmed.3000702 (2010).
    10. Leary RJ et al. Detection of chromosomal alterations in the circulation of cancer patients with whole-genome sequencing. Sci Transl Med 4, 162ra154, doi:10.1126/scitranslmed.3004742 (2012).
    11. Chan KC et al. Noninvasive detection of cancer-associated genome-wide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing. Proc Natl Acad Sci U S A 110, 18761–18768, doi:10.1073/pnas.1313995110 (2013).
    12. Jiang P et al. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc Natl Acad Sci U S A 112, E1317–1325, doi:10.1073/pnas.1500076112 (2015).
    13. Wang BG et al. Increased plasma DNA integrity in cancer patients. Cancer Res 63, 3966–3968 (2003).
    14. Umetani N et al. Prediction of breast tumor progression by integrity of free circulating DNA in serum. J Clin Oncol 24, 4270–4276, doi:10.1200/JCO.2006.05.9493 (2006).
    15. Chan KC, Leung SF, Yeung SW, Chan AT & Lo YM. Persistent aberrations in circulating DNA integrity after radiotherapy are associated with poor prognosis in nasopharyngeal carcinoma patients. Clin Cancer Res 14, 4141–4145, doi:10.1158/1078-0432.CCR-08-0182 (2008).
    16. Mouliere F et al. High fragmentation characterizes tumour-derived circulating DNA. PLoS One 6, e23418, doi:10.1371/journal.pone.0023418 (2011).
    17. Mouliere F et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci Transl Med 10, doi:10.1126/scitranslmed.aat4921 (2018).
    18. Snyder MW, Kircher M, Hill AJ, Daza RM & Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell 164, 57–68, doi:10.1016/j.cell.2015.11.050 (2016).
    19. Underhill HR et al. Fragment Length of Circulating Tumor DNA. PLoS Genet 12, e1006162, doi:10.1371/journal.pgen.1006162 (2016).
    20. Ulz P et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat Genet 48, 1273–1278, doi:10.1038/ng.3648 (2016).
    21. Ivanov M, Baranova A, Butler T, Spellman P & Mileyko V. Non-random fragmentation patterns in circulating cell-free DNA reflect epigenetic regulation. BMC Genomics 16 Suppl 13, S1, doi:10.1186/1471-2164-16-S13-S1 (2015).
    22. Jiang P et al. Preferred end coordinates and somatic variants as signatures of circulating tumor DNA associated with hepatocellular carcinoma. Proc Natl Acad Sci U S A, doi:10.1073/pnas.1814616115 (2018).
    23. Shen SY et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature, doi:10.1038/s41586-018-0703-0 (2018).
    24. Corces MR et al. The chromatin accessibility landscape of primary human cancers. Science 362, doi:10.1126/science.aav1898 (2018).
    25. Polak P et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364, doi:10.1038/nature14221 (2015).
    26. Lieberman-Aiden E et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293, doi:10.1126/science.1181369 (2009).
    27. Fortin JP & Hansen KD. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol 16, 180, doi:10.1186/s13059-015-0741-y (2015).
    28. Diehl F et al. Circulating mutant DNA to assess tumor dynamics. Nat Med 14, 985–990 (2008).
    29. Phallen J et al. Early noninvasive detection of response to targeted therapy in non-small cell lung cancer. Cancer Research 15, 1204–1213, doi:10.1158/0008-5472.CAN-18-1082 (2019).
    30. Burnham P et al. Single-stranded DNA library preparation uncovers the origin and diversity of ultrashort cell-free DNA in plasma. Scientific reports 6, 27859, doi:10.1038/srep27859 (2016).
    31. Sanchez C, Snyder MW, Tanos R, Shendure J & Thierry AR. New insights into structural features and optimal detection of circulating tumor DNA determined by single-strand DNA analysis. NPJ genomic medicine 3, 31, doi:10.1038/s41525-018-0069-0 (2018).
    32. Fisher S et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol 12, R1, doi:10.1186/gb-2011-12-1-r1 (2011).
    33. Jones S et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci Transl Med 7, 283ra253, doi:10.1126/scitranslmed.aaa7161 (2015).
    34. Benjamini Y & Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40, e72, doi:10.1093/nar/gks001 (2012).
    35. Langmead B & Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359, doi:10.1038/nmeth.1923 (2012).
    36. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat 29, 1189–1232, doi:10.1214/aos/1013203451 (2001).
    37. Friedman JH. Stochastic gradient boosting. Comput Stat Data An 38, 367–378, doi:10.1016/S0167-9473(01)00065-2 (2002).
    38. Efron B & Tibshirani R. Improvements on cross-validation: The .632+ bootstrap method. J Am Stat Assoc 92, 548–560, doi:10.2307/2965703 (1997).
    39. Zurbenko IG. The spectral analysis of time series. (Elsevier, 1986).
    40. Robin X et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics 12, 77, doi:10.1186/1471-2105-12-77 (2011).

Source: PubMed
