Detection of lung cancer using weighted digital analysis of breath biomarkers

Michael Phillips, Nasser Altorki, John H M Austin, Robert B Cameron, Renee N Cataneo, Robert Kloss, Roger A Maxfield, Muhammad I Munawar, Harvey I Pass, Asif Rashid, William N Rom, Peter Schmitt, James Wai

Abstract

Background: A combination of biomarkers in a multivariate model may predict disease with greater accuracy than a single biomarker employed alone. We developed a non-linear method of multivariate analysis, weighted digital analysis (WDA), and evaluated its ability to predict lung cancer employing volatile biomarkers in the breath.

Methods: WDA generates a discriminant function to predict membership in disease vs no disease groups by determining a weight, a cutoff value, and a sign for each predictor variable employed in the model. The weight of each predictor variable was the area under the curve (AUC) of the receiver operating characteristic (ROC) curve minus a fixed offset of 0.55, where the AUC was obtained by employing that predictor variable alone, as the sole marker of disease. The sign (+/-) was used to invert the predictor variable if a lower value indicated a higher probability of disease. When employed to predict the presence of a disease in a particular patient, the discriminant function was determined as the sum of the weights of all predictor variables that exceeded their cutoff values. The algorithm that generates the discriminant function is deterministic because parameters are calculated from each individual predictor variable without any optimization or adjustment. We employed WDA to re-evaluate data from a recent study of breath biomarkers of lung cancer, comprising the volatile organic compounds (VOCs) in the alveolar breath of 193 subjects with primary lung cancer and 211 controls with a negative chest CT.
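The procedure described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how each cutoff value was chosen, so the sketch assumes the cutoff that maximizes sensitivity + specificity; the function names (auc, fit_predictor, wda_score) are hypothetical.

```python
from typing import List, Sequence, Tuple

Params = Tuple[float, float, int]  # (weight, cutoff, sign)


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """ROC AUC of one predictor: the fraction of case/control pairs in which
    the case has the higher value (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in cases for n in controls)
    return wins / (len(cases) * len(controls))


def fit_predictor(cases: Sequence[float], controls: Sequence[float],
                  offset: float = 0.55) -> Params:
    """Derive weight, cutoff and sign from a single predictor variable."""
    a = auc(cases, controls)
    sign = 1 if a >= 0.5 else -1
    if sign < 0:
        # Invert the predictor when lower values indicate disease.
        cases = [-v for v in cases]
        controls = [-v for v in controls]
        a = 1.0 - a
    # ASSUMPTION: the cutoff maximizes sensitivity + specificity.
    # The source does not specify the cutoff rule.
    best_cut, best_j = None, -1.0
    for c in sorted(set(cases) | set(controls)):
        sens = sum(v > c for v in cases) / len(cases)
        spec = sum(v <= c for v in controls) / len(controls)
        if sens + spec - 1.0 > best_j:
            best_j, best_cut = sens + spec - 1.0, c
    # Weight = single-variable AUC minus the fixed offset of 0.55.
    return a - offset, best_cut, sign


def wda_score(sample: Sequence[float], params: List[Params]) -> float:
    """Sum the weights of all predictors that exceed their cutoff values."""
    return sum(w for x, (w, cut, sign) in zip(sample, params) if sign * x > cut)
```

Because each parameter is computed from one predictor at a time, running fit_predictor twice on the same data gives the same result, which matches the deterministic behavior described above. For an inverted predictor, the returned cutoff is expressed on the inverted scale.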

Results: The WDA discriminant function accurately identified patients with lung cancer in a model employing 30 breath VOCs (ROC curve AUC=0.90; sensitivity=84.5%, specificity=81.0%). These results were superior to multilinear regression analysis of the same data set (AUC=0.74, sensitivity=68.4%, specificity=73.5%). WDA test accuracy did not vary appreciably with TNM (tumor, node, metastasis) stage of disease, and results were not affected by tobacco smoking (ROC curve AUC=0.92 in current smokers, 0.90 in former smokers). WDA was a robust predictor of lung cancer: random removal of 1/3 of the VOCs did not reduce the AUC of the ROC curve by >10% (99.7% CI).

Conclusions: A test employing WDA of breath VOCs predicted lung cancer with accuracy similar to chest computed tomography. The algorithm identified dependencies that were not apparent with traditional linear methods. WDA appears to provide a useful new technique for non-linear multivariate analysis of data.

Figures

Figure 1. Accuracy of a single breath VOC employed as a biomarker of lung cancer
Upper panel: distribution of alveolar gradients of isopropyl alcohol in lung cancer patients and in controls. The sensitivity and the specificity of this VOC as a biomarker of lung cancer varies with the cutoff value at different points along the x-axis. Lower panel: the receiver operating characteristic (ROC) curve derived from the sensitivity and the specificity observed at different cutoff values along the x-axis. For isopropyl alcohol in breath, the AUC of the ROC curve was 0.68, indicating that it was a modestly accurate biomarker of lung cancer when employed alone.
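The construction in the lower panel can be sketched in Python as follows: sweep the cutoff across the observed values, record sensitivity and 1 - specificity at each cutoff, and integrate the resulting curve with the trapezoidal rule. The function names are hypothetical and the sketch only illustrates the general ROC technique, not the authors' software.

```python
from typing import List, Sequence, Tuple


def roc_points(cases: Sequence[float],
               controls: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per cutoff,
    moving from the strictest cutoff to the most lenient."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(cases) | set(controls), reverse=True):
        sensitivity = sum(v >= cutoff for v in cases) / len(cases)
        false_pos_rate = sum(v >= cutoff for v in controls) / len(controls)
        points.append((false_pos_rate, sensitivity))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 0.5 corresponds to a marker with no discriminating power and 1.0 to perfect separation, so the value of 0.68 reported for isopropyl alcohol lies in the modest range between them.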
Figure 2. Effect of random patient assignment on predictive accuracy
All breath VOCs were evaluated as biomarkers of lung cancer employing the method shown in Figure 1. The AUC of each ROC curve is displayed employing the correct cancer/control assignment (y-axis) vs random assignment to the cancer or the control group (x-axis). This figure demonstrates the difference between the 2 distributions: when the diagnosis was randomly assigned, no VOC ROC curve had an individual AUC of ≥0.6. However, when the diagnosis was correctly assigned, 69 VOCs had a ROC curve AUC > 0.6, and these VOCs were selected as the best biomarkers of lung cancer.
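The comparison between correct and random assignment can be sketched in Python as shown here. The sketch shuffles the cancer/control labels, recomputes the single-VOC AUC each time, and reports the AUC after sign inversion (the larger of AUC and 1 - AUC). The function names, the number of shuffles, and the seed are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Fraction of case/control pairs ranked correctly (ties count one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in cases for n in controls)
    return wins / (len(cases) * len(controls))


def folded_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one VOC after allowing for sign inversion (label 1 = cancer)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    a = auc(cases, controls)
    return max(a, 1.0 - a)


def shuffled_label_aucs(values: Sequence[float], labels: Sequence[int],
                        n_shuffles: int = 200, seed: int = 0) -> List[float]:
    """AUCs obtained when the diagnosis is randomly reassigned.
    Shuffling keeps the numbers of cases and controls unchanged."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        results.append(folded_auc(values, shuffled))
    return results
```

A VOC whose AUC under the correct labels lies well above the range produced by shuffled labels is unlikely to owe its apparent accuracy to chance, which is the reasoning behind the 0.6 threshold described in the caption.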
Figure 3. WDA discriminatory function scores in lung cancer and controls
Upper panel: This histogram displays the distribution of discriminatory function scores in the 2 groups. Lower panel: Mean discriminatory function scores in controls and patients with lung cancer stratified by TNM stage of disease. TNM staging information was available for 166/193 patients with lung cancer. Mean discriminatory function scores were 2.36 (SD=0.47) in all stages of lung cancer, and 1.30 (SD=0.64) in controls (p−4, 2 tailed t-test).
Figure 4. Breath biomarkers of lung cancer stratified by TNM stage
These figures display the ROC curves obtained by stratifying the WDA data in Figure 4 according to the TNM stage of lung cancer. The AUC was high in TNM1 lung cancer, and a similar performance was maintained at all other stages. Since the overall AUC of the total set is high (around 0.9), it is to be expected that the AUC of any of the subsets stratified by TNM stage will have a similarly high value.
Figure 5. Breath biomarkers of lung cancer stratified by tobacco smoking
These figures display the ROC curves obtained by stratifying the WDA data in Figure 4 according to whether subjects were current smokers (upper panel; AUC=0.92) or former smokers (lower panel; AUC=0.90), demonstrating that the WDA discriminatory function scores were not skewed by current smoking or a history of smoking.
Figure 6. Effect of the number of VOCs on model performance, and cross validation in random split subsets
Upper panel: The accuracy of the breath test varied with the number of VOCs employed in the model. VOC biomarkers of lung cancer were added to the model one by one, commencing with the highest weight VOC. This figure demonstrates that the breath test identified lung cancer with near maximal accuracy with only 10 VOCs. Lower panel: Mean ROC curves of breath test results employing the same 30 VOCs in 20 random splits of the data set into a training set and a test set in a 2:1 ratio. The cutoff points, signs, and weights were recalculated for each split from the results in its training set.
Figure 7. Robustness of the WDA model and effect of random diagnosis assignment
Upper panel: This figure displays “robustness” vs the number of VOCs included in the analysis. Robustness is defined by the number of VOCs that can be removed on average without degrading the AUC of the ROC curve by >10%. The value is derived by removing randomly selected VOCs from the analysis until the AUC drops by 10%. The line “Robustness – 3 Sigma” indicates the number of VOCs that can be lost so that with 99.7% probability the AUC will not degrade by >10%. When the WDA analysis included 30 VOCs (arrow), the value of “Robustness – 3 Sigma” was approximately 10. This indicates that a third of the VOCs could be lost from the model without reducing its accuracy by >10% at the 99.7% confidence level. In this context “lost” means that the VOCs were not present in a patient's breath sample or in the room air. Lower panel: This figure displays the effect of the fraction of patients randomized on the AUC of the ROC curve. Assignment of patients to the cancer or the control group was randomized prior to determination of the WDA discriminatory function scores. The accuracy of the WDA model progressively deteriorated with the addition of random classifiers: the AUC of the ROC curve degraded approximately 4% for every 10% of random classifier changes. This supports the conclusion that the undegraded WDA model extracted a lung cancer signal from breath VOCs, because the accuracy of detection fell with the declining integrity of the signal.
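The robustness measure in the upper panel can be approximated with the following Python sketch. It assumes that each subject is represented by a list of 0/1 flags (1 if that VOC exceeded its cutoff), that the 10% degradation is relative to the AUC of the full model, and that "3 Sigma" refers to three population standard deviations of the removal counts. These choices, the function names, the trial count, and the seed are assumptions and are not taken from the source.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Fraction of case/control pairs ranked correctly (ties count one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in cases for n in controls)
    return wins / (len(cases) * len(controls))


def score(flags: Sequence[int], weights: Sequence[float],
          keep: Sequence[int]) -> float:
    """Discriminant score using only the VOCs whose indices are in `keep`."""
    return sum(weights[j] for j in keep if flags[j])


def removable_count(case_flags: List[List[int]], control_flags: List[List[int]],
                    weights: Sequence[float], rng: random.Random,
                    max_drop: float = 0.10) -> int:
    """Number of randomly chosen VOCs removed before the AUC falls by more
    than `max_drop` (relative to the full model)."""
    keep = list(range(len(weights)))
    rng.shuffle(keep)  # random removal order

    def current_auc() -> float:
        return auc([score(f, weights, keep) for f in case_flags],
                   [score(f, weights, keep) for f in control_flags])

    floor = (1.0 - max_drop) * current_auc()
    removed = 0
    while len(keep) > 1:
        keep.pop()
        if current_auc() < floor:
            break
        removed += 1
    return removed


def robustness(case_flags: List[List[int]], control_flags: List[List[int]],
               weights: Sequence[float], trials: int = 100,
               seed: int = 0) -> Tuple[float, float]:
    """Return (mean removable VOCs, mean minus 3 standard deviations)."""
    rng = random.Random(seed)
    counts = [removable_count(case_flags, control_flags, weights, rng)
              for _ in range(trials)]
    mean = statistics.mean(counts)
    return mean, mean - 3 * statistics.pstdev(counts)
```

The second returned value plays the role of the “Robustness – 3 Sigma” line: if removal counts are roughly normally distributed, about 99.7% of random removal orders tolerate at least that many lost VOCs.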
Figure 8. Hypothetical basis of the breath test for lung cancer
Lung cancer may result from the interaction of hereditary and environmental factors. A person's genotype may include a variety of cytochrome p450 mixed oxidases, some of which are activated by exposure to environmental toxins such as tobacco smoke. A combination of induced enzymes may place a person at increased risk of developing lung cancer by converting precursors to carcinogens. Normal human breath contains a large number of VOCs that are endogenous and exogenous in origin, and an altered pattern of cytochrome p450 mixed oxidase activity could potentially modulate catabolism of these VOCs, thereby generating an abnormal pattern of breath VOCs.

Source: PubMed
