The Effect of Different Case Definitions of Current Smoking on the Discovery of Smoking-Related Blood Gene Expression Signatures in Chronic Obstructive Pulmonary Disease

Ma'en Obeidat, Xiaoting Ding, Nick Fishbane, Zsuzsanna Hollander, Raymond T Ng, Bruce McManus, Scott J Tebbutt, Bruce E Miller, Stephen Rennard, Peter D Paré, Don D Sin

Abstract

Introduction: Smoking is the leading modifiable environmental risk factor for chronic obstructive pulmonary disease (COPD). Clinical, epidemiological and, increasingly, "omics" studies assess or adjust for current smoking status using only self-report, which may be inaccurate. Objective measures such as exhaled carbon monoxide (eCO) may also be problematic owing to limitations in the measurements and the relatively short half-life of the molecule. In this study, we determined the impact of different case definitions of current cigarette smoking on gene expression in peripheral blood of patients with COPD.

Methods: Peripheral blood gene expression from 573 former and current smokers with COPD in the ECLIPSE study was used to find genes whose expression was associated with smoking status. Current smoking was defined using self-report, eCO concentrations, or both. Linear regression was used to determine the association of current smoking status with gene expression, adjusting for age, sex and propensity score. Pathway enrichment analyses were performed on genes with P < .001.

Results: Using self-report or eCO alone, only two genes were differentially expressed between current and ex-smokers, with no enrichment in biological processes. When current smoking was defined using both eCO and self-report, four genes were differentially expressed (LRRN3, PID1, FUCA1, GPR15), with enrichment in 40 biological pathways related to metabolic processes, response to hypoxia and hormonal stimulus. Additionally, the combined definition provided better distributions of test statistics for differential gene expression.

Conclusion: A combined phenotype of eCO and self-report allows for better discovery of genes and pathways related to current smoking.

Implications: Studies relying only on self-report of smoking status to assess or adjust for the impact of smoking may not fully capture its effect, which will lead to residual confounding of results.

Trial registration: ClinicalTrials.gov NCT00292552.

© The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Figures

Figure 1.
Boxplot of exhaled carbon monoxide (eCO) by self-reported status. eCO is shown in ppm on the Y axis. Self-reported smoking status is shown on the X axis. The horizontal line at 8.1 ppm represents the cut-off point applied to remove subjects discordant for the two measures of smoking status: self-reported smokers with an eCO below 8.1 ppm and self-reported former smokers with an eCO above 8.1 ppm. Fifty-four and 29 subjects fell into these categories, respectively.
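The combined case definition described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `combined_status`, the status labels, and the example subjects are hypothetical, and the treatment of a reading of exactly 8.1 ppm (counted here as low) is an assumption, since the caption does not say how that boundary was handled.

```python
from typing import Optional

ECO_CUTOFF_PPM = 8.1  # cut-off stated in the Figure 1 caption


def combined_status(self_report: str, eco_ppm: float,
                    cutoff: float = ECO_CUTOFF_PPM) -> Optional[str]:
    """Return 'current' or 'former' when self-report and eCO agree,
    or None when the two measures are discordant."""
    if self_report not in ("current", "former"):
        raise ValueError(f"unexpected self-report value: {self_report!r}")
    # Assumption: a reading of exactly the cut-off is treated as low.
    eco_high = eco_ppm > cutoff
    if self_report == "current" and eco_high:
        return "current"
    if self_report == "former" and not eco_high:
        return "former"
    return None  # discordant subject, excluded from the combined phenotype


# Hypothetical subjects: (id, self-reported status, eCO in ppm)
subjects = [
    ("s1", "current", 15.0),  # concordant current smoker
    ("s2", "current", 3.0),   # discordant: reports smoking, low eCO
    ("s3", "former", 2.0),    # concordant former smoker
    ("s4", "former", 12.0),   # discordant: reports quitting, high eCO
]

kept = {}
for subject_id, report, eco in subjects:
    status = combined_status(report, eco)
    if status is not None:
        kept[subject_id] = status
```

With these made-up inputs, only `s1` and `s3` remain in `kept`; the two discordant subjects are dropped, mirroring the exclusion the caption describes.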
Figure 2.
Volcano plots of differential gene expression using three case definitions. Each plot shows the log2 fold difference in gene expression on the X axis versus the unadjusted P value (on the −log10 scale) on the Y axis. For the self-reported status and the combined phenotype, the blue and red dots represent genes that showed a fold change greater than 0.2 in either direction and had an unadjusted P value < .01. Genes with a False Discovery Rate (FDR) adjusted P value less than .1 for differential expression are annotated on the graph. (A) Self-reported smoking status; (B) exhaled carbon monoxide (eCO); (C) Combined phenotype. The eCO plot (B) shows gene expression differences per unit increase in eCO concentration, hence the X scale differs from that of A and C.
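The caption does not state which FDR procedure was used. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice, and applies the .1 threshold mentioned in the caption. The function name `bh_adjust` and the example P values are hypothetical.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted P values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)

# Positions of genes that would be annotated at FDR < .1.
annotated = [i for i, q in enumerate(adj_p) if q < 0.1]
```

In a real analysis the adjustment would run over all tested genes at once, because the adjusted value of each gene depends on the total number of tests.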
Figure 3.
A quantile–quantile (QQ) plot for the three case definitions of smoking status. The X axis shows −log10 of the expected P values, and the Y axis shows −log10 of the observed P values. Under the null hypothesis, the points should fall approximately along the 45-degree reference line. Genes with low P values deviate from the reference line, indicating significant association.
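To make the construction of such a plot concrete, the sketch below computes the coordinates of QQ points from a list of P values. It assumes the expected P value for the gene of rank i out of n is i / (n + 1), one common plotting position; the caption does not say which convention was used, and `qq_points` is a hypothetical name. P values must be strictly greater than zero.

```python
import math
from typing import List, Tuple


def qq_points(pvalues: List[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) pairs on the -log10 scale."""
    n = len(pvalues)
    observed = sorted(pvalues)  # smallest P value gets rank 1
    points = []
    for rank, p in enumerate(observed, start=1):
        # Assumption: uniform plotting position rank / (n + 1).
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# P values that match their expected quantiles exactly lie on the
# 45-degree reference line (x equals y for every point).
null_like = qq_points([0.5, 0.25, 0.75])
```

Points with an observed value well above the expected value (y greater than x) correspond to genes whose P values are smaller than the null distribution predicts.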

Source: PubMed
