A diVIsive Shuffling Approach (VIStA) for gene expression analysis to identify subtypes in Chronic Obstructive Pulmonary Disease

Jörg Menche, Amitabh Sharma, Michael H Cho, Ruth J Mayer, Stephen I Rennard, Bartolome Celli, Bruce E Miller, Nick Locantore, Ruth Tal-Singer, Soumitra Ghosh, Chris Larminie, Glyn Bradley, John H Riley, Alvar Agusti, Edwin K Silverman, Albert-László Barabási

Abstract

Background: An important step toward understanding the biological mechanisms underlying a complex disease is a refined understanding of its clinical heterogeneity. Relating clinical and molecular differences may allow us to define more specific subtypes of patients that respond differently to therapeutic interventions.

Results: We developed a novel unbiased method called diVIsive Shuffling Approach (VIStA) that identifies subgroups of patients by maximizing the difference in their gene expression patterns. We tested our algorithm on 140 subjects with Chronic Obstructive Pulmonary Disease (COPD) and found four distinct, biologically and clinically meaningful combinations of clinical characteristics that are associated with large gene expression differences. The dominant characteristic in these combinations was the severity of airflow limitation. Other frequently identified measures included emphysema, fibrinogen levels, phlegm, BMI and age. A pathway analysis of the differentially expressed genes in the identified subtypes suggests that VIStA is capable of capturing specific molecular signatures within each group.

Conclusions: The introduced methodology allowed us to identify combinations of clinical characteristics that correspond to clear gene expression differences. The resulting subtypes for COPD contribute to a better understanding of its heterogeneity.

Trial registration: ClinicalTrials.gov NCT00292552.

Figures

Figure 1
Schematic representation of the diVIsive Shuffling Approach (VIStA). A Initially the subjects are divided randomly into three groups; gene expression differences are calculated between groups 1 & 2, and the third group serves as a reservoir for the subsequent shuffling steps. At each shuffling step, a subject from group 1 or 2 is randomly exchanged with a subject from the reservoir. If the swap increases the number of differentially expressed genes, it is accepted; otherwise it is rejected. B Twenty example time series of the number of differentially expressed genes between groups 1 & 2 as a function of the number of attempted shuffles. The different curves correspond to different random initial divisions. After approximately 1000 shuffles the groups converge and present a large, stationary number of differentially expressed genes. C For each of the obtained divisions (500 in total), clinical characteristics in groups 1 & 2 are compared.
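The shuffling loop in panel A can be sketched in a few lines. This is a minimal, hypothetical sketch, not the authors' implementation: the differential-expression count here is a stand-in (genes whose Welch t-statistic exceeds an assumed cutoff `t_cut`), and the names `count_de_genes` and `vista_shuffle` are illustrative. The split into three equal-sized groups and the greedy accept/reject rule follow the caption.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = subjects, columns = genes


def count_de_genes(expr: Matrix, group_a: Sequence[int], group_b: Sequence[int],
                   t_cut: float = 3.0) -> int:
    """Stand-in DE count (assumption): genes whose Welch t-statistic exceeds t_cut."""
    count = 0
    for g in range(len(expr[0])):
        a = [expr[s][g] for s in group_a]
        b = [expr[s][g] for s in group_b]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        if se > 0 and abs(statistics.fmean(a) - statistics.fmean(b)) / se > t_cut:
            count += 1
    return count


def vista_shuffle(expr: Matrix, n_shuffles: int = 1000, seed=None,
                  t_cut: float = 3.0) -> Tuple[List[int], List[int], int]:
    """One run: random three-way split, then greedy swaps with the reservoir."""
    rng = random.Random(seed)
    subjects = list(range(len(expr)))
    rng.shuffle(subjects)                      # random initial division
    k = len(subjects) // 3
    g1, g2, reservoir = subjects[:k], subjects[k:2 * k], subjects[2 * k:]
    score = count_de_genes(expr, g1, g2, t_cut)
    for _ in range(n_shuffles):
        group = g1 if rng.random() < 0.5 else g2   # pick group 1 or group 2
        i, j = rng.randrange(len(group)), rng.randrange(len(reservoir))
        group[i], reservoir[j] = reservoir[j], group[i]      # tentative swap
        new_score = count_de_genes(expr, g1, g2, t_cut)
        if new_score > score:
            score = new_score                                # accept: more DE genes
        else:
            group[i], reservoir[j] = reservoir[j], group[i]  # reject: undo the swap
    return g1, g2, score
```

Repeating `vista_shuffle` with different seeds yields independent divisions (500 in the study), whose groups 1 and 2 are then compared on clinical characteristics as in panel C. Recounting every gene after each swap is simple but costly; an incremental update of per-gene statistics would be a natural optimization.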
Figure 2
Combination of clinical characteristics associated with groups from VIStA. A Number of times the characteristics were found significantly different between groups 1 & 2 in a total of 500 divisions. Severity of airflow limitation (GOLDCD) is the single most important determinant of differential gene expression, being statistically significant in 95% of all VIStA outputs. B Summary of the individual and pairwise number of significant occurrences of the clinical characteristics. Node size is proportional to the number of times a measure was found significant; the width of a link indicates how often two measures appeared significant in the same VIStA division. The core group contains severity of airflow limitation (GOLDCD) and the two emphysema measures EMPHETCD and FV950. C Number of times that pairwise combinations of clinical characteristics co-occurred in the 500 VIStA outcomes. The most significant pair (as compared to a null model of independent occurrence) is EMPHETCD and FV950, which are both measures of emphysema. D The most frequent and significant triplet is GOLDCD, EMPHETCD and FV950, which together measure disease severity. E We find significant combinations of the disease severity triplet in B with four clinical characteristics: BMI, PHLEGM, DWALK and AGE.
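Panel C compares observed pair co-occurrences against a null model of independent occurrence. The caption does not specify the test, so the sketch below assumes one plausible formalization: with the per-characteristic counts fixed, the overlap of two characteristics across N divisions follows a hypergeometric distribution, whose mean is the expected count n_a·n_b/N. The function names and the input format (one set of significant characteristics per division) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def hypergeom_sf(k: int, N: int, n_a: int, n_b: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, n_a successes, n_b draws)."""
    total = comb(N, n_b)
    return sum(comb(n_a, x) * comb(N - n_a, n_b - x)
               for x in range(k, min(n_a, n_b) + 1)) / total


def pair_cooccurrence(divisions):
    """divisions: one set of significant characteristics per VIStA division.

    Returns (a, b, observed, expected_under_independence, p_value) tuples,
    sorted from most to least significant over-representation.
    """
    N = len(divisions)
    single, pairs = Counter(), Counter()
    for sig in divisions:
        sig = set(sig)
        single.update(sig)                            # per-characteristic counts
        pairs.update(combinations(sorted(sig), 2))    # unordered pair counts
    results = []
    for (a, b), k in pairs.items():
        expected = single[a] * single[b] / N          # independence expectation
        results.append((a, b, k, expected, hypergeom_sf(k, N, single[a], single[b])))
    results.sort(key=lambda r: r[4])
    return results
```

The same idea extends to triplets (panel D) by counting `combinations(sorted(sig), 3)`, although the null distribution for three-way overlaps no longer has this simple closed form and would more naturally be estimated by permutation.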
Figure 3
Four subtypes and differentially expressed genes. A The combinations of phenotypic measures that define the subtypes predicted by the VIStA method: all four subtypes share a common core of high values of GOLDCD, FV950 and EMPHETCD, reflecting disease severity. Each of the individual subtypes I-IV presents one additional clinical characteristic: BMI (subtype I), DWALK (II), AGE (III) or PHLEGM (IV). B Venn diagram showing the number of differentially expressed genes unique to each subtype, as well as those common to all four subtypes. The common genes overlap largely with the genes differentially expressed between subjects with GOLDCD 2 and subjects with GOLDCD 3&4, indicating that these genes mostly reflect disease severity.
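The partition in panel B reduces to set operations on the per-subtype lists of differentially expressed genes. The sketch below is illustrative: the function names are hypothetical, the gene identifiers in the usage are placeholders rather than study results, and "overlap" is taken to mean the fraction of common genes that also appear in the GOLDCD 2 versus GOLDCD 3&4 comparison.

```python
from typing import Dict, Set, Tuple


def venn_partition(de_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split per-subtype DE gene sets into genes common to all and genes unique to each."""
    common = set.intersection(*de_sets.values())
    unique = {}
    for name, genes in de_sets.items():
        # Union of every other subtype's DE genes.
        others = set().union(*(g for n, g in de_sets.items() if n != name))
        unique[name] = genes - others
    return common, unique


def overlap_fraction(genes: Set[str], reference: Set[str]) -> float:
    """Fraction of `genes` that also appear in a reference DE list."""
    return len(genes & reference) / len(genes) if genes else 0.0
```

For example, with placeholder sets `{"I": {"g1", "g2", "g3"}, "II": {"g1", "g2", "g4"}}`, the common set is `{"g1", "g2"}` and the unique sets are `{"g3"}` and `{"g4"}`; comparing the common set against a severity-only DE list via `overlap_fraction` mirrors the check described in the caption.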


Source: PubMed
