Microbiomic signatures of psoriasis: feasibility and methodology comparison

Alexander Statnikov, Alexander V Alekseyenko, Zhiguo Li, Mikael Henaff, Guillermo I Perez-Perez, Martin J Blaser, Constantin F Aliferis

Abstract

Psoriasis is a common chronic inflammatory disease of the skin. We sought to use bacterial community abundance data to assess the feasibility of developing multivariate molecular signatures for differentiation of cutaneous psoriatic lesions, clinically unaffected contralateral skin from psoriatic patients, and similar cutaneous loci in matched healthy control subjects. Using 16S rRNA high-throughput DNA sequencing, we assayed the cutaneous microbiome for 51 such matched specimen triplets including subjects of both genders, different age groups, ethnicities and multiple body sites. None of the subjects had recently received relevant treatments or antibiotics. We found that molecular signatures for the diagnosis of psoriasis achieve significant accuracy, with AUC ranging from 0.75 to 0.89 depending on the classification task. We also found a significant effect of DNA sequencing and downstream analysis protocols on the accuracy of molecular signatures. Our results demonstrate that it is feasible to develop accurate molecular signatures for the diagnosis of psoriasis from microbiomic data.

Figures

Figure 1. Classification accuracy (AUC) versus number of selected features/taxa for 37 feature selection methods averaged over 4 classification tasks (PN vs. CC, PL vs. CC, PL vs. PN, and CC vs. PL and PN) in data based on the V3–V5 16S rRNA locus. Methods from the same algorithmic family are shown with the same markers in the figure.
The pink area contains methods that have nominally higher classification accuracy (AUC) than GLL. The green area contains methods that have selected fewer taxa than GLL. The red dash-dotted line indicates a Pareto frontier constructed over non-GLL methods. Methods on the Pareto frontier are such that no other non-GLL method has both higher AUC and a smaller number of selected features averaged over the four classification tasks.
Figure 2. Classification accuracy (AUC) versus number of selected features/taxa for 37 feature selection methods for each of the four classification tasks (PN vs. CC, PL vs. CC, PL vs. PN, and CC vs. PL and PN) in data based on the V3–V5 16S rRNA gene locus.
Methods from the same algorithmic family are shown with the same markers in the figure. The pink area contains methods that have nominally higher classification accuracy (AUC) than GLL. The green area contains methods that have selected fewer taxa than GLL. The red dash-dotted line indicates a Pareto frontier constructed over non-GLL methods. Methods on the Pareto frontier are such that no other non-GLL method has both higher AUC and a smaller number of selected features for each classification task.
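
The Pareto frontier described in the figure captions can be illustrated with a minimal sketch. It assumes each method is summarized by an AUC and a number of selected features; the method names and values below are hypothetical placeholders and not results from the study, and the code is not the authors' implementation.

```python
from typing import List, NamedTuple


class Method(NamedTuple):
    name: str          # hypothetical label for a feature selection method
    auc: float         # classification accuracy (AUC)
    n_features: float  # number of selected features/taxa


def pareto_frontier(methods: List[Method]) -> List[Method]:
    """Keep methods for which no other method has both a higher AUC
    and a smaller number of selected features."""
    frontier = []
    for m in methods:
        dominated = any(
            other.auc > m.auc and other.n_features < m.n_features
            for other in methods
            if other is not m
        )
        if not dominated:
            frontier.append(m)
    # Order by feature count so the frontier can be drawn as a line.
    return sorted(frontier, key=lambda m: m.n_features)


# Hypothetical values for illustration only.
example_methods = [
    Method("method_a", 0.80, 10),
    Method("method_b", 0.86, 40),
    Method("method_c", 0.78, 25),  # method_a has higher AUC and fewer features
    Method("method_d", 0.84, 60),  # method_b has higher AUC and fewer features
]

frontier_names = [m.name for m in pareto_frontier(example_methods)]
print(frontier_names)
```

In the figures the frontier is built over non-GLL methods only, so GLL would be excluded from the input list before calling the function.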

Source: PubMed
