RNA-Seq Signatures Normalized by mRNA Abundance Allow Absolute Deconvolution of Human Immune Cell Types

Gianni Monaco, Bernett Lee, Weili Xu, Seri Mustafah, You Yi Hwang, Christophe Carré, Nicolas Burdin, Lucian Visan, Michele Ceccarelli, Michael Poidinger, Alfred Zippelius, João Pedro de Magalhães, Anis Larbi

Abstract

The molecular characterization of immune subsets is important for designing effective strategies to understand and treat diseases. We characterized 29 immune cell types within the peripheral blood mononuclear cell (PBMC) fraction of healthy donors using RNA-seq (RNA sequencing) and flow cytometry. Our dataset was used, first, to identify sets of genes that are specific, are co-expressed, and have housekeeping roles across the 29 cell types. Then, we examined differences in mRNA heterogeneity and mRNA abundance revealing cell type specificity. Last, we performed absolute deconvolution on a suitable set of immune cell types using transcriptomics signatures normalized by mRNA abundance. Absolute deconvolution is ready to use for PBMC transcriptomic data using our Shiny app (https://github.com/giannimonaco/ABIS). We benchmarked different deconvolution and normalization methods and validated the resources in independent cohorts. Our work has research, clinical, and diagnostic value by making it possible to effectively associate observations in bulk transcriptomics data to specific immune subsets.

Keywords: RNA-seq; deconvolution; flow cytometry; gene modules; housekeeping; immune system; mRNA abundance; mRNA composition; mRNA heterogeneity; transcriptome.

Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.

Figures

Graphical abstract
Figure 1
Representation of the Sample Preparation and Data Collection. PBMC aliquots from two cohorts were used for (1) RNA-seq of 29 immune cell types (S4 cohort) and (2) microarray and RNA-seq of PBMCs and immunophenotyping of the 29 immune cell types (S13 cohort). Four staining panels (panels 1–4) were used to sort and immunophenotype the 29 immune cell types (Table S1). Tfh, T follicular helper; Tregs, T regulatory; Th, T helper; CM, central memory; EM, effector memory; TE, terminal effector; MAIT, mucosal-associated invariant T; SM, switched memory; NSM, non-switched memory; Ex, exhausted; LD, low-density; C, classical; I, intermediate; NC, non-classical; mDCs, myeloid dendritic cells; pDCs, plasmacytoid dendritic cells. See also Table S1 for full names and marker information.
Figure 2
Relationship between Immune Cell Types, Determined Using log2 TPM Values. (A) t-SNE analysis on the RNA-seq data of the 29 immune cell types and PBMCs. Results are shown in four separate plots to better distinguish the different cell types. Each plot highlights the PBMCs and the cell types of one of the four panels used for FACS. (B) Transcriptomic hematopoietic tree of the 29 immune cell types with progenitor cells fixed as the root of the tree.
Figure 3
Heatmap of DEGs between Each Immune Cell Type and Remaining Samples. Modules of genes were found by hierarchical clustering on Euclidean distance. The most biologically relevant GO terms associated with each module are reported on the left. The top differentially expressed genes (DEGs) are reported on the right. See the full list in Table S3.
Figure 4
Comparison of the Gene Expression Profile of the Immune Cell Types from Our Dataset (Columns) with Four External Datasets (Rows). From the samples of each FACS panel in our dataset, we selected the top 1,000 variable genes and calculated the Spearman correlation with samples of external datasets. For the correlation, we used the cell type average of normalized expression values.
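The comparison described in this caption (select the most variable genes, then correlate cell type averages by rank) can be sketched roughly as below. This is only an illustration under stated assumptions: the tiny matrix and the external profile are made-up values, the helper names `top_variable`, `ranks`, and `spearman` are hypothetical, and the rank helper assumes there are no tied values.

```python
import numpy as np

def top_variable(expr, k):
    """Indices of the k genes (rows) with the largest variance across samples."""
    variances = expr.var(axis=1)
    return np.argsort(variances)[::-1][:k]

def ranks(values):
    """Ordinal ranks of a 1-D array; assumes no tied values."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    out = np.empty(len(values), dtype=float)
    out[order] = np.arange(len(values))
    return out

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

# Hypothetical normalized expression: 4 genes (rows) x 4 samples of one cell type
ours = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 10.0, 0.0, 10.0],
    [5.0, 6.0, 5.0, 6.0],
    [0.0, 100.0, 0.0, 100.0],
])
idx = top_variable(ours, k=3)            # the caption uses k = 1,000
ours_mean = ours[idx].mean(axis=1)       # cell type average in our data
external_mean = np.array([40.0, 2.0, 3.0])  # hypothetical external averages, same genes
rho = spearman(ours_mean, external_mean)
```

In practice the gene selection would be done per FACS panel and the external profile would be restricted to the same genes before correlating.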
Figure 5
Two Aspects of mRNA Composition: Heterogeneity and Abundance. (A and B) Heterogeneity. (A) The cumulative sum of the median TPM values of nine relevant cell types calculated from values sorted in decreasing order. The total sum of TPM values is always 10^6. (B) The minimum number of genes that contribute to 80% of total gene expression in the 29 cell types. This number corresponds to the dashed red line in (A). (C and D) Abundance. (C) mRNA scaling factors for the 29 immune cell types calculated with four methods (STAR Methods). For the clustering distance between rows, we used the Spearman correlation. (D) Pearson correlation matrix for the values reported in (C).
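The heterogeneity measure in panels (A) and (B) amounts to sorting TPM values in decreasing order, taking the cumulative sum, and counting how many genes are needed to reach 80% of the total. A minimal sketch, assuming a single expression profile and using made-up values (the function name `genes_for_fraction` is hypothetical):

```python
import numpy as np

def genes_for_fraction(tpm, fraction=0.8):
    """Smallest number of genes whose summed expression reaches `fraction` of the total."""
    values = np.sort(np.asarray(tpm, dtype=float))[::-1]  # decreasing order
    cumulative = np.cumsum(values)
    target = fraction * cumulative[-1]
    # Index of the first cumulative value >= target, converted to a gene count
    return int(np.searchsorted(cumulative, target, side="left") + 1)

# Hypothetical profile, rescaled so that it sums to 10^6 as TPM values do
raw = np.array([400.0, 300.0, 150.0, 100.0, 50.0])
toy_tpm = raw / raw.sum() * 1e6
n_genes = genes_for_fraction(toy_tpm)  # cumulative fractions: 0.40, 0.70, 0.85, ...
```

A cell type dominated by a few highly expressed genes gives a small count, while a more heterogeneous transcriptome gives a larger one.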
Figure 6
Absolute Deconvolution of RNA-Seq PBMC Samples. (A) Exhaustive search for cell types that are suitable for deconvolution from PBMC-derived RNA-seq data. For each cell type, we report the mean and SD of Pearson correlations obtained by deconvolution of all possible combinations of cell types (merged and non-merged) that reconstitute a PBMC sample. Cell types that have been chosen for the deconvolution analysis in (B) are outlined in blue. (B) Comparison of deconvoluted and flow cytometry proportions on 17 immune cell types with respect to PBMCs. The concordance correlation coefficient (ccc) and the Pearson correlation coefficient (r) are shown on each plot.
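The two agreement measures in panel (B) answer different questions: r measures linear association only, whereas ccc also penalizes systematic offsets from the identity line. The sketch below assumes Lin's concordance correlation coefficient with population (biased) variances, which this text does not specify, and uses made-up proportions; `pearson_r` and `lin_ccc` are hypothetical helper names.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population variances)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    covariance = ((x - mean_x) * (y - mean_y)).mean()
    return float(2.0 * covariance / (x.var() + y.var() + (mean_x - mean_y) ** 2))

# Hypothetical percentages: estimates that track flow cytometry with a constant bias
flow = np.array([10.0, 20.0, 30.0, 40.0])
deconvoluted = flow + 5.0
r = pearson_r(flow, deconvoluted)    # perfect linear association
ccc = lin_ccc(flow, deconvoluted)    # lower than r because of the offset
```

A high r with a low ccc would indicate estimates that follow the right trend but are not correct in absolute terms, which is why both are reported for absolute deconvolution.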
Figure 7
Benchmarks and Validations of Different Deconvolution and Normalization Methods. (A) Comparison of five deconvolution algorithms in the presence and absence of noise and at increasing size of the signature matrix. The total RMSE is calculated by using the estimated and ground-truth proportions of the 17 cell types of RNA-seq deconvolution. (B) Comparison of results obtained from deconvolution methods with and without constraints and using our signature matrix for RNA-seq deconvolution with either TPM values or absolute expression values (ABIS-seq). (C) Comparison of RNA-seq and microarray deconvolution results with different normalization methods. Each dot is a different cell type.
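The difference between deconvolving with TPM signatures and with signatures scaled by mRNA abundance, as contrasted in panel (B), can be illustrated with a noise-free toy mixture. Everything in this sketch is an assumption made for illustration: the signatures are random, the per-cell mRNA factors and proportions are made up, and plain unconstrained least squares stands in for whichever algorithm is actually used. It is not the ABIS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_types = 50, 3

# Hypothetical TPM-like signatures: each cell type column sums to 10^6
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))
signatures = signatures / signatures.sum(axis=0) * 1e6

mrna_factor = np.array([1.0, 2.0, 4.0])  # hypothetical mRNA abundance per cell
true_cells = np.array([0.5, 0.3, 0.2])   # hypothetical cell proportions

# Each cell type contributes mRNA in proportion to cell count times mRNA per cell;
# the bulk sample is then TPM-normalized like any RNA-seq profile
bulk = signatures @ (true_cells * mrna_factor)
bulk_tpm = bulk / bulk.sum() * 1e6

def deconvolve(signature_matrix, mixture):
    """Least-squares fit of the mixture, clipped at zero and rescaled to sum to 1."""
    coef, *_ = np.linalg.lstsq(signature_matrix, mixture, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum()

relative = deconvolve(signatures, bulk_tpm)                # mRNA fractions
absolute = deconvolve(signatures * mrna_factor, bulk_tpm)  # cell fractions
```

With TPM signatures the estimates recover each cell type's share of total mRNA, which overstates cell types with high mRNA content. Scaling the signature columns by mRNA abundance recovers the cell proportions in this idealized setting.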


Source: PubMed
