Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types

Vincent van Unen, Thomas Höllt, Nicola Pezzotti, Na Li, Marcel J T Reinders, Elmar Eisemann, Frits Koning, Anna Vilanova, Boudewijn P F Lelieveldt

Abstract

Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for data analysis. In particular, dimensionality reduction-based techniques like t-SNE offer single-cell resolution but are limited in the number of cells that can be analyzed. Here we introduce Hierarchical Stochastic Neighbor Embedding (HSNE) for the analysis of mass cytometry data sets. HSNE constructs a hierarchy of non-linear similarities that can be interactively explored with a stepwise increase in detail up to the single-cell level. We apply HSNE to a study on gastrointestinal disorders and three other available mass cytometry data sets. We find that HSNE efficiently replicates previous observations and identifies rare cell populations that were previously missed due to downsampling. Thus, HSNE removes the scalability limit of conventional t-SNE analysis, a feature that makes it highly suitable for the analysis of massive high-dimensional data sets.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Fig. 1
Schematic overview of Cytosplore+HSNE for exploring the mass cytometry data. By creating a multi-level hierarchy of an illustrative 3D data set (a), we achieve a clear separation of different cell groups in an overview embedding (left panel b) that conserves non-linear relationships (i.e., follows the distance indicated by the dashed line in a, instead of the grey arrow) and more detail within the separate groups on the data level (right panel b). c Construction and exploration of the hierarchy. The hierarchy is constructed starting with the data level (left two columns). On the basis of the high-dimensional expression patterns of the cells, a weighted kNN graph is constructed, which is used to find representative cells used as landmarks in the next coarser level. By administering the area of influence (AoI) of the landmarks, cells/landmarks can be aggregated without losing the global structure of the underlying data or creating shortcuts. The exploration of the hierarchy is shown in the two rightmost columns. At the bottom, we see the overview level (in this example the 3rd level in the hierarchy), which shows that a group of landmarks has low expression in marker c (bottom-right panel). Selecting this group of landmarks for further exploration results in a look-up of the landmarks in the preceding level (neighborhood graph, intermediate level) that are in the AoI, with which a new embedding can be created at the 2nd level of the hierarchy (middle-right panel). Marker b shows a strong separation between the upper and lower landmarks at this level. Zooming-in on the landmarks with low expression of marker b reveals further separation in marker a at the lowest level, the full data level (top-right panel)
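The coarsening step described in this caption (weighted kNN graph, landmark selection, area of influence) can be illustrated with a minimal sketch. It is not the Cytosplore+HSNE implementation: the Gaussian edge weights, the "most visited walk endpoints" landmark rule, the hard assignment of each cell to a single landmark, and all function and parameter names (`knn_graph`, `select_landmarks`, `area_of_influence`, `n_walks`, `steps`) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def knn_graph(points, k):
    """For each point, return its k nearest neighbours with transition probabilities."""
    graph = []
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        # Assumed kernel: Gaussian weights scaled by the distance to the k-th neighbour.
        sigma = nearest[-1][0] or 1.0
        weights = [math.exp(-((d / sigma) ** 2)) for d, _ in nearest]
        total = sum(weights)
        graph.append([(j, w / total) for (_, j), w in zip(nearest, weights)])
    return graph


def step(graph, node, rng):
    """Take one random-walk step along the weighted kNN graph."""
    neighbours, probs = zip(*graph[node])
    return rng.choices(neighbours, weights=probs)[0]


def select_landmarks(graph, n_walks, steps, n_landmarks, rng):
    """Assumed rule: keep the nodes where short random walks end most often."""
    hits = Counter()
    for start in range(len(graph)):
        for _walk in range(n_walks):
            node = start
            for _step in range(steps):
                node = step(graph, node, rng)
            hits[node] += 1
    return [node for node, _ in hits.most_common(n_landmarks)]


def area_of_influence(graph, landmarks, n_walks, max_steps, rng):
    """Assign each point to the landmark its random walks reach first most often."""
    landmark_set = set(landmarks)
    aoi = {lm: [] for lm in landmarks}
    for start in range(len(graph)):
        reached = Counter()
        for _walk in range(n_walks):
            node = start
            for _step in range(max_steps):
                if node in landmark_set:
                    break
                node = step(graph, node, rng)
            if node in landmark_set:
                reached[node] += 1
        if reached:  # points whose walks never meet a landmark stay unassigned
            aoi[reached.most_common(1)[0][0]].append(start)
    return aoi


# Toy 3D data: two well-separated groups of 20 "cells" each.
rng = random.Random(0)
blob_a = [tuple(rng.gauss(0, 0.5) for _ in range(3)) for _ in range(20)]
blob_b = [tuple(rng.gauss(10, 0.5) for _ in range(3)) for _ in range(20)]
points = blob_a + blob_b

graph = knn_graph(points, k=5)
landmarks = select_landmarks(graph, n_walks=10, steps=5, n_landmarks=6, rng=rng)
aoi = area_of_influence(graph, landmarks, n_walks=20, max_steps=50, rng=rng)

for lm, members in aoi.items():
    print(f"landmark {lm}: {len(members)} cells in its area of influence")
```

Because the walks only follow kNN edges, a landmark in one group never collects cells from the other group in this toy example, which is the "no shortcuts" property the caption refers to.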
Fig. 2
Gain of information by analyzing the mass cytometry data at full resolution with Cytosplore+HSNE. a Pie chart showing cellular composition of the mass cytometry data set. Color represents the subsets (N = 142), as identified in our previous study. Black represents the cells discarded by stochastic downsampling and grey represents the cells discarded by ACCENSE clustering. b Embeddings of the 1.1 million cells annotated in our previous study, showing the top three levels of the HSNE hierarchy (five levels in total). Color represents annotations as in a. Size of the landmarks is proportional to the number of cells in the AoI that each landmark represents. Bottom map shows density features depicting the local probability density of cells for the level 3 embedding, where black dots indicate the centroids of identified cluster partitions using GMS clustering. c Embeddings of all 5.2 million cells, again showing only the top three levels of the hierarchy (five levels in total). Colors as in a. Right panels visualize landmarks representing cells discarded by stochastic downsampling (black) and the cells discarded by ACCENSE (grey). Bottom map shows density features for the level 3 embedding as described in b. d Frequency of annotated cells for 145 clusters identified by Cytosplore+HSNE at the third hierarchical level using GMS clustering in c. Color coding as in a
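The density-based partitioning mentioned in this caption can be sketched as a plain Gaussian mean shift on 2D embedding coordinates. This is a minimal illustration, assuming "GMS" refers to Gaussian mean shift in the sense of Comaniciu and Meer (cited in the reference list); the bandwidth, the merge tolerance, and the function name `mean_shift_modes` are hypothetical choices, not values or code from the paper.

```python
import math
import random


def mean_shift_modes(points, bandwidth, n_iter=50, merge_tol=None):
    """Move every 2D point uphill on a Gaussian kernel density estimate.

    Returns the list of density modes (cluster centroids) and, for each
    input point, the index of the mode it converged to.
    """
    if merge_tol is None:
        merge_tol = bandwidth / 2  # assumed tolerance for merging nearby modes
    modes, labels = [], []
    for x, y in points:
        for _ in range(n_iter):
            wsum = sx = sy = 0.0
            for qx, qy in points:
                sq_dist = (x - qx) ** 2 + (y - qy) ** 2
                w = math.exp(-sq_dist / (2 * bandwidth ** 2))
                wsum += w
                sx += w * qx
                sy += w * qy
            new_x, new_y = sx / wsum, sy / wsum  # kernel-weighted mean
            converged = math.hypot(new_x - x, new_y - y) < 1e-6
            x, y = new_x, new_y
            if converged:
                break
        # Attach the converged position to an existing mode or open a new one.
        for idx, (mx, my) in enumerate(modes):
            if math.hypot(mx - x, my - y) < merge_tol:
                labels.append(idx)
                break
        else:
            modes.append((x, y))
            labels.append(len(modes) - 1)
    return modes, labels


# Toy embedding: two groups of 30 landmarks each, centred at (0, 0) and (6, 0).
rng = random.Random(1)
embedding = [(rng.gauss(0, 0.4), rng.gauss(0, 0.4)) for _ in range(30)]
embedding += [(rng.gauss(6, 0.4), rng.gauss(0, 0.4)) for _ in range(30)]

modes, labels = mean_shift_modes(embedding, bandwidth=1.0)
for idx, (mx, my) in enumerate(modes):
    print(f"cluster {idx}: centroid ({mx:.2f}, {my:.2f}), {labels.count(idx)} points")
```

The bandwidth controls how smooth the density estimate is: a larger value merges neighbouring peaks into fewer clusters, a smaller one splits them.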
Fig. 3
Analysis of the CD7+CD3− innate lymphocyte compartment in inflammatory intestinal diseases. a First HSNE level embedding of 5.2 million cells. Color represents arcsin5-transformed marker expression as indicated. Size of the landmarks represents AoI. Blue encirclement indicates selection of landmarks representing CD7+CD3− innate lymphocytes and CD4+ T cells further discussed in Fig. 5. b The major immune lineages, annotated on the basis of lineage marker expression. c Third HSNE level embedding of the CD7+CD3− innate lymphocytes (5.0 × 10⁵ cells). Color represents arcsin5-transformed marker expression in top panels, and tissue-origin and clinical features in bottom panels. Blue encirclement indicates selection of landmarks representing CD127+ILC and ILC-like cells. d Third HSNE level embedding shows density features depicting the local probability density of cells, where black dots indicate the centroids of identified cluster partitions using GMS clustering. e Embedding of the CD127+ILC and ILC-like cells (6.0 × 10⁴ cells) at single-cell resolution. Arrows indicate ILC1 (blue), ILC2 (orange) and ILC3 (green). Bottom-right panel shows corresponding cluster partitions using GMS clustering based on density features (top-right panel). f A heatmap summary of median expression values (same color coding as for the embeddings) of cell markers expressed by CD127+ILC and ILC-like clusters identified in b and hierarchical clustering thereof. g Composition of cells for each cluster is represented graphically by a horizontal bar in which segment lengths represent the proportion of cells with: (left) tissue-of-origin, (middle) disease status and (right) sampling status
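The marker-expression scale used for coloring can be illustrated as follows. The sketch assumes that "arcsin5" denotes the inverse hyperbolic sine with a cofactor of 5, a common convention for mass cytometry data; the caption itself does not spell this out, and the function name `arcsinh_transform` and the example counts are hypothetical.

```python
import math

COFACTOR = 5.0  # assumed meaning of the "5" in "arcsin5"


def arcsinh_transform(values, cofactor=COFACTOR):
    """Apply asinh(x / cofactor) to raw ion counts."""
    return [math.asinh(v / cofactor) for v in values]


raw_counts = [0.0, 1.0, 5.0, 50.0, 500.0, 5000.0]
transformed = arcsinh_transform(raw_counts)
for raw, value in zip(raw_counts, transformed):
    print(f"{raw:8.1f} -> {value:.3f}")
```

The transform is close to linear for counts near zero and close to logarithmic for large counts, so low-level noise is compressed while high expression values remain distinguishable.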
Fig. 4
CD127+ILC and ILC-like subsets identified by Cytosplore+HSNE. Table showing cluster number, distinguishing phenotypic marker expression profiles and biological annotation for the clusters identified in Fig. 3e. Black color indicates clusters described in previous reports and red color indicates additional unknown clusters. Hierarchical clustering of the clusters is based on the marker expression profiles shown in the heatmap depicted in Fig. 3f
Fig. 5
Analysis of the CD4+ T-cell compartment in inflammatory intestinal diseases. a Third HSNE level embedding of the CD4+ T cells (1.4 × 10⁶ cells, selected in Fig. 3). Color and size of landmarks as described in Fig. 3. Right panel shows density features for the level 3 embedding. Blue encirclement indicates selection of landmarks representing CD28−CD4+ T cells. b Embedding of the CD28−CD4+ T cells (2.6 × 10⁴ cells) at single-cell resolution. Bottom-left panel shows yellow and black dashed encirclements based on CD56− and CD56+ expression, respectively. Three bottom-right panels show cells colored according to: (left) disease status of the subjects (CeD, Crohn, EATLII, RCDII, and controls), (middle) sampling status (annotated subset, discarded by ACCENSE and downsampled) and (right) tissue-of-origin (blood and intestine)

References

    1. Saeys Y, Gassen SV, Lambrecht BN. Computational flow cytometry: helping to make sense of high-dimensional immunology data. Nat. Rev. Immunol. 2016;16:449–462. doi: 10.1038/nri.2016.56.
    2. Qiu P, et al. Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat. Biotechnol. 2011;29:886–891. doi: 10.1038/nbt.1991.
    3. Zunder ER, Lujan E, Goltsev Y, Wernig M, Nolan GP. A continuous molecular roadmap to iPSC reprogramming through progression analysis of single-cell mass cytometry. Cell Stem Cell. 2015;16:323–337. doi: 10.1016/j.stem.2015.01.015.
    4. Levine JH, et al. Data-Driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell. 2015;162:184–197. doi: 10.1016/j.cell.2015.05.047.
    5. Samusik N, Good Z, Spitzer MH, Davis KL, Nolan GP. Automated mapping of phenotype space with single-cell data. Nat. Methods. 2016;13:493–496. doi: 10.1038/nmeth.3863.
    6. Spitzer MH, et al. An interactive reference framework for modeling a dynamic immune system. Science. 2015;349:1259425. doi: 10.1126/science.1259425.
    7. Hotelling H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933;24:417–441.
    8. van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
    9. Amir EAD, et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 2013;31:545–552. doi: 10.1038/nbt.2594.
    10. Haghverdi L, Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. 2015;31:2989–2998. doi: 10.1093/bioinformatics/btv325.
    11. Bendall SC, Nolan GP, Roederer M, Chattopadhyay PK. A deep profiler’s guide to cytometry. Trends Immunol. 2012;33:323–332. doi: 10.1016/j.it.2012.02.010.
    12. Chattopadhyay PK, Gierahn TM, Roederer M, Love JC. Single-cell technologies for monitoring immune systems. Nat. Immunol. 2014;15:128–135. doi: 10.1038/ni.2796.
    13. Pezzotti N, Höllt T, Lelieveldt B, Eisemann E, Vilanova A. Hierarchical Stochastic Neighbor Embedding. Comput. Graph. Forum. 2016;35:21–30. doi: 10.1111/cgf.12878.
    14. van Unen V, et al. Mass cytometry of the human mucosal immune system identifies tissue- and disease-associated immune subsets. Immunity. 2016;44:1227–1239. doi: 10.1016/j.immuni.2016.04.014.
    15. van der Maaten L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 2014;15:3221–3245.
    16. Pezzotti N, et al. Approximated and user steerable tSNE for progressive visual analytics. IEEE Trans. Vis. Comput. Graph. 2016;23:1739–1752. doi: 10.1109/TVCG.2016.2570755.
    17. Setty M, et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 2016;34:637–645. doi: 10.1038/nbt.3569.
    18. Comaniciu D, Meer P. Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002;24:603–619. doi: 10.1109/34.1000236.
    19. Spits H, Cupedo T. Innate lymphoid cells: emerging insights in development, lineage relationships, and function. Annu. Rev. Immunol. 2012;30:647–675. doi: 10.1146/annurev-immunol-020711-075053.
    20. McKenzie ANJ, Spits H, Eberl G. Innate lymphoid cells in inflammation and immunity. Immunity. 2014;41:366–374. doi: 10.1016/j.immuni.2014.09.006.
    21. Spits H, et al. Innate lymphoid cells--a proposal for uniform nomenclature. Nat. Rev. Immunol. 2013;13:145–149. doi: 10.1038/nri3365.
    22. Robinette ML, et al. Transcriptional programs define molecular characteristics of innate lymphoid cell classes and subsets. Nat. Immunol. 2015;16:306–317. doi: 10.1038/ni.3094.
    23. Schmitz F, et al. Identification of a potential physiological precursor of aberrant cells in refractory coeliac disease type II. Gut. 2013;62:509–519. doi: 10.1136/gutjnl-2012-302265.
    24. Schmitz F, et al. The composition and differentiation potential of the duodenal intraepithelial innate lymphocyte compartment is altered in coeliac disease. Gut. 2016;65:1269–1278. doi: 10.1136/gutjnl-2014-308153.
    25. Ettersperger J, et al. Interleukin-15-dependent T-cell-like innate intraepithelial lymphocytes develop in the intestine and transform into lymphomas in celiac disease. Immunity. 2016;45:610–625. doi: 10.1016/j.immuni.2016.07.018.
    26. Mou D, Espinosa J, Lo DJ, Kirk AD. CD28 negative T cells: is their loss our gain? Am. J. Transplant. 2014;14:2460–2466. doi: 10.1111/ajt.12937.
    27. Bendall SC, et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science. 2011;332:687–696. doi: 10.1126/science.1198704.
    28. Shaham U, Steinerberger S. Stochastic neighbor embedding separates well-separated clusters. arXiv:1702.02670 (2017).
    29. Höllt T, et al. Cytosplore: Interactive immune cell phenotyping for large single-cell datasets. Comput. Graph. Forum. 2016;35:171–180. doi: 10.1111/cgf.12893.
    30. Finck R, et al. Normalization of mass cytometry data with bead standards. Cytometry A. 2013;83:483–494. doi: 10.1002/cyto.a.22271.

Source: PubMed
