A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB

Peter Kent, Rikke K Jensen, Alice Kongsted

Abstract

Background: There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA).

Methods: The performance of these three methods was compared (i) quantitatively, using the number of subgroups detected, the classification probability of individuals into subgroups and the reproducibility of results, and (ii) qualitatively, using subjective judgments about each program's ease of use and the interpretability of its presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed to test the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known.
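The idea of testing a clustering method against artificial data with known subgroup membership can be illustrated with a minimal Python sketch. It is not the authors' procedure and uses none of the three programs compared here: a plain k-means routine stands in for the clustering step, and the subgroup means, sample size and function names are all hypothetical choices made for illustration.

```python
import itertools
import random


def make_dataset(n_per_group=100, seed=0):
    """Simulate individuals from 3 well-separated subgroups on 2 continuous variables."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]  # hypothetical subgroup means
    points, labels = [], []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(n_per_group):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation (stand-in method)."""
    centres = [points[0]]
    while len(centres) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    assigned = []
    for _ in range(iterations):
        # Assign each point to its nearest centre, then recompute the centres.
        assigned = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assigned) if a == j]
            if members:
                centres[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return assigned


def best_agreement(true_labels, found_labels, k):
    """Proportion correctly classified under the best relabelling of the clusters."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        hits = sum(perm[f] == t for t, f in zip(true_labels, found_labels))
        best = max(best, hits / len(true_labels))
    return best


points, truth = make_dataset()
found = kmeans(points, k=3)
agreement = best_agreement(truth, found, k=3)
print(f"agreement with known subgroup membership: {agreement:.3f}")
```

Because cluster labels are arbitrary, agreement is scored under the best one-to-one relabelling; trying every permutation is only practical for a small number of subgroups.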

Results: The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups.
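As a rough illustration of what the certainty of classifying an individual can mean in a latent class setting, the sketch below computes posterior membership probabilities with Bayes' rule for a one-dimensional Gaussian mixture. The class weights, means and standard deviations are assumed values, and the abstract does not describe how any of the three programs computes its probabilities, so this is an illustration under stated assumptions only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior(x, classes):
    """Posterior membership probabilities for x.

    classes is a list of (weight, mean, sd) tuples, one per latent class
    (hypothetical parameters, assumed known for this illustration).
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in classes]
    total = sum(joint)
    return [j / total for j in joint]


# Two assumed classes of equal size with overlapping distributions.
classes = [(0.5, 0.0, 2.0), (0.5, 10.0, 2.0)]
for value in (1.0, 5.0, 9.0):
    probs = posterior(value, classes)
    print(value, [round(p, 3) for p in probs])
```

An individual close to one class mean is classified with a probability near 1, whereas an individual midway between two overlapping classes is classified with much less certainty.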

Conclusions: Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.

Figures

Figure 1
Dataset A1 (n = 1000) - containing 3 subgroups, whose distinguishing features do not overlap, with characteristics scored on a mixture of continuous and dichotomous variables.
Figure 2
Dataset A2 (n = 1000) - containing 3 subgroups, whose distinguishing features do overlap, with all characteristics scored on continuous variables.
Figure 3
Dataset A3 (n = 1000) - containing 6 subgroups, whose distinguishing features do overlap, with characteristics scored on a mixture of continuous and dichotomous variables.
Figure 4
Dataset A4 (n = 1000) - containing 3 subgroups, whose distinguishing features do overlap, with all characteristics scored on continuous variables. Contains 10 ’pure noise’ non-discriminatory variables.
Figure 5
Illustration of classification overlap of subgroups.
Figure 6
Classification disagreement of individuals (disc levels or patients).


Source: PubMed
