A new method for constructing networks from binary data

Claudia D van Borkulo, Denny Borsboom, Sacha Epskamp, Tessa F Blanken, Lynn Boschloo, Robert A Schoevers, Lourens J Waldorp, Claudia D van Borkulo, Denny Borsboom, Sacha Epskamp, Tessa F Blanken, Lynn Boschloo, Robert A Schoevers, Lourens J Waldorp

Abstract

Network analysis is entering fields where network structures are unknown, such as psychology and the educational sciences. A crucial step in the application of network models lies in the assessment of network structure. Current methods either have serious drawbacks or are only suitable for Gaussian data. In the present paper, we present a method for assessing network structures from binary data. Although models for binary data are infamous for their computational intractability, we present a computationally efficient model for estimating network structures. The approach, which is based on Ising models as used in physics, combines logistic regression with model selection based on a Goodness-of-Fit measure to identify relevant relationships between variables that define connections in a network. A validation study shows that this method succeeds in revealing the most relevant features of a network for realistic sample sizes. We apply our proposed method to estimate the network of depression and anxiety symptoms from symptom scores of 1108 subjects. Possible extensions of the model are discussed.

Figures

Figure 1. Examples of networks with 30…
Figure 1. Examples of networks with 30 nodes in the simulation study.
(a) Generated networks. From left to right: random network (probability of an extra connection is 0.1), scale-free network (power of preferential attachment is 1) and small world network (rewiring probability is 0.1). (b) Weighted versions of (a) that are used to generate data (true networks). (c) Estimated networks.
Figure 2. Mean correlations (vertical axes) of…
Figure 2. Mean correlations (vertical axes) of the upper triangles of the weighted adjacency matrices of true and estimated networks of 100 simulations with random, scale-free, and small world networks for sample sizes ssize = 100, 500, 1000, and 2000, with number of nodes nnodes = 10, 20, 30, and 100.
We used three levels of connectivity (random networks: probability of an extra connection Pconn = .1, .2, and .3; scale-free networks: power of preferential attachment Pattach = 1, 2, and 3; small world networks: rewiring probability of Prewire = .1, .5, and 1). For the condition with 100 nodes, we used different levels of connectivity for random and scale-free networks in order to obtain more realistic networks (random networks: Pconn = .05, .1, and .15; scale-free networks: Pattach = 1, 1.25, and 1.5).
Figure 3. Application of eLasso to real…
Figure 3. Application of eLasso to real data.
The resulting network structure of a group of healthy controls and people with a current or history of depressive disorder (N = 1108). Cognitive symptoms are displayed as ○ and thicker edges (connections) represent stronger associations.
Figure 4. Three centrality measures of the…
Figure 4. Three centrality measures of the nodes in the network based on real data.
From left to right: node strength, betweenness, and clustering coefficient. “Hypersomnia” (hyp) has no clustering coefficient, since it has only one neighbour.
Figure 5. Community structure of the network…
Figure 5. Community structure of the network based on real data, detected by the Walktrap algorithm.

References

    1. Barabási A. L. The network takeover. Nat. Phys. 8, 14–16 (2012).
    1. Barzel B. & Barabási A. L. Universality in network dynamics. Nat. Phys. 9, 673–681 (2013).
    1. Kitsak M. et al. Identification of influential spreaders in complex networks. Nat. Phys. 6, 888–893 (2010).
    1. Liu Y. Y., Slotine J. J. & Barabási A.-L. Controllability of complex networks. Nature 473, 167–173 (2011).
    1. Vespignani A. Modelling dynamical processes in complex socio-technical systems. Nat. Phys. 8, 32–39 (2012).
    1. Borsboom D. Psychometric perspectives on diagnostic systems. J. Clin. Psychol. 64, 1089–1108 (2008).
    1. Borsboom D. & Cramer A. O. J. Network analysis: An integrative approach to the structure of psychopathology. Annu. Rev. Clin. Psychol. 9, 91–121 (2013).
    1. Cramer A. O. J., Waldorp L. J., Van Der Maas H. L. J. & Borsboom D. Comorbidity: A network perspective. Behav. Brain Sci. 33, 137–150 (2010).
    1. Schmittmann V. D. et al. Deconstructing the construct: A network perspective on psychological phenomena. New Ideas Psychol. 31, 43–53 (2011).
    1. Van Der Maas H. L. J. et al. A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychol. Rev. 113, 842 (2006).
    1. Schäfer J. & Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Molec. Biol. 4, 32 (2005).
    1. Bickel P. J. & Levina E. Covariance regularization by thresholding. Ann. Stat. 36, 2577–2604 (2008).
    1. Friedman J., Hastie T. & Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).
    1. Kalisch M., Mächler M., Colombo D., Maathuis M. H. & Bühlmann P. Causal inference using graphical models with the r package pcalg. J. Stat. Softw. 47, 1–26 (2012).
    1. Spirtes P., Glymour C. & Scheines R. Causation, Prediction, and Search (MIT press, Cambrigde, Massachusetts, 2001).
    1. Drton M. & Perlman M. Multiple testing and error control in gaussian graphical model selection. Statist. Sci. 22, 430–449 (2007).
    1. Efron B. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Am. Statist. Assoc. 99, 96–104 (2004).
    1. Strimmer K. fdrtool: a versatile r package for estimating local and tail area-based false discovery rates. Bioinformatics 24, 1461–1462 (2008).
    1. Kindermann R. & Snell J. L. Markov Random Fields and their Applications, vol. 1 (American Mathematical Society Providence, RI, 1980).
    1. Lauritzen S. Graphical Models (Oxford University Press, USA, 1996).
    1. Speed T. & Kiiveri H. Gaussian markov distributions over finite graphs. Ann. Stat. 14, 138–150 (1986).
    1. Foygel R. & Drton M. Extended bayesian information criteria for gaussian graphical models. Adv. Neural Inf. Process. Syst. 23, 2020–2028 (2010).
    1. Ravikumar P., Wainwright M. J., Raskutti G. & Yu B. et al. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electron. J. Statist. 5, 935–980 (2011).
    1. Tibshirani R. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58, 267–288 (1996).
    1. Meinshausen N. & Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462 (2006).
    1. Ravikumar P., Wainwright M. J. & Lafferty J. D. High-dimensional ising model selection using l1-regularized logistic regression. Ann. Stat. 38, 1287–1319 (2010).
    1. Ising E. Beitrag zur theorie des ferromagnetismus. Z. Phys. A-Hadrons. Nucl. 31, 253–258 (1925).
    1. Chen J. & Chen Z. Extended bayesian information criteria for model selection with large model spaces. Biometrika 95, 759–771 (2008).
    1. Foygel R. & Drton M. High-dimensional ising model selection with bayesian information criteria. arXiv preprint arXiv:1403.3374 (2014).
    1. Barabási A. L. & Albert R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
    1. Erdös P. & Rényi A. On random graphs. Publ. Math. Debrecen 6, 290–297 (1959).
    1. Watts D. J. & Strogatz S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
    1. Foygel R. & Drton M. Bayesian model choice and information criteria in sparse generalized linear models. arXiv preprint arXiv:1112.5635 (2011).
    1. Jardine N. & van Rijsbergen C. J. The use of hierarchic clustering in information retrieval. Inform. Storage Ret. 7, 217–240 (1971).
    1. Rush A. et al. The inventory of depressive symptomatology (IDS): Psychometric properties. Psychol. Med. 26, 477–486 (1996).
    1. Penninx B. W. et al. The netherlands study of depression and anxiety (NESDA): Rationale, objectives and methods. Int. J. Method. Psych. 17, 121–140 (2008).
    1. Barrat A., Barthelemy M., Pastor-Satorras R. & Vespignani A. The architecture of complex weighted networks. Proc. Natl. Acad. Sci. U.S.A. 101, 3747–3752 (2004).
    1. Boccaletti S., Latora V., Moreno Y., Chavez M. & Hwang D. Complex networks: Structure and dynamics. Phys. Rep. 424, 175–308 (2006).
    1. Opsahl T., Agneessens F. & Skvoretz J. Node centrality in weighted networks: Generalizing degree and shortest paths. Soc. Networks 32, 245–251 (2010).
    1. Goldberg D. & Fawcett J. The importance of anxiety in both major depression and bipolar disorder. Depress. Anxiety 29, 471–478 (2012).
    1. Kessler R. C., Nelson C. B., McGonagle K. A. & Liu J. et al. Comorbidity of DSM-III—R major depressive disorder in the general population: Results from the US National Comorbidity Survey. Br. J. Psychiatry 30, 17–30 (1996).
    1. Schoevers R. A., Beekman A. T. F., Deeg D. J. H., Jonker C. & Van Tilburg W. Comorbidity and risk-patterns of depression, generalised anxiety disorder and mixed anxiety-depression in later life: results from the amstel study. Int. J. Geriatr. 18, 994–1001 (2003).
    1. American Psychiatric Association. . The Diagnostic and Statistical Manual of Mental Disorders (5th ed.) (Arlington, VA: American Psychiatric Publishing, 2013).
    1. Orman G. K. & Labatut V. A Comparison of Community Detection Algorithms on Artificial Networks (Springer, Berlin Heidelberg, 2009).
    1. Pons P. & Latapy M. Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10, 191–218 (2006).
    1. Wu F.-Y. The potts model. Rev. Mod. Phys. 54, 235 (1982).
    1. Loh P. L. & Wainwright M. J. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. Ann. Stat. 41, 3022–3049 (2013).
    1. Friedman J., Hastie T. & Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1 (2010).
    1. Epskamp S., Cramer A. O. J., Waldorp L. J., Schmittmann V. D. & Borsboom D. qgraph: Network visualizations of relationships in psychometric data. J. Stat. Softw. 48, 1–18 (2012).
    1. Hastings W. K. Monte carlo sampling methods using markov chains and their applications. Biometrika 57, 97–109 (1970).
    1. Epskamp S. IsingSampler: Sampling methods and distribution functions for the Ising model (2013). URL . R package version 0.1.
    1. Murray I. Advances in Markov chain Monte Carlo methods. PhD thesis, Gatsby Computational Neuroscience Unit, University College London (2007).
    1. Fruchterman T. M. & Reingold E. M. Graph drawing by force-directed placement. Software Pract. Exper. 21, 1129–1164 (1991).

Source: PubMed

3
Abonnere