Computational psychiatry as a bridge from neuroscience to clinical applications

Quentin J M Huys, Tiago V Maia, Michael J Frank

Abstract

Translating advances in neuroscience into benefits for patients with mental illness presents enormous challenges because it involves both the most complex organ, the brain, and its interaction with a similarly complex environment. Dealing with such complexities demands powerful techniques. Computational psychiatry combines multiple levels and types of computation with multiple types of data in an effort to improve understanding, prediction and treatment of mental illness. Computational psychiatry, broadly defined, encompasses two complementary approaches: data-driven and theory-driven. Data-driven approaches apply machine-learning methods to high-dimensional data to improve classification of disease, predict treatment outcomes or improve treatment selection. These approaches are generally agnostic as to the underlying mechanisms. Theory-driven approaches, in contrast, use models that instantiate prior knowledge of, or explicit hypotheses about, such mechanisms, possibly at multiple levels of analysis and abstraction. We review recent advances in both approaches, with an emphasis on clinical applications, and highlight the utility of combining them.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details are available in the online version of the paper.

Figures

Figure 1
The blessing and curse of dimensionality. In rich data sets in psychiatry, the number of measured variables d per subject can substantially exceed the number of subjects. (a) When this occurs, subjects can always be separated linearly: up to d + 1 subjects can always be separated into two classes by a linear boundary if the data span a d-dimensional space. For example, any three subjects can be separated into two groups using a combination of two features. (b) For d + 2 (or more) subjects, linear separation is not always possible (the subjects indicated by black points are not linearly separable from those indicated by red points). (c) Such data can, however, be separated linearly if projected into a higher-dimensional space. Here, a third dimension was added to the two-dimensional data in b by calculating the absolute distance from the line through the black points, thereby making the two classes linearly separable, as shown by the gray two-dimensional plane. (d) A similar fact can be illustrated in regression: a polynomial of order d can always fit d + 1 points perfectly (red line), but it makes extreme predictions outside the range of observations and is extremely sensitive to noise, overfitting the training data. (e) Even when the features and classes are just random noise, performing regression in a high-dimensional space yields misleadingly high apparent performance. The panel shows the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate, for logistic regression applied to such random data. The red curve shows that logistic regression performs misleadingly well on the training data, with a high area under the ROC curve (AUC) (regression training data, g). This is overfitting, as the data are random: when the resulting regression coefficients are applied to unseen validation data not included in the training set, the predictions are at chance, as they should be (blue line; regression validation data, g). (f) Using LASSO, a form of cross-validated regularized regression (Box 1), partially prevents overfitting (red line; LASSO training data, g). However, because the regularization parameter is itself fitted to the training data, even LASSO does not fully prevent overfitting: only when the LASSO parameters are tested on the validation data set is performance correctly revealed to be at chance level (blue line; LASSO validation data, g).
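A minimal sketch (in Python, using numpy and scikit-learn; not the authors' original simulation code) reproduces the logic of panels e-g: with far more features than subjects, even pure-noise data yield a high training AUC, and only held-out validation data reveal chance-level performance.

# Overfitting demonstration on pure-noise data (cf. Figure 1e-g).
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_subjects, n_features = 60, 200                 # d >> number of subjects
X_train = rng.normal(size=(n_subjects, n_features))
y_train = rng.integers(0, 2, size=n_subjects)    # random class labels
X_valid = rng.normal(size=(n_subjects, n_features))
y_valid = rng.integers(0, 2, size=n_subjects)

# Effectively unregularized logistic regression (very large C):
# near-perfect on the training data, chance on the validation data.
lr = LogisticRegression(C=1e6, max_iter=5000).fit(X_train, y_train)
print("LR train AUC:   ", roc_auc_score(y_train, lr.decision_function(X_train)))
print("LR valid AUC:   ", roc_auc_score(y_valid, lr.decision_function(X_valid)))

# LASSO-style (L1-regularized) logistic regression with the regularization
# strength chosen by cross-validation on the training data. This shrinks the
# training AUC, but only held-out data reveal true chance-level performance.
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5)
lasso.fit(X_train, y_train)
print("LASSO train AUC:", roc_auc_score(y_train, lasso.decision_function(X_train)))
print("LASSO valid AUC:", roc_auc_score(y_valid, lasso.decision_function(X_valid)))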
Figure 2
Exploiting and coping with high dimensionality in psychiatric data sets. Purely data-driven approaches (left and middle branches) and combinations of theory- and data-driven approaches (right branch) can be used to analyze large data sets to arrive at clinically useful applications. Dimensionality reduction is a key step to avoid overfitting. It can be performed as a preprocessing step using unsupervised methods before application of other ML techniques, with or without further dimensionality reduction (left branch; Box 1); using ML techniques that automatically limit the number of variables used for prediction, such as regularization or Bayesian model selection (middle branch; Box 1); or using theory-driven models that in essence project the original high-dimensional data into a low-dimensional space of theoretically meaningful parameters, which can then be fed into ML algorithms that may or may not further reduce dimensionality (right branch).
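As an illustration of the left branch, the following sketch (assuming scikit-learn; the data arrays are placeholders, not patient data) applies unsupervised dimensionality reduction (PCA) before a supervised classifier, with the reduction re-fitted inside each cross-validation fold to avoid information leakage.

# Unsupervised dimensionality reduction as preprocessing (Figure 2, left branch).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 500))       # high-dimensional measurements (placeholder)
y = rng.integers(0, 2, size=100)      # placeholder diagnostic labels

# Fitting the PCA inside the pipeline ensures the dimensionality reduction is
# re-estimated within each cross-validation fold rather than on all the data.
pipeline = Pipeline([
    ("reduce", PCA(n_components=10)),   # project 500 variables onto 10 components
    ("classify", SVC(kernel="linear")),
])
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print("Cross-validated AUC:", scores.mean())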
Figure 3
Using EEG measures for treatment selection in depression improves treatment response. Left, the reference EEG (rEEG) procedure. After all medications were withdrawn, an rEEG was recorded. The recording was submitted for automated online analysis involving 74 biomarkers and compared to a large reference database of EEG measures linked to longitudinal treatment outcomes. Finally, a medication ranking was returned. Right, in a 12-site trial, patients were randomized to treatment selection via an optimized clinical protocol (based on STAR*D) or via rEEG. The rEEG-based selection led to improved treatment response relative to the optimized clinical protocol after 2 weeks (red dots), and this effect grew stronger over 12 weeks. These results suggest that biological measures can improve treatment selection in depression. Adapted with permission from ref.
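The rEEG analysis itself is proprietary, so the following is a purely hypothetical sketch of the general idea of ranking medications from a reference database: find the reference patients whose biomarker profiles most resemble the new patient's, then rank medications by those patients' observed outcomes. All function and variable names are illustrative assumptions, not the published algorithm.

# Hypothetical reference-database medication ranking (illustrative only).
import numpy as np

def rank_medications(patient_biomarkers, ref_biomarkers, ref_medication,
                     ref_outcome, k=25):
    """Rank medications by mean outcome among the k most similar patients.

    patient_biomarkers: (n_biomarkers,) vector for the new patient
    ref_biomarkers:     (n_patients, n_biomarkers) reference database
    ref_medication:     (n_patients,) medication given to each reference patient
    ref_outcome:        (n_patients,) treatment outcome (higher = better)
    """
    # Euclidean distance from the new patient to every reference patient.
    dists = np.linalg.norm(ref_biomarkers - patient_biomarkers, axis=1)
    nearest = np.argsort(dists)[:k]
    meds, scores = [], []
    for med in np.unique(ref_medication[nearest]):
        mask = ref_medication[nearest] == med
        meds.append(med)
        scores.append(ref_outcome[nearest][mask].mean())
    order = np.argsort(scores)[::-1]          # best expected outcome first
    return [(meds[i], scores[i]) for i in order]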
Figure 4
Networks of symptoms. (a) Network of symptoms in DSM-IV. Two symptoms have a link if they belong to a common diagnostic category. There is a large, strongly connected cluster containing 48% of the symptoms. Overall, the network has small-world characteristics, with the average path length between two symptoms being only 2.6. Adapted with permission from ref. (b) Autocorrelations and variance, two signs of critical slowing down, increase before a phase transition in dynamic networks. Prior to a transition from a healthy state to depression, negative emotions such as sadness show increasing variance and temporal autocorrelation. Prior to a transition from depression to a remitted state, this is observed in positive emotions, such as contentedness. Adapted with permission from ref.
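A minimal sketch of the two early-warning signals in b, computed over a rolling window of a (here synthetic) emotion time series; rising variance and lag-1 autocorrelation are the signatures of critical slowing down.

# Rolling early-warning signals of critical slowing down (cf. Figure 4b).
import numpy as np

def rolling_warning_signals(x, window=50):
    """Return rolling variance and lag-1 autocorrelation of series x."""
    variances, autocorrs = [], []
    for t in range(window, len(x)):
        seg = x[t - window:t]
        variances.append(seg.var())
        # Lag-1 autocorrelation: correlation of the window with itself
        # shifted by one time step.
        autocorrs.append(np.corrcoef(seg[:-1], seg[1:])[0, 1])
    return np.array(variances), np.array(autocorrs)

# Synthetic example: an AR(1) process whose autoregressive coefficient
# drifts toward 1, mimicking the approach to a transition.
rng = np.random.default_rng(2)
n = 1000
x = np.zeros(n)
for t in range(1, n):
    phi = 0.2 + 0.75 * t / n          # slowing down as phi approaches 1
    x[t] = phi * x[t - 1] + rng.normal()

var, ac = rolling_warning_signals(x)
print("Variance, early vs late:", var[:100].mean(), var[-100:].mean())
print("Autocorr, early vs late:", ac[:100].mean(), ac[-100:].mean())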
Figure 5
Theory-driven biophysical and reinforcement-learning (RL) approaches. (a) Insights into working-memory disturbances in schizophrenia. Reducing NMDA currents on inhibitory interneurons leads to overall disinhibition and broadens the bump representation of a stimulus in working memory (compare top versus bottom), making it more susceptible to distractors, especially those that activate neighboring neurons. Adapted with permission from ref. (b) Insights into obsessive-compulsive disorder. Both lowering serotonin levels and increasing glutamatergic levels render activity patterns excessively stable, such that when a new cluster of neurons is stimulated, activity does not shift to the new location, as would be expected (top, normal response), but rather remains ‘stuck’ in the previous location (bottom). Adapted with permission from ref. (c) Negative symptoms in schizophrenia are related to a failure to represent expected values. In an instrumental-learning task, healthy controls and patients with schizophrenia with low levels of negative symptoms learned according to a reinforcement-learning algorithm that explicitly represents the expected value of each state-action pair (Q-learning), whereas patients with schizophrenia with high levels of negative symptoms learned according to an algorithm that learns preferences without such explicit representations (actor-critic). Adapted with permission from ref. (d) Examining the processes that guide goal-directed evaluations. Shown is a decision tree corresponding to a sequence of three binary choices, where each choice leads to a gain or loss indicated by the numbers. An RL model was fitted to choices and contained two key parameters, representing the probability of continuing to evaluate when encountering a large salient loss (red arrow, −X) or when encountering other outcomes (blue arrows). (e) Subjects were far less likely to continue evaluating a branch after encountering a salient loss (red bars) than after other outcomes, for a variety of salient loss sizes. Adapted with permission from ref.
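To make the contrast in c concrete, the following sketch gives textbook forms of the two update rules (not the exact models fitted in the cited study): Q-learning explicitly tracks state-action values, whereas the actor-critic adjusts action preferences using only a state-value critic.

# Q-learning versus actor-critic update rules (textbook forms, cf. Figure 5c).
import numpy as np

def q_learning_update(Q, s, a, reward, alpha=0.1):
    """Q-learning for a bandit-like task: explicitly tracks the expected
    value Q[s, a] of each state-action pair and moves it toward the reward."""
    delta = reward - Q[s, a]          # reward prediction error
    Q[s, a] += alpha * delta
    return delta

def actor_critic_update(V, prefs, s, a, reward, alpha_v=0.1, alpha_p=0.1):
    """Actor-critic: the critic tracks only state values V[s]; the actor
    adjusts action preferences prefs[s, a] using the critic's prediction
    error, never representing action values explicitly."""
    delta = reward - V[s]             # critic's prediction error
    V[s] += alpha_v * delta
    prefs[s, a] += alpha_p * delta
    return delta

def softmax_choice(values, beta=3.0, rng=None):
    """Choose an action with probability proportional to exp(beta * value)."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.exp(beta * (values - values.max()))
    p /= p.sum()
    return rng.choice(len(values), p=p)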
Figure 6
Mechanistic models yield parameters that can be used as features to improve ML performance. A classifier trained on the estimated parameters of a model fitted to simulated behavioral data (light blue curve, AUC 0.87) performed better than one trained directly on the raw data (purple curve, AUC 0.74). Data for 200 subjects with Gaussian-distributed parameters were simulated from a simple model-free (MF) RL model with time-varying action reinforcements. Subjects were separated into two groups on the basis of a single parameter (the learning rate). The data set was split in two, with half of the subjects used for training a classifier and the other half for validation. Two classifiers were trained: one on the raw behavioral data and the other on the parameters estimated by fitting an RL model. The ROC curve is shown for performance on the validation set.
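A sketch of the model-fitting step this figure relies on, under illustrative assumptions about the task (binary choices, binary rewards): each subject's learning rate and inverse temperature are estimated by maximum likelihood, and the resulting low-dimensional parameter vectors can then serve as classifier features in place of the raw behavior.

# Maximum-likelihood fitting of a simple RL model (cf. Figure 6); the task
# structure and starting values are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards, n_actions=2):
    """Negative log-likelihood of a Rescorla-Wagner learner with softmax
    choice; params = (learning rate alpha, inverse temperature beta)."""
    alpha, beta = params
    Q = np.zeros(n_actions)
    nll = 0.0
    for a, r in zip(choices, rewards):
        logits = beta * Q
        logp = logits - np.log(np.exp(logits).sum())
        nll -= logp[a]                      # likelihood of the observed choice
        Q[a] += alpha * (r - Q[a])          # prediction-error update
    return nll

def fit_subject(choices, rewards):
    """Maximum-likelihood parameter estimates for one subject."""
    result = minimize(neg_log_likelihood, x0=[0.3, 2.0],
                      args=(choices, rewards),
                      bounds=[(1e-3, 1.0), (1e-3, 20.0)])
    return result.x                         # (alpha_hat, beta_hat)

# The per-subject (alpha_hat, beta_hat) vectors can be stacked into a feature
# matrix and passed to any classifier, exactly as with the raw data.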

Source: PubMed
