Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees

Quentin J M Huys, Neir Eshel, Elizabeth O'Nions, Luke Sheridan, Peter Dayan, Jonathan P Roiser

Abstract

When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Decision tree.
A: A typical decision tree. A sequence of choices between ‘U’ (left, green) and ‘I’ (right, orange) is made to maximize the total amount earned over the entire sequence of choices. Two sequences yield the maximal total outcome of −20 (three times U; or I then twice U). Finding the optimal choice in a goal-directed manner requires evaluating all 8 sequences of three moves each. B: Pruning a decision tree at the large negative outcome. In this simple case, pruning would still favour one of the two optimal sequences (yielding −20), yet cut the computational cost by nearly half.
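The counts in this caption can be illustrated with a minimal Python sketch. The edge rewards below are hypothetical: they are not taken from the figure and were chosen only so that the two sequences named in the caption (UUU and IUU) both total −20, with an assumed large loss of −140 on the first ‘U’ move. Pruning is modelled here in its simplest all-or-nothing form, where evaluation stops at any edge carrying the large loss and the truncated sequence keeps the total accumulated so far. That is an assumption made for illustration, not necessarily the authors' fitted model.

```python
# Hypothetical depth-3 binary decision tree. Keys are move prefixes,
# values are the reward earned on the last move of that prefix.
# These numbers are invented for illustration only.
LARGE_LOSS = -140  # assumed size of the large loss
DEPTH = 3

EDGE_REWARD = {
    "U": -140, "I": -20,
    "UU": 140, "UI": -20, "IU": 20, "II": -20,
    "UUU": -20, "UUI": -140, "UIU": 20, "UII": -20,
    "IUU": -20, "IUI": -140, "IIU": -20, "III": -20,
}


def evaluate(prune):
    """Walk the tree and return ({sequence: total}, edges_evaluated).

    With prune=True, a branch is abandoned as soon as an edge carries
    the large loss; the truncated sequence keeps its running total.
    """
    totals = {}
    edges = 0
    stack = [("", 0)]  # (prefix of moves so far, total earned so far)
    while stack:
        prefix, total = stack.pop()
        for action in "UI":
            seq = prefix + action
            reward = EDGE_REWARD[seq]
            edges += 1
            new_total = total + reward
            at_leaf = len(seq) == DEPTH
            cut = prune and reward <= LARGE_LOSS
            if at_leaf or cut:
                totals[seq] = new_total
            else:
                stack.append((seq, new_total))
    return totals, edges


def best(totals):
    """Return (best total, sorted list of sequences achieving it)."""
    top = max(totals.values())
    return top, sorted(s for s, v in totals.items() if v == top)


full_totals, full_edges = evaluate(prune=False)
pruned_totals, pruned_edges = evaluate(prune=True)
print("full search:  ", full_edges, "edges, best =", best(full_totals))
print("pruned search:", pruned_edges, "edges, best =", best(pruned_totals))
```

Under these assumed rewards the full search scores all 14 edges (8 complete sequences) and finds both optimal sequences, whereas the pruned search scores 8 edges and still finds one of them (IUU).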
Figure 2. Task description.
A: Task as seen by subjects. Subjects used two buttons on the keyboard (‘U’ and ‘I’) to navigate between six environmental states, depicted as boxes on a computer screen. From each state, subjects could move to exactly two other states. Each of these transitions was associated with a particular reinforcement. The current state was highlighted in white, and the required sequence length was displayed centrally. Reinforcements available from each state were displayed symbolically below the state (e.g. a dedicated symbol for the large reward). B: Deterministic task transition matrix. Each button resulted in one of two deterministic transitions from each state. For example, if the participant began in state 6, pressing ‘U’ would lead to state 3, whereas pressing ‘I’ would lead to state 1. The transitions in red yielded large punishments. These (and only these) differed between three groups of subjects (−140, −100 or −70). Note that the decision trees in Figure 1A,B correspond to a depth 3 search starting from state 3. C–E: Effect of pruning on values of optimal choices. Each square in each panel analyses choices from one state when a certain number of choices remains to be taken. The colour shows the difference in earnings between two choice sequences: the best choice sequence with pruning and the best choice sequence without pruning. In terms of net earnings, pruning is never advantageous (pruned values are never better than the optimal lookahead values); but pruning does not always result in losses (white areas). It is most disadvantageous in the −70 group, and it is never disadvantageous in the −140 group because there is always an equally good alternative choice sequence which avoids transitions through large losses.
Figure 3. Choice sequences.
Example decision trees of varying depth starting from states 1 or 3. The widths of the solid lines are proportional to the frequencies with which particular paths were chosen (aggregated across all subjects). Yellow backgrounds denote optimal paths (note that there can be multiple optimal paths). Colours red, black, green and blue denote transitions with different reinforcements. Dashed lines denote parts of the decision tree that were never visited. Visited states are shown in small gray numbers where space allows. A: Subjects avoid transitions through large losses. In this condition, the avoidance is not associated with an overall loss. B: In this condition, where large rewards lurk behind the losses, subjects can overcome their reluctance to transition through large losses and can follow the optimal path through an early large loss. C: However, they do this only if the tree is small and thus does not require pruning. Subjects fail to follow the optimal path through the same subtree as in B (indicated by a black box) if it occurs deeper in the tree, i.e. in a situation where computational demands are high. D,E,F: Fraction of times subjects in each group chose the optimal sequence, deduced by looking all the way to the end of the tree. Green shows subjects' choices when the optimal sequence did not contain a large loss; blue shows subjects' choices when the optimal sequence did contain a large loss. Coloured areas show 95% confidence intervals, and dashed lines show predictions from the model ‘Pruning & Learned’ (see below).
Figure 4. Model performance and comparison.
A: Fraction of choices predicted by the model as a function of the number of choices remaining. For bars ‘3 choices to go’, for instance, it shows the fraction of times the model assigned higher value to the subject's choice in all situations where three choices remained (i.e. bar 3 in these plots encompasses all three panels in Figure 3A–C). These are predictions only in the sense that the model predicts each choice from the history preceding it. The gray line shows this statistic for the full look-ahead model, and the blue bars for the most parsimonious model (‘Pruning & Learned’). B: Mean predictive probabilities, i.e. the likelihood afforded to the choice on each trial given the values learned up to the preceding trial. C: Model comparison based on integrated Bayesian Information Criterion scores. The lower the score, the more parsimonious the model fit. For guidance, some likelihood ratios are displayed explicitly, both at the group level (fixed effect) and at the individual level (random effect). Our main guide is the group-level (fixed effect) comparison. The red star indicates the most parsimonious model. D,E: Transition probability from state 6 to state 1 (which incurs a −20 loss) when a subsequent move to state 2 is possible (D; at least two moves remain) or not (E; when it is the only remaining move). Note that subjects' disadvantageous approach behaviour in E (dark gray bar) is only well accommodated by a model that incorporates the extra Learned Pavlovian parameter. F: Decision tree of depth 4 from starting state 3. See Figure 3 for colour code. Subjects prefer (width of line) the optimal (yellow) path with an early transition through a large loss (red) to an equally optimal path with a late transition through a large loss. G: Phase plane analysis of specific and general pruning. Parameter values for which the left optimal yellow path in panel F is assigned a greater expected value than the right optimal path are below the blue line. Combinations that are also consistent with the notion of pruning are shown in green. The red dot shows parameters inferred for the present data (cf. Figure 6). Throughout, error bars indicate one standard error of the mean (red) and the 95% confidence intervals (green).
Figure 5. Pruning exists above and beyond any loss aversion.
A: Loss aversion model comparison scores. Red star indicates most parsimonious model. The numbers by the bars show model likelihood ratios of interest at the group level, and below them at the mean individual level. Pruning adds parsimony to the model even after accounting for loss aversion (cf. ‘Discount & Loss’ vs ‘Pruning & Loss’), while loss aversion does not increase parsimony when added to the best previous model (‘Pruning & Learned’ vs ‘Loss & Prune & Learned’). B: Separate inference of all reinforcement sensitivities from best loss aversion model. C: Absolute ratio of inferred sensitivity to maximal punishment (−70, −100 or −140) and inferred sensitivity to maximal reward (always +140). Subjects are 1.4 times more sensitive to punishments than to rewards.
Figure 6. Pruning parameters.
A: Pruning parameter estimates – specific and general pruning parameters are shown separately for each group. Specific pruning exceeded general pruning across subjects, but there was no main effect of group and no interaction. The difference between parameter types was significant in all three groups, with specific exceeding general pruning for 14/15, 12/16 and 14/15 subjects in the −70, −100 and −140 groups respectively. Blue bars show specific pruning parameters and red bars general pruning parameters. Black dots show the estimates for each subject. Gray lines show the uncertainty (square root of second moment around the parameter) for each estimate. B: Equivalent parametrization of the most parsimonious model to infer differences between pruning and discount factors directly. For all three groups, the difference is significantly positive. C: Income lost due to pruning. On trials on which the optimal sequence led through large punishments, subjects lost more income the more counterproductive pruning was (loss in group −70 > loss in group −100 > loss in group −140). Each bar shows the total income subjects lost because they avoided transitions through large losses. Throughout, the bars show the group means, with one standard error of the mean in red and the 95% confidence interval in green.
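To make the distinction between specific and general pruning concrete, here is a minimal Python sketch of one way such parameters could enter a tree search. It assumes that evaluation of a branch stops with probability `gamma_s` after a large loss and with probability `gamma_g` after any other outcome, so that the expected future value is scaled by one minus that probability. This recursion, the names `q_value`, `preferred`, `gamma_s` and `gamma_g`, and the transition table are assumptions for illustration. Only the two transitions from state 6 and the −20 outcome of the move from state 6 to state 1 come from the captions, and the large loss is set to −70 as in one of the three groups.

```python
# Hypothetical task table: state -> action -> (next_state, reward).
# Only the row for state 6 follows the captions; the rest is invented.
LARGE_LOSS = -70

TRANSITIONS = {
    1: {"U": (2, 140), "I": (4, 20)},
    2: {"U": (3, -20), "I": (5, -20)},
    3: {"U": (1, LARGE_LOSS), "I": (6, -20)},
    4: {"U": (5, 20), "I": (2, -20)},
    5: {"U": (1, LARGE_LOSS), "I": (6, -20)},
    6: {"U": (3, 20), "I": (1, -20)},
}


def q_value(state, action, depth, gamma_s, gamma_g):
    """Value of taking `action` in `state` with `depth` moves remaining.

    gamma_s: assumed probability of stopping evaluation after a large loss.
    gamma_g: assumed probability of stopping after any other outcome.
    """
    next_state, reward = TRANSITIONS[state][action]
    if depth == 1:
        return reward  # last move: nothing left to look ahead to
    stop = gamma_s if reward == LARGE_LOSS else gamma_g
    future = max(
        q_value(next_state, a, depth - 1, gamma_s, gamma_g) for a in "UI"
    )
    return reward + (1 - stop) * future


def preferred(state, depth, gamma_s, gamma_g):
    """Action with the higher value under the given pruning parameters."""
    return max("UI", key=lambda a: q_value(state, a, depth, gamma_s, gamma_g))


# Two moves from state 3: 'U' passes through the large loss to reach +140.
for gamma_s, gamma_g in [(0.0, 0.0), (0.75, 0.25)]:
    values = {a: q_value(3, a, 2, gamma_s, gamma_g) for a in "UI"}
    print(gamma_s, gamma_g, values, "->", preferred(3, 2, gamma_s, gamma_g))
```

With both parameters at zero the sketch reduces to full lookahead and prefers ‘U’ (70 against 0). With `gamma_s = 0.75` and `gamma_g = 0.25` the reward behind the large loss is mostly ignored and ‘I’ is preferred (−5 against −35), which is the kind of income-losing avoidance described in panel C.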
Figure 7. Psychometric correlates.
A: Subclinical depression scores (Beck Depression Inventory, BDI, range 0–15) correlated positively with specific pruning, and negatively with sensitivity to the reinforcers. Each bar shows a weighted linear regression coefficient. Red error bars show one standard error of the mean estimate, and green error bars the Bonferroni-corrected 95% confidence interval. B,C: Weighted scatter plots of psychometric scores against parameters after orthogonalization.

Source: PubMed