Affective bias as a rational response to the statistics of rewards and punishments

Erdem Pulcu, Michael Browning

Abstract

Affective bias, the tendency to differentially prioritise the processing of negative relative to positive events, is commonly observed in clinical and non-clinical populations. However, why such biases develop is not known. Using a computational framework, we investigated whether affective biases may reflect individuals' estimates of the information content of negative relative to positive events. During a reinforcement learning task, the information content of positive and negative outcomes was manipulated independently by varying the volatility of their occurrence. Human participants altered the learning rates used for the outcomes selectively, preferentially learning from the most informative. For loss outcomes, this behaviour was associated with activity of the central norepinephrine system, estimated using pupillometry. Humans maintain independent estimates of the information content of distinct positive and negative outcomes which may bias their processing of affective events. Normalising affective biases using computationally inspired interventions may represent a novel approach to treatment development.

Keywords: computational modelling; depression; human; learning; neuroscience; norepinephrine; pupillometry.

Conflict of interest statement

One author declared no competing interests; the other reported receiving travel expenses from Lundbeck for attending conferences.

Figures

Figure 1. Task structure.
(A) Timeline of one trial from the learning task used in this study. Participants were presented with two shapes (referred to as shape ‘A’ and shape ‘B’) and had to choose one. On each trial, one of the two shapes was associated with a ‘win’ outcome (resulting in a win of 15 p) and the other with a ‘loss’ outcome (resulting in a loss of 15 p). The two outcomes were independent; that is, knowledge of the location of the win provided no information about the location of the loss (see description of panel C below). Using trial and error, participants had to learn where the win and loss were likely to be found and use this information to guide their choices in order to maximise their monetary earnings. (B) Overall task structure. The task consisted of three blocks of 80 trials each (vertical dashed dark lines separate the blocks). The y-axis represents the probability, p, that an outcome (win in solid green or loss in dashed red) will be found under shape ‘A’ (the probability that it is under shape ‘B’ is 1 − p). The blocks differed in how volatile (changeable) the outcome probabilities were. Within the first block both win and loss outcomes were volatile; in the second two blocks one outcome was volatile and the other stable (here wins are stable in the second block and losses stable in the third block). The volatility of an outcome influences how informative that outcome is. Consider the second block, in which the losses are volatile and the wins stable. Here, regardless of whether the win is found under shape ‘A’ or shape ‘B’ on a trial, it will have the same chance of being under each shape in the following trials, so the position of a win in this block provides little information about the outcome of future trials. In contrast, if a loss is found under shape ‘A’, it is more likely to occur under this shape in future trials than if it is found under shape ‘B’. Thus, in the second block losses provide more information than wins and participants are expected to learn more from them. (C) The four potential outcomes of a trial. Win and loss outcomes were independent, so participants had to separately estimate where the win and where the loss would be on each trial in order to complete the task. This design made it possible to manipulate the volatility of the two outcomes independently.
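As an illustration of this schedule, the following minimal Python sketch generates independent win and loss probability schedules for one block in which wins are stable (50%) and losses are volatile (switching between 85% and 15%), as in the second block of panel B. The switch interval and random seed are assumptions made purely for illustration and are not values taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed purely so the illustration is reproducible

def volatile_schedule(n_trials, p_levels=(0.85, 0.15), run_length=20):
    """Probability that the outcome sits under shape 'A', switching between
    p_levels every run_length trials (run_length is an assumed value)."""
    p = np.empty(n_trials)
    level = 0
    for start in range(0, n_trials, run_length):
        p[start:start + run_length] = p_levels[level]
        level = 1 - level
    return p

def stable_schedule(n_trials, p=0.5):
    """Constant probability that the outcome sits under shape 'A'."""
    return np.full(n_trials, p)

n = 80  # trials per block
# Example of the second block in panel B: wins stable, losses volatile.
p_win_a = stable_schedule(n)
p_loss_a = volatile_schedule(n)

# Win and loss locations are drawn independently on every trial, so knowing
# where the win is provides no information about where the loss is.
win_on_a = rng.random(n) < p_win_a
loss_on_a = rng.random(n) < p_loss_a
```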
Figure 2. Effect of the Volatility Manipulation on Participant Behaviour.
(A) Mean (SEM) learning rates for each block of the learning task. As can be seen, the win learning rates (light green bars) and loss learning rates (dark red bars) varied independently as a function of the volatility of the relevant outcome (F(1,28) = 27.97, p<0.001), with a higher learning rate being used when the outcome was volatile than when it was stable (*p<0.05, ***p<0.001 for pairwise comparisons). (B) No effect of volatility was observed for the inverse temperature parameters (F(1,28) = 0.01, p=0.92). Source data available as Figure 2—source data 1. See Figure 2—figure supplement 1 for an analysis of this behavioural effect which does not rely on formal modelling, and Figure 2—figure supplement 2 for an additional task which examines the behavioural effect of expected uncertainty.
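For readers unfamiliar with this class of model, the sketch below illustrates a delta-rule learner with separate learning rates and inverse temperatures for win and loss outcomes, the parameters compared in this figure. The exact way the two outcome estimates are combined into a choice value here is an assumption made for illustration, not the paper's fitted model.

```python
import numpy as np

def simulate_learner(win_on_a, loss_on_a, alpha_win, alpha_loss,
                     beta_win, beta_loss, rng=None):
    """Minimal sketch: track separate win/loss probability estimates for shape 'A'
    and choose via a softmax weighted by outcome-specific inverse temperatures."""
    if rng is None:
        rng = np.random.default_rng()
    p_win_a, p_loss_a = 0.5, 0.5   # initial estimates that each outcome is under 'A'
    choices = []
    for w, l in zip(win_on_a, loss_on_a):
        # Prefer the shape believed to hold the win and avoid the one holding the loss.
        value_a = beta_win * p_win_a - beta_loss * p_loss_a
        value_b = beta_win * (1 - p_win_a) - beta_loss * (1 - p_loss_a)
        p_choose_a = 1.0 / (1.0 + np.exp(-(value_a - value_b)))
        choices.append("A" if rng.random() < p_choose_a else "B")
        # Delta-rule updates with outcome-specific learning rates.
        p_win_a += alpha_win * (float(w) - p_win_a)
        p_loss_a += alpha_loss * (float(l) - p_loss_a)
    return choices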
Figure 2—figure supplement 1. Analysis of switching behaviour in the learning task.
The learning task includes both positive and negative outcomes which are independent of each other. As a result, the task contains trials in which both the positive and negative outcomes encourage the same behaviour in future trials (e.g. when the win is associated with shape A and the loss with shape B, both outcomes encourage selection of shape A in the following trial) as well as trials in which the positive and negative outcomes act in opposition (e.g. when both outcomes are associated with shape A, the win outcome encourages selection of shape A in the next trial and the loss outcome encourages selection of shape B). This second type of trial provides a simple and sensitive means of assessing how the volatility manipulation alters the impact of win and loss outcomes on choice behaviour in the task blocks. Specifically, an increased influence of win outcomes (e.g. when wins are volatile) should lead to: (a) a decreased tendency to change (shift) choice when both win and loss outcomes are associated with the chosen shape in the current trial and (b) an increased tendency to change (shift) choice when both win and loss outcomes are associated with the unchosen shape in the current trial. This analysis does not depend on any formal model and thus complements the model-based analysis reported in the main paper. We calculated the proportion of shift trials separately for trials in which both outcomes were associated with the chosen (‘both’) or unchosen (‘nothing’) shape for each of the three blocks (dark columns = both informative; grey columns = losses informative; white columns = wins informative). Consistent with the model-based analysis, there was a significant interaction between trial type and block (F(1,28)=10.52, p=0.003). Participants switched significantly less frequently when both outcomes were associated with the chosen option in the win-informative relative to loss-informative blocks (F(1,28)=6.1, p=0.02) and switched significantly more frequently when both outcomes were associated with the unchosen option in the win-informative relative to loss-informative blocks (F(1,28)=4.69, p=0.04). This indicates that the results reported in the main paper are unlikely to depend on the exact form of the behavioural model used to derive the learning rate parameter. Bars represent mean (SEM) probability of switching choice on the subsequent trial. * = p<0.05 for comparison between win-informative and loss-informative blocks.
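The switch-proportion measure described above can be computed directly from the trial sequence. The sketch below shows one straightforward way to do so, assuming per-trial records of the choice and of whether each outcome fell under the chosen shape; all variable names are illustrative.

```python
import numpy as np

def switch_proportions(choices, win_on_chosen, loss_on_chosen):
    """Proportion of trials followed by a change of choice, computed separately for
    'both' trials (win and loss under the chosen shape) and 'nothing' trials
    (both outcomes under the unchosen shape)."""
    choices = np.asarray(choices)
    win_on_chosen = np.asarray(win_on_chosen, dtype=bool)
    loss_on_chosen = np.asarray(loss_on_chosen, dtype=bool)
    switched = choices[1:] != choices[:-1]             # shifted choice on the next trial?
    both = (win_on_chosen & loss_on_chosen)[:-1]       # both outcomes under the chosen shape
    nothing = (~win_on_chosen & ~loss_on_chosen)[:-1]  # both outcomes under the unchosen shape
    return switched[both].mean(), switched[nothing].mean()
```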
Figure 2—figure supplement 2. Magnitude task.
When learning, a number of different forms of uncertainty can influence behaviour. One form, sometimes called ‘unexpected uncertainty’ (Yu and Dayan, 2005), is caused by changes in the associations being learned (i.e. volatility) and is the main focus of this paper (see main text for a description of how volatility influences learning). A second form, sometimes called ‘expected uncertainty’ (Yu and Dayan, 2005), arises when an association between a stimulus or action and the subsequent outcome is more or less predictive. For example, this form of uncertainty is lower if an outcome occurs on 90% of the occasions an action is taken and higher if it occurs on 50% of them. Normatively, expected uncertainty should influence learning rate: a less predictive association (i.e. higher expected uncertainty) produces more random outcomes which tell us less about the underlying association we are trying to learn, so learners should employ a lower learning rate when expected uncertainty is higher.

In the learning task described in this paper, both the expected and unexpected uncertainty differ between blocks. Specifically, when an outcome is stable it occurs on 50% of trials, whereas when it is volatile it occurs on 85% or 15% of trials. Thus the stable outcome is, at any one time, also less predictable (i.e. noisier) than the volatile outcome. This schedule was used because a probability of 50% for the stable outcome improves the ability of the task to accurately estimate learning rates (it allows more frequent switches in choice). Further, both forms of uncertainty would be expected to reduce learning rate in the stable blocks and increase it in the volatile block of the task. However, this aspect of the task raises the possibility that the behavioural effects described in the main paper arise secondary to differences in expected uncertainty (noise) rather than the unexpected uncertainty (volatility) manipulation. In order to test this possibility we developed a similar learning task in which volatility was kept constant and expected uncertainty was varied. In this magnitude task (panel a), participants again had to choose between two shapes in order to win as much money as possible. On each trial, 100 ‘win points’ (bar above the fixation cross with green fill) and 100 ‘loss points’ (bar below the fixation cross with red fill) were divided between the two shapes, and participants received money proportional to the number of win points minus loss points of their chosen option. Thus, a win and a loss outcome occurred on every trial of this task, but the magnitude of these outcomes varied. During the task, participants had to learn the expected magnitude of wins and losses for the shapes rather than the probability of their occurrence. This design allowed us to present participants with schedules in which the volatility (i.e. unexpected uncertainty) of win and loss magnitudes was constant (three change points occurred per block) but the noise (expected uncertainty) varied (panel b; the standard deviation of the magnitudes was 17.5 for the high-noise outcomes and 5 for the low-noise outcomes). Otherwise the task was structurally identical to the task reported in the paper, with 240 trials split into three blocks.

We recruited a separate cohort of 30 healthy participants who completed this task and then estimated their learning rates using a model which was structurally identical (i.e. two learning rates and two inverse temperature parameters) to that used in the main paper (Model 1). As can be seen (panel c), there was no effect of expected uncertainty on participant learning rate (block information x parameter valence; F(1,28)=1.97, p=0.17) during this task. This suggests that the learning rate effect reported in the paper cannot be accounted for by differences in expected uncertainty and therefore is likely to have arisen from the unexpected uncertainty (volatility) manipulation. Inverse decision temperature did differ between blocks (panel d; F(1,28)=5.56, p=0.026). The win inverse temperature was significantly higher during the block in which the losses had lower noise (F(1,28)=9.26, p=0.005), including when compared to the win inverse temperature in the block in which wins had lower noise (F(1,28)=5.35, p=0.028); there was no equivalent effect for the loss inverse temperature. These results indicate that, if anything, participants were more influenced by noisy outcomes. Interestingly, a previous study (Nassar et al., 2012) described a learning task in which a normative effect of outcome noise was seen (i.e. a higher learning rate was used by participants when the outcome had lower noise). The task used by Nassar and colleagues differed in a number of respects from that used here (only rewarding outcomes were received, and participants had to estimate a number on a continuous scale based on previous outcomes rather than make a binary choice), which may explain why an effect on learning rate was not observed in the current task. Regardless of the exact reason for the lack of effect of noise in the magnitude task, it suggests that the effect described in the main paper is likely to be driven by unexpected rather than expected uncertainty.
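To make the distinction concrete, the sketch below generates an illustrative magnitude schedule with a fixed number of change points (constant volatility) but adjustable trial-to-trial noise, matching the standard deviations quoted above. The change-point timing and the range of underlying means are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed purely so the illustration is reproducible

def magnitude_schedule(n_trials=80, n_changes=3, sd=17.5, low=20, high=80):
    """Illustrative magnitude schedule: the mean number of points on shape 'A' jumps at a
    fixed number of change points (constant volatility), while trial-to-trial noise around
    that mean is set by sd (17.5 = high noise, 5 = low noise). The range of possible means
    (low, high) and the change-point placement are assumptions."""
    change_points = np.sort(rng.choice(np.arange(10, n_trials - 10), n_changes, replace=False))
    means = rng.uniform(low, high, n_changes + 1)
    mean_per_trial = np.repeat(means, np.diff(np.r_[0, change_points, n_trials]))
    points = rng.normal(mean_per_trial, sd)
    return np.clip(points, 0, 100)  # 100 points are divided between the two shapes

win_points_a = magnitude_schedule(sd=17.5)   # high-noise (high expected uncertainty) outcome
loss_points_a = magnitude_schedule(sd=5)     # low-noise (low expected uncertainty) outcome
```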
Figure 3. Pupil response to outcome delivery during the learning task.
Lines illustrate the mean pupil dilation to an outcome when it appears on the chosen relative to the unchosen shape, across the 6 s after outcomes were presented. Light green lines (with crosses and circles) report responses to win outcomes, dark red lines responses to loss outcomes. Solid lines report blocks in which wins were more informative (volatile), dashed lines blocks in which losses were more informative. As can be seen, pupils dilated more when the relevant outcome was more informative, with this effect being particularly marked for loss outcomes. Shaded regions represent the SEM. Figure 3—figure supplement 1 plots the timecourses separately for trials in which outcomes were or were not obtained, and Figure 3—figure supplement 2 reports the results of a complementary regression analysis of the pupil data.
Figure 3—figure supplement 1. Individual time courses for trials in which wins (panel a) and losses (panel b) are either received or not received.
Lines represent the mean, and shaded areas the SEM, of pupil dilation over the 6 s after outcomes are presented. Figure 3 illustrates the difference in pupil dilation between trials in which an outcome was received and those in which it was not. In order to investigate this effect further, the mean pupil responses for trials in which the outcome was and was not received are plotted separately here. As can be seen, whereas there is relatively little difference in pupil response during win trials, there is a large difference in dilation between trials on which a loss is received and those on which no loss is received. Further, the effect of loss volatility is both to increase dilation on receipt of a loss and to reduce dilation when no loss is received, suggesting that the volatility manipulation exaggerates the effect of the outcome.
Figure 3—figure supplement 2. Regression analysis of pupil data.
The analysis of pupil data reported in the main text examines the effect of block information content (i.e. win volatile vs. loss volatile) and outcome receipt on the pupil response to win and loss outcomes. However, a number of other factors may also influence pupil dilation, such as the order in which the outcomes were presented and the surprise associated with the outcome (Browning et al., 2015). In order to ensure that these additional factors could not account for our findings, we ran a regression analysis of the pupil data from the learning task. In this analysis we derived, for each participant, trialwise estimates of the outcome volatility and outcome surprise of the chosen option using the ideal Bayesian observer reported by Behrens et al. (2007). These estimates were entered as explanatory variables alongside variables coding for outcome order (i.e. win displayed first or second), outcome of the trial (outcome received or not) and an additional term coding for the interaction between outcome volatility and trial outcome (i.e. analogous to the pupil effect reported in Figure 3). Separate regressions were run for each 2 ms timepoint across the outcome period, for win and loss outcomes, and for each participant. This resulted in time series of beta weights representing the impact of each explanatory factor, for each participant and for win and loss outcomes. As can be seen, and consistent with the results reported in the paper, this analysis revealed a significant volatility x outcome interaction for loss outcomes (F(1,27)=6.249, p=0.019), with no effect for wins (F(1,27)=0.215, p=0.646). This result indicates that the pupil effects reported in the main paper are not the result of outcome order or surprise effects on pupil dilation. Lines illustrate the mean (SEM) beta weight of the volatility x outcome regressors for win (green) and loss (red) outcomes.
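The per-timepoint regression described here can be expressed compactly as an ordinary least-squares fit at every sample of the pupil timecourse. The sketch below is a generic illustration of that step only; constructing the regressors (including the Bayesian-observer estimates of volatility and surprise) is omitted, and this is not the authors' analysis code.

```python
import numpy as np

def timepoint_betas(pupil, design):
    """Per-timepoint OLS regression for one participant and one outcome type.

    pupil  : (n_trials, n_timepoints) pupil dilation traces.
    design : (n_trials, n_regressors) matrix of trialwise regressors, e.g. outcome
             order, outcome received, estimated volatility, surprise and the
             volatility x outcome interaction.
    Returns an (n_regressors, n_timepoints) array of beta weights."""
    X = np.column_stack([np.ones(len(design)), design])   # add an intercept column
    betas, *_ = np.linalg.lstsq(X, pupil, rcond=None)     # solves all timepoints at once
    return betas[1:]                                       # drop the intercept row
```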
Figure 4. Relationship between behavioural and physiological measures.
The more an individual altered their loss learning rate between blocks, the more that individual’s pupil dilation in response to loss outcomes differed between the blocks (panel b; p=0.009); however, no such relationship was observed for win outcomes (panel a; p=0.7). Note that learning rates are transformed onto the real line using a logit (inverse logistic) transform before their difference is calculated, so the difference score may be greater than ±1. Figure 4—figure supplements 1 and 2 describe the relationship between these measures and baseline symptoms of anxiety and depression.
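The across-participant relationship shown in this figure can be reproduced in outline by transforming each learning rate onto the real line, taking the between-block difference, and correlating it with the corresponding pupil difference. The sketch below assumes per-participant arrays and uses a simple Pearson correlation for illustration; variable names are hypothetical.

```python
import numpy as np

def logit(p):
    """Map a learning rate in (0, 1) onto the real line, so that the block
    difference can exceed +/-1 (as noted in the caption)."""
    return np.log(p / (1.0 - p))

def adaptation_correlation(alpha_volatile, alpha_stable, pupil_difference):
    """Correlate each participant's change in (transformed) learning rate between
    blocks with their change in pupil dilation; all inputs are per-participant arrays."""
    behav = logit(np.asarray(alpha_volatile)) - logit(np.asarray(alpha_stable))
    return np.corrcoef(behav, np.asarray(pupil_difference))[0, 1]
```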
Figure 4—figure supplement 1. Relationship between symptom scores and behavioural adaptation to volatility.
Although participants in the current study were not selected on the basis of their symptoms of depression or anxiety, baseline questionnaires were completed, allowing assessment of the relationship between symptoms and task performance. A correlation was found between trait-STAI score and the change in learning rate for losses, with participants who had higher scores adjusting their learning rate less than those with lower scores (panel b; r = −0.36, p=0.048). This is the same effect as that reported by Browning et al. (2015). We did not observe any relationship between either questionnaire measure and change in the win learning rate, or between QIDS score and change in the loss learning rate (panels a, c, d; all p>0.19).
Figure 4—figure supplement 2. Relationship between symptom scores and pupillary adaptation to volatility.
Consistent with previous work (Browning et al., 2015), symptoms of anxiety, measured using the trait-STAI, and of depression, measured using the QIDS, correlated significantly and negatively with the differential pupil response to losses (panels c, d; all r < −0.43, all p<0.02). That is, the higher the symptom score, the less pupil dilation differed between the loss-informative and loss non-informative blocks. These measures did not correlate with the pupil response to wins (panels a, b; all p>0.19).
Figure 5. BIC Scores for Comparator Models (see Table S1 for model descriptions).
Smaller BIC scores indicate a better model fit. BIC scores were calculated as the sum across all three task blocks. Bars represent mean (SEM) of the scores across participants.
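For reference, the Bayesian Information Criterion penalises the maximised log-likelihood by the number of free parameters and the number of observations. A minimal sketch of the computation, summed over the three blocks as described above, is shown below; variable names and the per-block parameter count (four, matching two learning rates plus two inverse temperatures) are illustrative.

```python
import numpy as np

def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L); smaller is better."""
    return n_params * np.log(n_trials) - 2.0 * log_likelihood

# Illustrative use: sum BIC across the three 80-trial blocks for one participant,
# given a list of fitted per-block log-likelihoods.
# total_bic = sum(bic(ll, n_params=4, n_trials=80) for ll in block_log_likelihoods)
```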

References

    1. Aston-Jones G, Cohen JD. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annual Review of Neuroscience. 2005;28:403–450. doi: 10.1146/annurev.neuro.28.061604.135709.
    2. Behrens TE, Hunt LT, Woolrich MW, Rushworth MF. Associative learning of social value. Nature. 2008;456:245–249. doi: 10.1038/nature07538.
    3. Behrens TE, Woolrich MW, Walton ME, Rushworth MF. Learning the value of information in an uncertain world. Nature Neuroscience. 2007;10:1214–1221. doi: 10.1038/nn1954.
    4. Bradley BP, Mogg K, Williams R. Implicit and explicit memory for emotion-congruent information in clinical depression and anxiety. Behaviour Research and Therapy. 1995;33:755–770. doi: 10.1016/0005-7967(95)00029-W.
    5. Browning M, Behrens TE, Jocham G, O'Reilly JX, Bishop SJ. Anxious individuals have difficulty learning the causal statistics of aversive environments. Nature Neuroscience. 2015;18:590–596. doi: 10.1038/nn.3961.
    6. Browning M, Holmes EA, Charles M, Cowen PJ, Harmer CJ. Using attentional bias modification as a cognitive vaccine against depression. Biological Psychiatry. 2012;72:572–579. doi: 10.1016/j.biopsych.2012.04.014.
    7. Ciociola AA, Cohen LB, Kulkarni P, FDA-Related Matters Committee of the American College of Gastroenterology. How drugs are developed and approved by the FDA: current process and future directions. The American Journal of Gastroenterology. 2014;109:620–623. doi: 10.1038/ajg.2013.407.
    8. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans' choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.
    9. Eshel N, Roiser JP. Reward and punishment processing in depression. Biological Psychiatry. 2010;68:118–124. doi: 10.1016/j.biopsych.2010.01.027.
    10. Gotlib IH, Krasnoperova E, Yue DN, Joormann J. Attentional biases for negative interpersonal stimuli in clinical depression. Journal of Abnormal Psychology. 2004;113:127–135. doi: 10.1037/0021-843X.113.1.121.
    11. Jepma M, Murphy PR, Nassar MR, Rangel-Gomez M, Meeter M, Nieuwenhuis S. Catecholaminergic regulation of learning rate in a dynamic environment. PLoS Computational Biology. 2016;12:e1005171. doi: 10.1371/journal.pcbi.1005171.
    12. Joshi S, Li Y, Kalwani RM, Gold JI. Relationships between pupil diameter and neuronal activity in the locus coeruleus, colliculi, and cingulate cortex. Neuron. 2016;89:221–234. doi: 10.1016/j.neuron.2015.11.028.
    13. MacKay DJ. Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press; 2003.
    14. Mathews A, MacLeod C. Cognitive vulnerability to emotional disorders. Annual Review of Clinical Psychology. 2005;1:167–195. doi: 10.1146/annurev.clinpsy.1.102803.143916.
    15. Nassar MR, Rumsey KM, Wilson RC, Parikh K, Heasly B, Gold JI. Rational regulation of learning dynamics by pupil-linked arousal systems. Nature Neuroscience. 2012;15:1040–1046. doi: 10.1038/nn.3130.
    16. NICE. Treatment and management of depression in adults, including adults with a chronic physical health problem. London: NICE; 2009.
    17. Prelec D. The probability weighting function. Econometrica. 1998;66:497–527. doi: 10.2307/2998573.
    18. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts; 1972. pp. 64–99.
    19. Rush AJ, Trivedi MH, Ibrahim HM, Carmody TJ, Arnow B, Klein DN, Markowitz JC, Ninan PT, Kornstein S, Manber R, Thase ME, Kocsis JH, Keller MB. The 16-item quick inventory of depressive symptomatology (QIDS), clinician rating (QIDS-C), and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression. Biological Psychiatry. 2003;54:573–583. doi: 10.1016/S0006-3223(02)01866-8.
    20. Sharot T, Garrett N. Forming beliefs: why valence matters. Trends in Cognitive Sciences. 2016;20:25–33. doi: 10.1016/j.tics.2015.11.002.
    21. Spielberger CD, Gorsuch RL, Lushene RD. Manual for the State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press; 1983.
    22. Sutton R, Barto AG. Reinforcement Learning. Cambridge, Massachusetts: MIT Press; 1998.
    23. Tversky A, Kahneman D. Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty. 1992;5:297–323. doi: 10.1007/BF00122574.
    24. Yechiam E, Telpaz A. To take risk is to face loss: a tonic pupillometry study. Frontiers in Psychology. 2011;2:344. doi: 10.3389/fpsyg.2011.00344.
    25. Yu AJ, Dayan P. Uncertainty, neuromodulation, and attention. Neuron. 2005;46:681–692. doi: 10.1016/j.neuron.2005.04.026.

Source: PubMed
