Momentary subjective well-being depends on learning and not reward

Bastien Blain, Robb B Rutledge

Abstract

Subjective well-being or happiness is often associated with wealth. Recent studies suggest that momentary happiness is associated with reward prediction error, the difference between experienced and predicted reward, a key component of adaptive behaviour. We tested subjects in a reinforcement learning task in which reward size and probability were uncorrelated, allowing us to dissociate the contributions of reward and learning to happiness. Using computational modelling, we found convergent evidence across stable and volatile learning tasks that happiness, like behaviour, is sensitive to learning-relevant variables (i.e. probability prediction error). Unlike behaviour, happiness is not sensitive to learning-irrelevant variables (i.e. reward prediction error). Increasing volatility reduces how many past trials influence behaviour but not happiness. Finally, depressive symptoms reduce happiness more in volatile than stable environments. Our results suggest that how we learn about our world may be more important for how we feel than the rewards we actually receive.

Keywords: decision; dopamine; happiness; human; learning; neuroscience; prediction error; subjective well-being.

Conflict of interest statement

BB, RR: No competing interests declared.

© 2020, Blain and Rutledge.

Figures

Figure 1. Experimental design.
Subjects (n = 75) performed a two-armed bandit reinforcement learning task, choosing repeatedly between two cars. They were instructed to maximise their cumulative points. In the stable task (80 trials), the probability of winning for the better car was 80%. In the volatile task (80 trials), reward probabilities switched between 80% for one car and 80% for the other car every 20 trials. Task order was counterbalanced across subjects (see Materials and methods). The reward available for each car was randomly determined on each trial and unrelated to the probability of winning. Every three to four trials, subjects were asked to report ‘How happy are you right now?’ by moving a cursor on a line. Each trial started with a fixation symbol in the centre of the screen. Then, the stimuli were displayed but choice was not yet permitted. The potential reward for each car was then displayed and participants were free to choose an option without any time constraints. The chosen option was outlined by a yellow frame. Finally, the outcome was displayed. Both the car and the reward magnitude frames were green if the chosen car won the race (example shown). The car frame was red and crossed out if the chosen car lost (example shown in inset).
Figure 2. Learning rate adapts to environmental volatility.
(A) Participants chose the option with the highest expected value 82% of the time in the stable environment (blue curve, left panel) and 73% of the time in the volatile environment (orange curve, right panel). The additive model containing three parameters (a learning rate determining the sensitivity to prediction error, an inverse temperature reflecting choice stochasticity, and a relative weight for probability and reward magnitude in choice) fitted choice data well (black dashed lines) in the stable environment (mean pseudo-r2 = 0.62) and the volatile environment (mean pseudo-r2 = 0.45). (B) Participants chose the option with the higher probability more often in the stable environment than in the volatile environment. Critically, participants stayed on the same option more often if choosing that option resulted in the car winning (light orange) compared to the car losing (dark orange) in the volatile environment compared to the stable environment (light blue and dark blue represent staying after winning and losing, respectively). This suggests that participant behaviour was more sensitive to feedback in the volatile than the stable environment, as an agent with a higher learning rate would be. Additive model predictions show a similar difference in feedback sensitivity across environments (purple). (C) Learning rates were higher in the volatile environment (orange) than in the stable environment (blue). This was true for participants who completed the stable learning task before (stable 1) or after (stable 2) the volatile learning task. Error bars represent SEM. *p < 0.05, ***p < 0.001.
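The three-parameter additive model described in the caption can be sketched as a delta-rule learner with a softmax choice rule. The following simulation is illustrative only: the function name and the parameter values (alpha, beta, w) are arbitrary choices, not the authors' code or fitted estimates.

```python
import numpy as np

def simulate_additive_agent(n_trials=80, p_win_a=0.8, alpha=0.3, beta=5.0, w=0.7, seed=0):
    """Illustrative additive choice model: a learning rate (alpha) scales the
    probability prediction error, an inverse temperature (beta) sets choice
    stochasticity, and w weights probability against reward magnitude."""
    rng = np.random.default_rng(seed)
    p_hat = np.array([0.5, 0.5])                  # estimated win probability per car
    choices, ppes = [], []
    for _ in range(n_trials):
        mag = rng.integers(1, 100, size=2) / 100  # magnitudes uncorrelated with probability
        # additive decision value: weighted mix of probability and magnitude differences
        dv = w * (p_hat[0] - p_hat[1]) + (1 - w) * (mag[0] - mag[1])
        p_choose_a = 1 / (1 + np.exp(-beta * dv))  # softmax over two options
        choice = 0 if rng.random() < p_choose_a else 1
        p_win = p_win_a if choice == 0 else 1 - p_win_a
        outcome = float(rng.random() < p_win)     # 1 if the chosen car wins
        ppe = outcome - p_hat[choice]             # probability prediction error
        p_hat[choice] += alpha * ppe              # delta-rule update
        choices.append(choice)
        ppes.append(ppe)
    return np.array(choices), np.array(ppes)
```

With a larger alpha, the agent weights recent feedback more heavily, which reproduces the higher stay/switch sensitivity seen in the volatile environment.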
Figure 3. Happiness is associated with probability and probability prediction error.
(A) Most participants were happier when their chosen car won than when their chosen car lost (97% of participants in the stable environment, 96% in the volatile environment, in the left and right panel, respectively). (B) Momentary happiness was best explained by a model (black dotted lines) including both the chosen probability estimate and the probability prediction error (PPE) derived from the additive choice model, in addition to a forgetting factor and a baseline mood parameter, for both the stable (mean r2 = 0.58) and the volatile (mean r2 = 0.62) environments. Happiness ratings were z-scored for individual participants before model fitting. The shaded areas represent SEM. (C) The chosen probability (denoted P) and the PPE parameters were significantly different from 0 for both environments. Both variables are significantly associated with changes in affective state over time. The PPE weight was significantly higher than the P weight in both the stable and volatile environments. See Figure 3—figure supplement 1 for the win-loss model parameters.
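The happiness model in the caption combines a baseline mood parameter with exponentially forgotten influences of the chosen probability and the PPE. A minimal sketch, in which the function name, weights, and forgetting factor are illustrative assumptions rather than fitted values:

```python
import numpy as np

def predicted_happiness(p_chosen, ppe, w0=0.0, w_p=0.3, w_ppe=0.6, gamma=0.5):
    """Baseline mood (w0) plus exponentially discounted sums of the chosen
    probability estimates (P) and probability prediction errors (PPE).
    gamma is the forgetting factor; the most recent trial gets weight gamma^0."""
    lags = gamma ** np.arange(len(ppe))[::-1]  # oldest trial discounted the most
    return (w0
            + w_p * np.sum(lags * np.asarray(p_chosen))
            + w_ppe * np.sum(lags * np.asarray(ppe)))
```

For example, with three trials at P = 0.5 and zero PPEs, the prediction reduces to the discounted probability term alone.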
Figure 3—figure supplement 1. Loss weights on happiness are greater than win weights.
The win and the loss parameters were significantly different from 0 for both environments. The weight for loss was higher than the weight for win. Both variables are significantly associated with changes in affective state over time.
Figure 4. Happiness is more strongly associated with learning than choice.
(A) Comparison between the r2 for the happiness model including a PPE term (denoted PPE^) estimated in the additive choice model (y axis) and the r2 for the happiness model including an RPE term instead (denoted RPE^). Both models had the same number of parameters. The PPE^ model accounted for more variance in mood ratings on average in both stable (blue) and volatile (orange) learning tasks. Dots above the dashed line correspond to subjects for whom more variance in happiness is explained by the PPE^ model than the RPE^ model. (B) The PPE^ model including the chosen estimated probability (denoted P^ and estimated from the additive choice model) better explained happiness ratings than a PPE^ model including expected value (denoted EV^) for both the stable (blue) and volatile (orange) environments, with both models having the same number of parameters. Dots above the dashed line correspond to subjects for whom more variance in happiness is explained by the P^+PPE^ model than the EV^+PPE^ model. See Figure 4—figure supplement 1 for the estimated model frequency for each model and Table 2 for other model comparison metrics.
Figure 4—figure supplement 1. Estimated model frequency.
Error bars correspond to the estimated standard deviation. Exceedance probability (A) stable: EP(PPE^) = 0.87, EP(PPE models) = 1.0; volatile: EP(PPE^) = 0.81, EP(PPE models) = 1.0. (B) stable: EP(P^+PPE^) = 0.97; volatile: EP(P^+PPE^) = 1.0. (C) stable: EP(P^+PPE^) = 1.0; volatile: EP(P^+PPE^) = 0.48.
Figure 5. Forgetting factors are consistent across stable and volatile learning tasks.
(A) Weights for PPEs in determining happiness were consistent across environments. (B) The happiness forgetting factor did not change between stable (blue) and volatile (orange) environments, regardless of testing order. See Figure 5—figure supplement 1 for an analysis without any assumption regarding the shape of the influence decay. (C) Happiness forgetting factors were consistent across environments. Error bars represent SEM. ***p < 0.001.
Figure 5—figure supplement 1. Happiness is influenced by multiple past probability prediction errors.
Each bar corresponds to the influence of the past trials on the current happiness rating in the stable (blue) and volatile (orange) environments. (A) The left panel shows the influence of probability prediction error estimated from the additive learning model (PPE^). (B) The right panel shows the influence of the objective probability prediction error (PPE). Error bars represent SEM. *p < 0.05, **p < 0.01, ***p < 0.001.
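The decay-shape-free analysis referenced above can be approximated by regressing each happiness rating on the current and preceding PPEs, yielding one free weight per lag. This sketch is not the paper's exact analysis; it simplifies by assuming a rating on every trial, whereas the task probed happiness only every three to four trials.

```python
import numpy as np

def lagged_ppe_weights(ratings, ppe, n_lags=5):
    """Estimate the influence of the last n_lags PPEs on each happiness
    rating by ordinary least squares, with no assumed decay shape."""
    ppe = np.asarray(ppe, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    rows, ys = [], []
    for t in range(n_lags - 1, len(ratings)):
        rows.append(ppe[t - n_lags + 1 : t + 1][::-1])  # lag 0 (current trial) first
        ys.append(ratings[t])
    X = np.column_stack([np.ones(len(ys)), np.array(rows)])  # intercept + one column per lag
    beta, *_ = np.linalg.lstsq(X, np.array(ys), rcond=None)
    return beta[1:]  # one weight per lag
```

If the recovered weights fall off roughly geometrically across lags, they are consistent with the exponential forgetting factor used in the main model.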
Figure 6. Baseline mood decreases with depressive symptoms in volatile environments.
(A) Average happiness was not correlated with depressive symptoms (PHQ) in the stable task (left panel, blue) but decreased with depressive symptoms in the volatile task (middle panel, orange). The difference in happiness between stable and volatile environments was also significantly related to depression (right panel). (B) Baseline mood parameters estimated with non-z-scored happiness ratings showed the same relationship to depressive symptoms as average happiness with lower parameters in volatile than stable environments. *p < 0.05, **p < 0.01.


Source: PubMed
