Explicit neural signals reflecting reward uncertainty

Wolfram Schultz, Kerstin Preuschoff, Colin Camerer, Ming Hsu, Christopher D Fiorillo, Philippe N Tobler, Peter Bossaerts

Abstract

The acknowledged importance of uncertainty in economic decision making has stimulated the search for neural signals that could influence learning and inform decision mechanisms. Current views distinguish two forms of uncertainty, namely risk and ambiguity, depending on whether the probability distributions of outcomes are known or unknown. Behavioural neurophysiological studies on dopamine neurons revealed a risk signal, which covaried with the standard deviation or variance of the magnitude of juice rewards and occurred separately from reward value coding. Human imaging studies identified similarly distinct risk signals for monetary rewards in the striatum and orbitofrontal cortex (OFC), thus fulfilling a requirement for the mean-variance approach of economic decision theory. The orbitofrontal risk signal covaried with individual risk attitudes, possibly explaining individual differences in risk perception and risky decision making. Ambiguous gambles with incomplete probabilistic information induced stronger brain signals than risky gambles in OFC and amygdala, suggesting that the brain's reward system signals the partial lack of information. The brain can use the uncertainty signals to assess the uncertainty of rewards, influence learning, modulate the value of uncertain rewards and make appropriate behavioural choices between only partly known options.

Figures

**Figure 1**
Expected reward and risk as a function of the probability of reward. Expected reward, measured as mathematical expectation of reward, increases linearly with the probability of reward p (dashed line). Expected reward is minimal at p=0 and maximal at p=1. Risk, measured as reward variance (or as its square root, standard deviation), follows an inverted U function of probability and is minimal at p=0 and 1 and maximal at p=0.5 (solid curve). Reprinted with permission from Preuschoff *et al*. (2006). Copyright © Cell Press.
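The relationship described in the Figure 1 caption can be checked numerically. The following is a minimal sketch, assuming an all-or-none reward of magnitude `m` delivered with probability `p` (a Bernoulli outcome). The function names and the default magnitude of 1.0 are illustrative assumptions and do not come from the source.

```python
def expected_reward(p, magnitude=1.0):
    """Mathematical expectation of an all-or-none reward: p * m."""
    return p * magnitude


def reward_variance(p, magnitude=1.0):
    """Variance of an all-or-none reward: p * (1 - p) * m**2."""
    return p * (1.0 - p) * magnitude ** 2


def reward_sd(p, magnitude=1.0):
    """Standard deviation, the square root of the variance."""
    return reward_variance(p, magnitude) ** 0.5


if __name__ == "__main__":
    # Same five probabilities as the stimuli listed in the Figure 2 caption.
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(
            f"p={p:.2f}  expected={expected_reward(p):.4f}  "
            f"variance={reward_variance(p):.4f}  sd={reward_sd(p):.4f}"
        )
```

Under these assumptions the expected reward rises linearly with `p`, whereas the variance is zero at p=0 and p=1 and peaks at p=0.5, which is the inverted U shape the caption describes.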

**Figure 2**
Risk signal in dopamine neurons. (a) Phasic reward value signal reflecting reward prediction (left) and more sustained risk signal during the stimulus–reward interval in a single dopamine neuron. Visual stimuli predicting reward probabilities (i) 0.0, (ii) 0.25, (iii) 0.5, (iv) 0.75 and (v) 1.0 alternated semi-randomly between trials. Both rewarded and unrewarded trials are shown at intermediate probabilities; the longer vertical marks in the rasters indicate delivery of the juice reward. (b) Population histograms of responses shown in (a). Histograms were constructed from every trial in 35–44 neurons per stimulus type (638 total trials at p=0 and 1200–1700 trials for all other probabilities). Both rewarded and unrewarded trials are included at intermediate probabilities. (i) 0.0, (ii) 0.25, (iii) 0.5, (iv) 0.75 and (v) 1.0. (c) Median sustained risk-related activation of dopamine neurons as a function of reward probability. Plots show the sustained activation as inverted U function of reward probability, indicating relationship to risk as opposed to value. Data from different stimulus sets and animals are shown separately. Reprinted with permission from Fiorillo *et al*. (2003). Copyright © American Association for the Advancement of Science.

**Figure 3**
Risk signals in human ventral striatum. (a) Sustained BOLD response during 6 s correlated with variance as inverted U function of all-or-none reward probability (random effects, p<0.001; L vst, R vst for left, right ventral striatum). (b) Mean activations (parameter estimates beta with standard error) for 10 probabilities. Neural responses in striatum increased towards intermediate probabilities and decreased towards lower and higher probabilities. (i) Left vst and (ii) right vst. Dotted lines indicate best fit (r²=0.88–0.89, p<0.001). Grey data points at p=0.5 indicate late-onset activation between bet and first card when risk is maximal (p=0.5). Error bars=standard error of the mean (s.e.m.). Reprinted with permission from Preuschoff *et al*. (2006). Copyright © Cell Press.

**Figure 4**
Relation of human orbitofrontal risk signals to individual risk attitude. (a, b) Risk signal in lateral OFC covarying with increasing risk aversion across participants (e.g. a ‘safety’ or ‘fear’ signal). (b) Correlation of contrast estimates of individual participants with their individual risk aversion (p<0.001, r=0.74; unpaired t-test in seven risk seekers and six risk averters). (c, d) Risk signal in medial OFC covarying with risk seeking (=inverse relation to risk aversion; e.g. a ‘risk seeking’ or ‘gambling’ signal). (d) Risk correlation analogous (r=0.85, p<0.0001) to (b). Abscissae in (b, d) show risk aversion as expressed by preference scores (−4 most risk seeking, +4 most risk aversion). To obtain these graphs, we correlated risk-related BOLD responses to individual risk attitude in two steps. First, we determined in each participant the contrast estimates reflecting the goodness of fit between brain activation and risk (variance as inverted U function of probability). Then, we regressed the contrast estimates of all participants to their individual behavioural risk preference scores and identified brain areas showing positive (a) or negative correlations (c). We plotted the regressions of risk aversion against the contrast estimates in (b, d). Reprinted with permission from Tobler *et al*. (2007). Copyright © The American Physiological Society.
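The two-step procedure in the Figure 4 caption can be illustrated with a simplified sketch. It is not the authors' imaging pipeline. A least-squares slope of response on variance stands in for the per-participant contrast estimate, and a Pearson correlation stands in for the across-participant regression. All numbers below (`scores`, `made_up_gains`, the baseline of 0.1) are invented purely for illustration and are not data from the study.

```python
def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """Least-squares slope of y on x (stand-in for a contrast estimate)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Risk regressor: variance as an inverted U function of reward probability.
probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
risk = [p * (1.0 - p) for p in probabilities]

# HYPOTHETICAL participants: risk-aversion scores on the -4..+4 scale and
# invented response gains used only to synthesise toy responses.
scores = [-3, -2, -1, 1, 2, 3]
made_up_gains = [-1.2, -1.0, -0.1, 0.4, 0.7, 1.5]

# Step 1: one estimate per participant of how strongly the response tracks risk.
estimates = []
for gain in made_up_gains:
    responses = [0.1 + gain * r for r in risk]  # toy response per probability
    estimates.append(slope(risk, responses))

# Step 2: relate the per-participant estimates to individual risk attitude.
r_value = pearson_r(scores, estimates)

if __name__ == "__main__":
    print(f"correlation between risk aversion and risk estimates: r={r_value:.2f}")
```

In this toy setup a positive correlation corresponds to the pattern reported for lateral OFC in panels (a, b), and a negative one to the medial OFC pattern in panels (c, d).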

**Figure 5**
Ambiguity signals in human OFC. (a) Higher BOLD responses in OFC regions to stimuli predicting ambiguous outcomes compared with risky outcomes, as identified by random effects analysis (p<0.001, uncorrected; 10 voxels; mean from card deck, knowledge and informed opponent situations). (b) Mean time courses of orbitofrontal BOLD responses to onset of stimuli predicting ambiguous or risky outcomes (dashed vertical lines are mean decision times; error bars=standard error of the mean, s.e.m.; n=16 participants). (i) Left OFC and (ii) right OFC. Reprinted with permission from Hsu *et al*. (2005). Copyright © American Association for the Advancement of Science.

References

1. Abel A.B. Asset prices under habit formation and catching up with the Joneses. Am. Econ. Rev. 1990;80:38–42.
1. Basso M.A, Wurtz R.H. Modulation of neuronal activity by target uncertainty. Nature. 1997;389:66–69.
1. Bechara A, Damasio A.R, Damasio H, Anderson S.W. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition. 1994;50:7–15.
1. Bechara A, Damasio H, Damasio A.R. Emotion, decision-making and the orbitofrontal cortex. Cereb. Cortex. 2000;10:295–307.
1. Bolla K.I, Eldreth D.A, Matochik J.A, Cadet J.L. Neural substrates of faulty decision-making in abstinent marijuana users. Neuroimage. 2005;26:480–492.
1. Bossaerts P, Plott C. Basic principles of asset pricing theory: evidence from large-scale experimental financial markets. Rev. Finance. 2004;8:135–169.
1. Caraco T, Martindale S, Whittam T.S. An empirical demonstration of risk-sensitive foraging preferences. Anim. Behav. 1980;28:820–830.
1. Caraco T, Blankenhorn W.U, Gregory G.M, Newman J.A, Recer G.M, Zwicker S.M. Risk-sensitivity: ambient temperature affects foraging choice. Anim. Behav. 1990;39:338–345.
1. Cromwell H.C, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J. Neurophysiol. 2003;89:2823–2838.
1. Delgado M.R, Nystrom L.E, Fissell C, Noll D.C, Fiez J.A. Tracking the hemodynamic responses to reward and punishment in the striatum. J. Neurophysiol. 2000;84:3072–3077.
1. Dunn B.D, Dalgleish T, Lawrence A.D. The somatic marker hypothesis: a critical evaluation. Neurosci. Biobehav. Rev. 2006;30:239–271.
1. Elliott R, Newman J.L, Longe O.A, Deakin J.F.W. Differential response pattern in the striatum and orbitofrontal cortex to financial rewards in humans: a parametric functional magnetic resonance imaging study. J. Neurosci. 2003;23:303–307.
1. Ellsberg D. Risk, ambiguity and the Savage axioms. Quart. J. Econ. 1961;75:643–669.
1. Ersche K.D, Fletcher P.C, Lewis S.J, Clark L, Stocks-Gee G, London M, Deakin J.B, Robbins T.W, Sahakian B.J. Abnormal frontal activations related to decision-making in current and former amphetamine and opiate dependent individuals. Psychopharmacology. 2005;180:612–623.
1. Fiorillo C.D, Tobler P.N, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299:1898–1902.
1. Harper D.G.C. Competitive foraging in mallards: ‘ideal free’ ducks. Anim. Behav. 1982;30:575–584.
1. Holt C.A, Laury S.K. Risk aversion and incentive effects. Am. Econ. Rev. 2002;92:1644–1655.
1. Hsu M, Bhatt M, Adolphs R, Tranel D, Camerer C.F. Neural systems responding to degrees of uncertainty in human decision-making. Science. 2005;310:1680–1683.
1. Huang C.-F, Litzenberger R.H. Prentice-Hall; Upper Saddle River, NJ: 1988. Foundations for financial economics.
1. Huettel S.A, Song A, McCarthy G. Decisions under uncertainty: probabilistic context influences activation of prefrontal and parietal cortices. J. Neurosci. 2005;25:3304–3311.
1. Huettel S.A, Stowe C.J, Gordon E.M, Warner B.T, Platt M.L. Neural signatures of economic preferences for risk and ambiguity. Neuron. 2006;49:765–775.
1. Kahneman D, Tversky A. Prospect theory: an analysis of decision under risk. Econometrica. 1979;47:263–291.
1. Knutson B, Fong G.W, Bennett S.M, Adams C.M, Hommer D. A region of mesial prefrontal cortex tracks monetarily rewarding outcomes: characterization with rapid event-related fMRI. Neuroimage. 2003;18:263–272.
1. Knutson B, Taylor J, Kaufman M, Peterson R, Glover G. Distributed neural representation of expected value. J. Neurosci. 2005;25:4806–4812.
1. Levy H, Markowitz H.M. Approximating expected utility by a function of mean and variance. Am. Econ. Rev. 1979;69:308–317.
1. Livingstone M, Hubel D. Segregation of form, color, movement, and depth: anatomy, physiology, and perception. Science. 1988;240:740–749.
1. Logothetis N.K, Pauls J, Augath M, Trinath T, Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature. 2001;412:150–157.
1. Mackintosh N.J. A theory of attention: variations in the associability of stimulus with reinforcement. Psychol. Rev. 1975;82:276–298.
1. Maia T.V, McClelland J.L. A reexamination of the evidence for the somatic marker hypothesis: what participants really know in the Iowa gambling task. Proc. Natl Acad. Sci. USA. 2004;101:16075–16080.
1. Markowitz H. Portfolio selection. J. Finance. 1952;7:77–91.
1. McCoy A.N, Platt M.L. Risk-sensitive neurons in macaque posterior cingulate cortex. Nat. Neurosci. 2005;8:1220–1227.
1. McCoy A.N, Crowley J.C, Haghighian G, Dean H.L, Platt M.L. Saccade reward signals in posterior cingulate cortex. Neuron. 2003;40:1031–1040.
1. McNamara J, Houston A. The application of statistical decision theory to animal behaviour. J. Theor. Biol. 1980;85:673–690.
1. Mobini S, Body S, Ho M.-Y, Bradshaw C.M, Szabadi E, Deakin J.F.W, Anderson I.M. Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology. 2002;160:290–298.
1. Musallam S, Corneil B.D, Greger B, Scherberger H, Andersen R.A. Cognitive control signals for neural prosthetics. Science. 2004;305:258–262.
1. Padoa-Schioppa C, Assad J.A. Neurons in the orbitofrontal cortex encode economic value. Nature. 2006;441:223–226.
1. Paton J.J, Belova M.A, Morrison S.E, Salzman C.D. The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature. 2006;439:865–870.
1. Pearce J.M, Hall G. A model for Pavlovian conditioning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 1980;87:532–552.
1. Platt M.L, Glimcher P.W. Neural correlates of decision variables in parietal cortex. Nature. 1999;400:233–238.
1. Preuschoff K, Bossaerts P. Adding prediction risk to the theory of reward learning. Ann. NY Acad. Sci. 2007;1104:135–146.
1. Preuschoff K, Bossaerts P, Quartz S.R. Neural differentiation of expected reward and risk in human subcortical structures. Neuron. 2006;51:381–390.
1. Real L.A. Animal choice behavior and the evolution of cognitive architecture. Science. 1991;253:980–986.
1. Rescorla R.A, Wagner A.R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black A.H, Prokasy W.F, editors. Classical conditioning II: current research and theory. Appleton Century Crofts; New York, NY: 1972. pp. 64–99.
1. Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science. 2005;310:1337–1340.
1. Sanfey A.G, Hastie R, Colvin M.K, Grafman J. Phineas gauged: decision making and the human prefrontal cortex. Neuropsychologia. 2003;41:1218–1229.
1. Schultz W, Dayan P, Montague P.R. A neural substrate of prediction and reward. Science. 1997;275:1593–1599.
1. Shidara M, Richmond B.J. Anterior cingulate: single neuron signals related to degree of reward expectancy. Science. 2002;296:1709–1711.
1. Stephens D.W, Krebs J.R. Princeton University Press; Princeton, NJ: 1986. Foraging theory.
1. Sutton R.S, Barto A.G. MIT Press; Cambridge, MA: 1998. Reinforcement learning.
1. Tobin J. Liquidity preference as behavior towards risk. Rev. Econ. Stud. 1958;25:65–86.
1. Tobler P.N, Fiorillo C.D, Schultz W. Adaptive coding of reward value by dopamine neurons. Science. 2005;307:1642–1645.
1. Tobler P.N, O'Doherty J.P, Dolan R, Schultz W. Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol. 2007;97:1621–1632.
1. von Neumann J, Morgenstern O. Princeton University Press; Princeton, NJ: 1944. The theory of games and economic behavior.
1. Watanabe M. Reward expectancy in primate prefrontal neurons. Nature. 1996;382:629–632.
1. Weber E.U, Milliman R.A. Perceived risk attitudes: relating risk perception to risky choice. Manage. Sci. 1997;43:123–144.
1. Weber E.U, Shafir S, Blais A.-R. Predicting risk sensitivity in humans and lower animals: risk as variance or coefficient of variation. Psychol. Rev. 2004;111:430–445.

Source: PubMed
