A Bayesian foundation for individual learning under uncertainty

Christoph Mathys, Jean Daunizeau, Karl J. Friston, Klaas E. Stephan

Abstract

Computational learning models are critical for understanding mechanisms of adaptive behavior. However, the two major current frameworks, reinforcement learning (RL) and Bayesian learning, both have certain limitations. For example, many Bayesian models are agnostic about inter-individual variability and involve complicated integrals, making online learning difficult. Here, we introduce a generic hierarchical Bayesian framework for individual learning under multiple forms of uncertainty (e.g., environmental volatility and perceptual uncertainty). The model assumes Gaussian random walks of states at all but the first level, with the step size determined by the next highest level. The coupling between levels is controlled by parameters that shape the influence of uncertainty on learning in a subject-specific fashion. Using variational Bayes under a mean-field approximation and a novel approximation to the posterior energy function, we derive trial-by-trial update equations which (i) are analytical and extremely efficient, enabling real-time learning, (ii) have a natural interpretation in terms of RL, and (iii) contain parameters representing processes which play a key role in current theories of learning, e.g., precision-weighting of prediction error. These parameters allow for the expression of individual differences in learning and may relate to specific neuromodulatory mechanisms in the brain. Our model is very general: it can deal with both discrete and continuous states and equally accounts for deterministic and probabilistic relations between environmental events and perceptual states (i.e., situations with and without perceptual uncertainty). These properties are illustrated by simulations and analyses of empirical time series. Overall, our framework provides a novel foundation for understanding normal and pathological learning that contextualizes RL within a generic Bayesian scheme and thus connects it to principles of optimality from probability theory.
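Formally, the hierarchy described in the abstract can be summarized as follows (a sketch in our notation, reconstructed from the abstract and the figure captions):

\[
p\left(x_i^{(k)} \,\middle|\, x_i^{(k-1)},\, x_{i+1}^{(k)}\right) = \mathcal{N}\left(x_i^{(k)};\; x_i^{(k-1)},\; f_i\!\left(x_{i+1}^{(k)}\right)\right), \qquad i = 2, \dots, n,
\]

where each \(f_i\) is a positive step-size (variance) function coupling level \(i\) to the level above, and \(f_n \equiv \vartheta\) is held constant at the topmost level (cf. Figure 1).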

Keywords: acetylcholine; decision-making; dopamine; hierarchical models; neuromodulation; serotonin; variational Bayes; volatility.

Figures

Figure 1
Overview of the hierarchical generative model. The probability at each level is determined by the variables and parameters at the next highest level. Note that further levels can be added on top of the third. These levels relate to each other by determining the step size (volatility or variance) of a random walk. The topmost step size is a constant parameter ϑ. At the first level, x1 determines the probability of the input u.
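Written out for the three-level case, the model sketched in Figures 1 and 2 takes the following form (our notation, reconstructed from the captions; the exponential coupling reflects that x3 is the log-volatility of x2, cf. Figure 11):

\[
\begin{aligned}
x_3^{(k)} &\sim \mathcal{N}\!\left(x_3^{(k-1)},\; \vartheta\right),\\
x_2^{(k)} &\sim \mathcal{N}\!\left(x_2^{(k-1)},\; \exp\!\left(\kappa x_3^{(k)} + \omega\right)\right),\\
p\!\left(x_1^{(k)} = 1 \,\middle|\, x_2^{(k)}\right) &= s\!\left(x_2^{(k)}\right) = \frac{1}{1 + e^{-x_2^{(k)}}},
\end{aligned}
\]

with u = x1 in the absence of perceptual uncertainty (cf. Figure 5).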
Figure 2
Generative model and posterior distributions on hidden states. Left: schematic representation of the generative model as a Bayesian network. x1(k), x2(k), x3(k) are hidden states of the environment at time point k. They generate u(k), the input at time point k, and depend on their immediately preceding values x2(k−1), x3(k−1) and the parameters ϑ, ω, κ. Right: the minimal parametric description q(x) of the posteriors at each level. The distribution parameters μ (posterior expectation) and σ (posterior variance) can be found by approximating the minimal parametric posteriors to the mean-field marginal posteriors. For multidimensional states x, μ is a vector and σ a covariance matrix.
Figure 3
Quadratic approximation to the variational energy. Approximating the variational energy I(x) (blue) by a quadratic function leads (by exponentiation) to a Gaussian posterior. To find our approximation Ĩ(x) (red), we expand I(x) to second order at the preceding posterior expectation μ(k−1). The argmax of Ĩ(x) is then the new posterior expectation μ(k). This generally leads to a different result from the Laplace approximation (dashed), but there is a priori no reason to regard either approximation as more exact than the other.
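In symbols (a sketch, in our notation, of the expansion just described):

\[
\tilde{I}(x) = I\!\left(\mu^{(k-1)}\right) + I'\!\left(\mu^{(k-1)}\right)\!\left(x - \mu^{(k-1)}\right) + \tfrac{1}{2}\, I''\!\left(\mu^{(k-1)}\right)\!\left(x - \mu^{(k-1)}\right)^2,
\]

so that, provided \(I''(\mu^{(k-1)}) < 0\),

\[
\mu^{(k)} = \operatorname*{arg\,max}_x \tilde{I}(x) = \mu^{(k-1)} - \frac{I'\!\left(\mu^{(k-1)}\right)}{I''\!\left(\mu^{(k-1)}\right)},
\]

and exponentiating Ĩ yields a Gaussian posterior with variance σ^(k) = −1/I″(μ^(k−1)). The Laplace approximation would instead expand at the mode of I(x), which in general differs from μ^(k−1).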
Figure 4
Interpretation of the variational update equations in terms of Rescorla–Wagner learning. The Rescorla–Wagner update is Δprediction = learning rate × prediction error. Our expectation update equations can be interpreted in these terms.
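To make this correspondence concrete, here is a minimal Python sketch of one trial of these updates for the binary three-level model of Figures 1 and 2 (without perceptual uncertainty, so u = x1). It follows the standard published form of the model's variational update equations; the function and variable names are ours, and the guard on the level-3 precision is a practical assumption rather than part of the scheme.

```python
import math

def sgm(x):
    """Logistic sigmoid s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hgf_binary_update(mu2, sigma2, mu3, sigma3, u, kappa, omega, theta):
    """One trial of hierarchical-Gaussian-filter-style updates for a
    binary input u in {0, 1}. Sketch only: expressions follow the
    standard form of the variational updates; names are illustrative."""
    # --- Level 1: prediction and prediction error ---
    muhat1 = sgm(mu2)                 # predicted probability that x1 = 1
    delta1 = u - muhat1               # first-level prediction error

    # --- Level 2: precision-weighted update (Rescorla-Wagner form) ---
    # Predicted variance = prior variance + step size exp(kappa*mu3 + omega)
    sigmahat2 = sigma2 + math.exp(kappa * mu3 + omega)
    pihat2 = 1.0 / sigmahat2
    pi2 = pihat2 + muhat1 * (1.0 - muhat1)    # posterior precision
    new_sigma2 = 1.0 / pi2
    # Delta-prediction = learning rate (posterior variance) x prediction error
    new_mu2 = mu2 + new_sigma2 * delta1

    # --- Level 3: update driven by a volatility prediction error ---
    delta2 = (new_sigma2 + (new_mu2 - mu2) ** 2) * pihat2 - 1.0
    pihat3 = 1.0 / (sigma3 + theta)
    w2 = math.exp(kappa * mu3 + omega) * pihat2   # volatility weight on level 2
    pi3 = pihat3 + (kappa ** 2 / 2.0) * w2 * (w2 + (2.0 * w2 - 1.0) * delta2)
    if pi3 <= 0.0:
        # Practical guard (our assumption): precision must stay positive.
        raise ValueError("negative posterior precision at level 3")
    new_sigma3 = 1.0 / pi3
    new_mu3 = mu3 + (kappa / 2.0) * new_sigma3 * w2 * delta2

    return new_mu2, new_sigma2, new_mu3, new_sigma3

# Example: a few trials with the reference parameters of Figure 5
mu2, sigma2, mu3, sigma3 = 0.0, 1.0, 1.0, 1.0
for u in [1, 1, 0, 1, 0, 0, 1]:
    mu2, sigma2, mu3, sigma3 = hgf_binary_update(
        mu2, sigma2, mu3, sigma3, u, kappa=1.4, omega=-2.2, theta=0.5)
    print(f"s(mu2) = {sgm(mu2):.3f}, mu3 = {mu3:.3f}")
```

Note that the second-level expectation update has exactly the Rescorla–Wagner form of the caption, with the posterior variance σ2 acting as a time-varying, uncertainty-dependent learning rate.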
Figure 5
Reference scenario: ϑ = 0.5, ω = −2.2, κ = 1.4. A simulation of 320 trials. Bottom: the first level. Input u is represented by green dots. In the absence of perceptual uncertainty, this corresponds to x1. The fine black line is the true probability (unknown to the agent) that x1 = 1. The red line shows s(μ2); i.e., the agent's posterior expectation that x1 = 1. Given the input and update rules, the simulation is uniquely determined by the value of the parameters ϑ, ω, and κ. Middle: the second level with the posterior expectation μ2 of x2. Top: the third level with the posterior expectation μ3 of x3. In all three panels, the initial values of the various μ are indicated by circles at trial k = 0.
Figure 6
Reduced ϑ = 0.05 (unchanged ω = −2.2, κ = 1.4). Symbols have the same meaning as in Figure 5. Here, the expected x3 is more stable. The learning rate in x2 is initially unaffected, but owing to the more stable x3 it no longer increases after the period of increased volatility.
Figure 7
Reduced ω = −4 (unchanged ϑ = 0.5, κ = 1.4). Symbols have the same meaning as in Figure 5. The small learning rate in x2 leads to an extremely stable expected x3. Despite prediction errors, the agent makes only small updates to its beliefs about its environment.
Figure 8
Reduced κ = 0.2 (unchanged ϑ = 0.5, ω = −2.2). Symbols have the same meaning as in Figure 5. x2 and x3 are only weakly coupled. Despite uncertainty about x3, only small updates to μ3 take place. Sensitivity to changes in volatility is reduced. x2 is not affected directly, but its learning rate does not increase with volatility.
Figure 9
Simulations including standard deviations of posterior distributions. Top to bottom: the four scenarios from Figures 5–8. Left: μ2 (bold red line); fine red lines indicate the range of ±σ2 around μ2. Right: μ3 (bold blue line); fine blue lines indicate the range of ±σ3 around μ3. Circles indicate initial values.
Figure 10
A simulation where risk is constant (ϑ = 0.2, ω = −2.3, κ = 1.6). Symbols have the same meaning as in Figure 5. The same basic phenomena shown in Figure 5 can be observed here.
Figure 11
Inference on a continuous-valued state (ϑ = 0.3, ω = −12, κ = 1, α = 2·10−5). Reference scenario for the model of hierarchical Gaussian random walks applied to a continuous-valued state at the bottom level. The state is the value x2 of the U.S. Dollar against the Swiss Franc during the first 180 trading days of the year 2010. Bottom panel: input u representing closing exchange rates (green dots). The bold red line surrounded by two fine red lines indicates the range μ2 ± σ2. Top panel: the range μ3 ± σ3 of the log-volatility x3 of x2.
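A natural reading of this continuous setup (our notational assumption) is a Gaussian observation model at the first level, with α as the variance of the perceptual noise:

\[
u^{(k)} \sim \mathcal{N}\!\left(x_2^{(k)},\; \alpha\right),
\]

so that reducing α (Figure 12) makes the posterior track the input more tightly, while increasing α (Figure 13) smooths over short-term fluctuations.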
Figure 12
Reduced α = 10−6 (ϑ = 0.3, ω = −12, κ = 1). Reduced perceptual uncertainty α with respect to the reference scenario of Figure 11.
Figure 13
Increased α = 10−4 (ϑ = 0.3, ω = −12, κ = 1). Increased perceptual uncertainty α with respect to the reference scenario of Figure 11.
Figure 14
Reduced ϑ = 0.01 (ω = −12, κ = 1, α = 2·10−5). Reduced ϑ with respect to the reference scenario of Figure 11.

References

    1. Beal M. J. (2003). Variational Algorithms for Approximate Bayesian Inference. Ph.D. thesis, University College London
    2. Beck J., Ma W., Kiani R., Hanks T., Churchland A., Roitman J., Shadlen M., Latham P. E., Pouget A. (2008). Probabilistic population codes for Bayesian decision making. Neuron 60, 1142–1152. doi: 10.1016/j.neuron.2008.09.021
    3. Behrens T. E. J., Hunt L. T., Woolrich M. W., Rushworth M. F. S. (2008). Associative learning of social value. Nature 456, 245–249. doi: 10.1038/nature07538
    4. Behrens T. E. J., Woolrich M. W., Walton M. E., Rushworth M. F. S. (2007). Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221. doi: 10.1038/nn1954
    5. Bresciani J., Dammeier F., Ernst M. O. (2006). Vision and touch are automatically integrated for the perception of sequences of events. J. Vis. 6, 554–564. doi: 10.1167/6.5.2
    6. Brodersen K. H., Penny W. D., Harrison L. M., Daunizeau J., Ruff C. C., Duzel E., Friston K. J., Stephan K. E. (2008). Integrated Bayesian models of learning and decision making for saccadic eye movements. Neural Netw. 21, 1247–1260. doi: 10.1016/j.neunet.2008.08.007
    7. Corrado G. S., Sugrue L. P., Sebastian Seung H., Newsome W. T. (2005). Linear-nonlinear-Poisson models of primate choice dynamics. J. Exp. Anal. Behav. 84, 581–617. doi: 10.1901/jeab.2005.23-05
    8. Cox R. T. (1946). Probability, frequency and reasonable expectation. Am. J. Phys. 14, 1–13. doi: 10.1119/1.1990764
    9. Daunizeau J., den Ouden H. E. M., Pessiglione M., Kiebel S. J., Stephan K. E., Friston K. J. (2010a). Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS ONE 5, e15554. doi: 10.1371/journal.pone.0015554
    10. Daunizeau J., den Ouden H. E. M., Pessiglione M., Kiebel S. J., Friston K. J., Stephan K. E. (2010b). Observing the observer (II): deciding when to decide. PLoS ONE 5, e15555. doi: 10.1371/journal.pone.0015555
    11. Daw N. D., O'Doherty J. P., Dayan P., Seymour B., Dolan R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature 441, 876–879. doi: 10.1038/nature04766
    12. Dayan P., Huys Q. J. (2009). Serotonin in affective control. Annu. Rev. Neurosci. 32, 95–126. doi: 10.1146/annurev.neuro.051508.135607
    13. Dayan P., Niv Y. (2008). Reinforcement learning: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185–196. doi: 10.1016/j.conb.2008.08.003
    14. den Ouden H. E. M., Daunizeau J., Roiser J., Friston K. J., Stephan K. E. (2010). Striatal prediction error modulates cortical coupling. J. Neurosci. 30, 3210–3219. doi: 10.1523/JNEUROSCI.4458-09.2010
    15. Deneve S. (2008). Bayesian spiking neurons II: learning. Neural Comput. 20, 118–145. doi: 10.1162/neco.2008.20.1.118
    16. Doya K. (2008). Modulators of decision making. Nat. Neurosci. 11, 410–416. doi: 10.1038/nn2077
    17. Fearnhead P., Liu Z. (2007). On-line inference for multiple changepoint problems. J. R. Stat. Soc. Series B Stat. Methodol. 69, 589–605. doi: 10.1111/j.1467-9868.2007.00601.x
    18. Frank M. J. (2008). Schizophrenia: a computational reinforcement learning perspective. Schizophr. Bull. 34, 1008–1011. doi: 10.1093/schbul/sbn123
    19. Frank M. J., Doll B. B., Oas-Terpstra J., Moreno F. (2009). Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068. doi: 10.1038/nn.2342
    20. Frank M. J., Moustafa A. A., Haughey H. M., Curran T., Hutchison K. E. (2007). Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc. Natl. Acad. Sci. U.S.A. 104, 16311–16316. doi: 10.1073/pnas.0706111104
    21. Friston K. (2008). Hierarchical models in the brain. PLoS Comput. Biol. 4, e1000211. doi: 10.1371/journal.pcbi.1000211
    22. Friston K. (2009). The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. (Regul. Ed.) 13, 293–301
    23. Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. (2007). Variational free energy and the Laplace approximation. Neuroimage 34, 220–234. doi: 10.1016/j.neuroimage.2006.08.035
    24. Friston K. J., Stephan K. E. (2007). Free-energy and the brain. Synthese 159, 417–458. doi: 10.1007/s11229-007-9237-y
    25. Geisler W. S., Diehl R. L. (2002). Bayesian natural selection and the evolution of perceptual systems. Philos. Trans. R. Soc. B Biol. Sci. 357, 419–448
    26. Gershman S. J., Niv Y. (2010). Learning latent structure: carving nature at its joints. Curr. Opin. Neurobiol. 20, 251–256. doi: 10.1016/j.conb.2010.02.008
    27. Gluck M. A., Shohamy D., Myers C. (2002). How do people solve the weather prediction task? Individual variability in strategies for probabilistic category learning. Learn. Mem. 9, 408–418. doi: 10.1101/lm.45202
    28. Gu Q. (2002). Neuromodulatory transmitter systems in the cortex and their role in cortical plasticity. Neuroscience 111, 815–835. doi: 10.1016/S0306-4522(02)00026-X
    29. Häfner H., Maurer K., Löffler W., an der Heiden W., Munk-Jørgensen P., Hambrecht M., Riecher-Rössler A. (1998). The ABC schizophrenia study: a preliminary overview of the results. Soc. Psychiatry Psychiatr. Epidemiol. 33, 380–386. doi: 10.1007/s001270050069
    30. Herz A. V. M., Gollisch T., Machens C. K., Jaeger D. (2006). Modeling single-neuron dynamics and computations: a balance of detail and abstraction. Science 314, 80–85. doi: 10.1126/science.1127240
    31. Jaynes E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106, 620. doi: 10.1103/PhysRev.106.620
    32. Jaynes E. T. (2003). Probability Theory: The Logic of Science. Cambridge, UK: Cambridge University Press. doi: 10.1017/CBO9780511790423
    33. Kalman R. E. (1960). A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45
    34. Kording K. P., Wolpert D. M. (2004). Bayesian integration in sensorimotor learning. Nature 427, 244–247. doi: 10.1038/nature02169
    35. Krugel L. K., Biele G., Mohr P. N. C., Li S., Heekeren H. R. (2009). Genetic variation in dopaminergic neuromodulation influences the ability to rapidly and flexibly adapt decisions. Proc. Natl. Acad. Sci. U.S.A. 106, 17951–17956. doi: 10.1073/pnas.0905191106
    36. Laplace P. (1774). Mémoire sur la probabilité des causes par les évènemens. Mém. Acad. Roy. Sci. 6, 621–656
    37. Laplace P. (1812). Théorie Analytique des Probabilités. Paris: Courcier Imprimeur
    38. London M., Hausser M. (2005). Dendritic computation. Annu. Rev. Neurosci. 28, 503–532. doi: 10.1146/annurev.neuro.28.061604.135703
    39. Montague P. R., Hyman S. E., Cohen J. D. (2004). Computational roles for dopamine in behavioural control. Nature 431, 760–767. doi: 10.1038/nature03015
    40. Murray G. K., Corlett P. R., Clark L., Pessiglione M., Blackwell A. D., Honey G., Jones P. B., Bullmore E. T., Robbins T. W., Fletcher P. C. (2007). Substantia nigra/ventral tegmental reward prediction error disruption in psychosis. Mol. Psychiatry 13, 267–276. doi: 10.1038/sj.mp.4002058
    41. Nassar M. R., Wilson R. C., Heasly B., Gold J. I. (2010). An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment. J. Neurosci. 30, 12366–12378. doi: 10.1523/JNEUROSCI.0822-10.2010
    42. O'Doherty J., Dayan P., Schultz J., Deichmann R., Friston K., Dolan R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454. doi: 10.1126/science.1094285
    43. Orbán G., Fiser J., Aslin R. N., Lengyel M. (2008). Bayesian learning of visual chunks by human observers. Proc. Natl. Acad. Sci. U.S.A. 105, 2745–2750. doi: 10.1073/pnas.0708424105
    44. Pessiglione M., Seymour B., Flandin G., Dolan R. J., Frith C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behavior in humans. Nature 442, 1042–1045. doi: 10.1038/nature05051
    45. Preuschoff K., Bossaerts P. (2007). Adding prediction risk to the theory of reward learning. Ann. N. Y. Acad. Sci. 1104, 135–146. doi: 10.1196/annals.1390.005
    46. Rescorla R. A., Wagner A. R. (1972). "A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement," in Classical Conditioning II: Current Research and Theory, eds Black A. H., Prokasy W. F. (New York: Appleton-Century-Crofts), 64–99
    47. Schultz W., Dayan P., Montague P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599. doi: 10.1126/science.275.5306.1593
    48. Smith A., Li M., Becker S., Kapur S. (2006). Dopamine, prediction error and associative learning: a model-based account. Network 17, 61–84. doi: 10.1080/09548980500361624
    49. Stephan K. E., Friston K. J., Frith C. D. (2009). Dysconnection in schizophrenia: from abnormal synaptic plasticity to failures of self-monitoring. Schizophr. Bull. 35, 509–527. doi: 10.1093/schbul/sbn176
    50. Steyvers M., Brown S. (2006). Prediction and change detection. Adv. Neural Inf. Process. Syst. 18, 1281–1288
    51. Steyvers M., Lee M. D., Wagenmakers E. (2009). A Bayesian analysis of human decision-making on bandit problems. J. Math. Psychol. 53, 168–179. doi: 10.1016/j.jmp.2008.11.002
    52. Sutton R. S., Barto A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press
    53. Thiel C. M., Huston J. P., Schwarting R. K. (1998). Hippocampal acetylcholine and habituation learning. Neuroscience 85, 1253–1262. doi: 10.1016/S0306-4522(98)00030-X
    54. Wilson R. C., Nassar M. R., Gold J. I. (2010). Bayesian online learning of the hazard rate in change-point problems. Neural Comput. 22, 2452–2476. doi: 10.1162/NECO_a_00007
    55. Xu F., Tenenbaum J. B. (2007). Sensitivity to sampling in Bayesian word learning. Dev. Sci. 10, 288–297. doi: 10.1111/j.1467-7687.2007.00590.x
    56. Yang T., Shadlen M. N. (2007). Probabilistic reasoning by neurons. Nature 447, 1075–1080. doi: 10.1038/nature05852
    57. Yu A. J., Dayan P. (2005). Uncertainty, neuromodulation, and attention. Neuron 46, 681–692. doi: 10.1016/j.neuron.2005.04.026
    58. Yuille A., Kersten D. (2006). Vision as Bayesian inference: analysis by synthesis? Trends Cogn. Sci. (Regul. Ed.) 10, 301–308

Source: PubMed
