A Bayesian foundation for individual learning under uncertainty
Christoph Mathys, Jean Daunizeau, Karl J Friston, Klaas E Stephan
Abstract
Computational learning models are critical for understanding mechanisms of adaptive behavior. However, the two major current frameworks, reinforcement learning (RL) and Bayesian learning, both have certain limitations. For example, many Bayesian models are agnostic about inter-individual variability and involve complicated integrals, making online learning difficult. Here, we introduce a generic hierarchical Bayesian framework for individual learning under multiple forms of uncertainty (e.g., environmental volatility and perceptual uncertainty). The model assumes Gaussian random walks of states at all but the first level, with the step size determined by the next-higher level. The coupling between levels is controlled by parameters that shape the influence of uncertainty on learning in a subject-specific fashion. Using variational Bayes under a mean-field approximation and a novel approximation to the posterior energy function, we derive trial-by-trial update equations that (i) are analytical and extremely efficient, enabling real-time learning; (ii) have a natural interpretation in terms of RL; and (iii) contain parameters representing processes that play a key role in current theories of learning, e.g., the precision-weighting of prediction error. These parameters allow for the expression of individual differences in learning and may relate to specific neuromodulatory mechanisms in the brain. Our model is very general: it can deal with both discrete and continuous states and equally accounts for deterministic and probabilistic relations between environmental events and perceptual states (i.e., situations with and without perceptual uncertainty). These properties are illustrated by simulations and analyses of empirical time series. Overall, our framework provides a novel foundation for understanding normal and pathological learning, one that contextualizes RL within a generic Bayesian scheme and thus connects it to principles of optimality from probability theory.
Keywords: acetylcholine; decision-making; dopamine; hierarchical models; neuromodulation; serotonin; variational Bayes; volatility.
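The hierarchical scheme summarized in the abstract — Gaussian random walks at each level, with the step size at one level set by the state of the level above, and delta-rule-like updates whose learning rate is a ratio of precisions — can be sketched in a few lines. This is an illustrative simulation only: the parameter names (`kappa`, `omega`, `theta`) and the sigmoid link to a binary outcome are assumptions for the sketch, not the paper's exact update equations.

```python
import math
import random

random.seed(0)

# Illustrative coupling parameters (assumed names and values)
kappa, omega = 1.0, -4.0   # couple the top level to the step size below
theta = 0.5                # step variance of the top-level random walk

# Simulate the generative model: two coupled Gaussian random walks
# driving a binary observation at the lowest level.
x3, x2 = 0.0, 0.0
trajectory = []
for t in range(100):
    # Top level: Gaussian random walk with fixed step variance theta
    x3 += random.gauss(0.0, math.sqrt(theta))
    # Middle level: random walk whose step variance is set by the level above
    x2 += random.gauss(0.0, math.sqrt(math.exp(kappa * x3 + omega)))
    # Lowest level: binary outcome via a sigmoid of the middle-level state
    p = 1.0 / (1.0 + math.exp(-x2))
    u = 1 if random.random() < p else 0
    trajectory.append((x3, x2, u))

def precision_weighted_update(mu, delta, pi_hat, pi):
    """Delta-rule-like belief update in which the learning rate is a
    ratio of precisions (schematic form of the RL interpretation)."""
    return mu + (pi_hat / pi) * delta

# Example: with posterior precision twice the prediction precision,
# the effective learning rate is 0.5.
mu_new = precision_weighted_update(0.5, 0.2, 1.0, 2.0)  # -> 0.6
```

Higher uncertainty about the outcome (large `pi_hat`) relative to uncertainty about the belief (small `pi`) yields a larger learning rate, which is how volatility and perceptual uncertainty modulate learning in such schemes.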
Source: PubMed