Hierarchical models in the brain
Karl Friston
Abstract
This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely dynamic expectation maximisation, so a single model and optimisation scheme suffice to invert a wide range of models. We present the model and a brief review of its inversion to disclose the relationships among apparently diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain.
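To make the phrase "the output of one provides input to another" concrete, the following is a minimal sketch of the generative side only, written in Python with NumPy. It is not the paper's model or its inversion scheme (dynamic expectation maximisation is not implemented here). The function name `simulate_hierarchy`, the scalar linear dynamics, the decay rates, the value of `eta`, and the use of white Gaussian noise with Euler integration are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_hierarchy(n_steps=400, dt=0.05, noise_sd=0.0, eta=1.0, seed=0):
    """Simulate a two-level hierarchy of scalar linear state-space models.

    Illustrative assumptions (not taken from the paper): linear dynamics,
    white Gaussian noise, Euler integration, and the specific rate constants.
    """
    rng = np.random.default_rng(seed)
    x2 = 0.0                    # hidden state of the higher level
    x1 = 0.0                    # hidden state of the lower level
    v1 = np.empty(n_steps)      # output of level 2, used as input to level 1
    y = np.empty(n_steps)       # output of level 1, i.e. the simulated data

    for t in range(n_steps):
        # Level 2: the hidden state relaxes toward the top-level cause eta.
        x2 += dt * (eta - x2) + np.sqrt(dt) * noise_sd * rng.standard_normal()
        v1[t] = x2 + noise_sd * rng.standard_normal()

        # Level 1: driven by the output of level 2 (the hierarchical link).
        x1 += dt * (v1[t] - 2.0 * x1) + np.sqrt(dt) * noise_sd * rng.standard_normal()
        y[t] = x1 + noise_sd * rng.standard_normal()

    return y, v1


if __name__ == "__main__":
    y, v1 = simulate_hierarchy()
    print("final level-2 output:", v1[-1])
    print("final data value:", y[-1])
```

With the noise switched off, the level-2 output settles near `eta` and the data settle near `eta / 2`, which are the fixed points of the two assumed linear systems. Setting `noise_sd` above zero adds state and observation noise at both levels, a crude stand-in for the "system noise" the abstract mentions.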
Conflict of interest statement
The author has declared that no competing interests exist.
Source: PubMed