Uncertainty in perception and the Hierarchical Gaussian Filter

Christoph D Mathys, Ekaterina I Lomakina, Jean Daunizeau, Sandra Iglesias, Kay H Brodersen, Karl J Friston, Klaas E Stephan

Abstract

In its full sense, perception rests on an agent's model of how its sensory input comes about and the inferences it draws based on this model. These inferences are necessarily uncertain. Here, we illustrate how the Hierarchical Gaussian Filter (HGF) offers a principled and generic way to deal with the several forms that uncertainty in perception takes. The HGF is a recent derivation of one-step update equations from Bayesian principles that rests on a hierarchical generative model of the environment and its (in)stability. It is computationally highly efficient, allows for online estimates of hidden states, and has found numerous applications to experimental data from human subjects. In this paper, we generalize previous descriptions of the HGF and its account of perceptual uncertainty. First, we explicitly formulate the extension of the HGF's hierarchy to any number of levels; second, we discuss how various forms of uncertainty are accommodated by the minimization of variational free energy as encoded in the update equations; third, we combine the HGF with decision models and demonstrate the inversion of this combination; finally, we report a simulation study that compared four optimization methods for inverting the HGF/decision model combination at different noise levels. These four methods (Nelder-Mead simplex algorithm, Gaussian process-based global optimization, variational Bayes and Markov chain Monte Carlo sampling) all performed well even under considerable noise, with variational Bayes offering the best combination of efficiency and informativeness of inference. Our results demonstrate that the HGF provides a principled, flexible, and efficient, yet intuitive, framework for the resolution of perceptual uncertainty in behaving agents.

Keywords: Bayesian inference; decision-making; filtering; free energy; hierarchical modeling; learning; uncertainty; volatility.

Figures

Figure 1
Overview of the Hierarchical Gaussian Filter (HGF). The model represents a hierarchy of coupled Gaussian random walks. The levels of the hierarchy relate to each other by determining the step size (volatility or variance) of a random walk. The topmost step size is a constant parameter ϑ.
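The coupling between levels can be sketched in a few lines of code. The following is an illustrative simulation of the generative model only (function names, seeds, and default parameter values are our own choices, not taken from the paper): each level performs a Gaussian random walk whose step variance is determined by the level above, with the topmost step size fixed at the constant ϑ.

```python
import math
import random

def simulate_hgf_states(n_trials=100, kappa=1.0, omega=-3.0, theta=0.5, seed=0):
    """Sketch of two coupled HGF levels: x3 is a Gaussian random walk with
    constant step variance theta, and the step variance of x2 on trial k
    is exp(kappa * x3[k] + omega)."""
    rng = random.Random(seed)
    x2, x3 = [0.0], [0.0]
    for _ in range(1, n_trials):
        # Top level: random walk with fixed step size theta.
        x3.append(x3[-1] + rng.gauss(0.0, math.sqrt(theta)))
        # Lower level: step variance set by the level above.
        x2.append(x2[-1] + rng.gauss(0.0, math.sqrt(math.exp(kappa * x3[-1] + omega))))
    return x2, x3
```

With kappa = 0 the two levels decouple and x2 reduces to a random walk with constant step variance exp(ω), which is the sense in which the hierarchy "switches off" gracefully.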
Figure 2
The 3-level HGF for binary outcomes. The lowest level, x1, is binary and corresponds, in the absence of sensory noise, to the sensory input u. Left: schematic representation of the generative model as a Bayesian network. x1(k), x2(k), and x3(k) are hidden states of the environment at time point k. They generate u(k), the input at time point k, and depend on their immediately preceding values x2(k−1) and x3(k−1) and on the parameters κ, ω, ϑ. Right: model definition. This figure has been adapted from Figures 1, 2 in Mathys et al. (2011).
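The bottom of this hierarchy can be sketched as follows, assuming a standard logistic sigmoid linking the tendency x2 to the outcome probability (the helper names are hypothetical, and the x2 trajectory is taken as given):

```python
import math
import random

def logistic(x):
    # s(x) = 1 / (1 + exp(-x)) maps the tendency x2 to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def simulate_binary_outcomes(x2_traj, seed=1):
    """On each trial, x1 is Bernoulli with probability s(x2); in the
    absence of sensory noise, the input u equals x1."""
    rng = random.Random(seed)
    return [1 if rng.random() < logistic(x2) else 0 for x2 in x2_traj]
```

A strongly negative x2 makes outcome 0 almost certain, a strongly positive x2 makes outcome 1 almost certain, so the tendency acts as a log-odds parameter for the binary state.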
Figure 3
The consequences of sensory uncertainty. Simulation of inference on a binary hidden state x1 (black dots) using a three-level HGF under low (π̂u = 1000, top panel) and high (π̂u = 10, bottom panel) sensory uncertainty. Trajectories were simulated using the same input and parameters (except π̂u) in both cases: μ2(0) = μ3(0) = 0, σ2(0) = σ3(0) = 1, κ = 1, ω = −3, and ϑ = 0.7. Decisions were simulated using a unit-square sigmoid model with ζ = 8.
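The two noise conditions can be mimicked by adding Gaussian noise of precision π̂u to the binary state. This is an assumption-laden sketch (the paper's exact input model may differ), with π̂u = 1000 and π̂u = 10 corresponding to the low- and high-uncertainty conditions:

```python
import math
import random

def noisy_input(x1_traj, pi_u, seed=2):
    """Continuous input centered on the binary state x1, with precision
    pi_u, i.e., standard deviation 1/sqrt(pi_u): high pi_u means low
    sensory uncertainty, and vice versa."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(pi_u)
    return [x1 + rng.gauss(0.0, sd) for x1 in x1_traj]
```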
Figure 4
The unit square sigmoid (cf. Equations 43, 44). The parameter ζ can be interpreted as inverse response noise because the sigmoid approaches a step function as ζ approaches infinity.
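The unit-square sigmoid is simple to write down. This sketch assumes the standard form P(y = 1) = m^ζ / (m^ζ + (1 − m)^ζ) for a belief m in (0, 1), with ζ acting as inverse response noise:

```python
def unit_square_sigmoid(m, zeta):
    """Probability of choosing option 1 given belief m in (0, 1);
    approaches a step function at m = 0.5 as zeta grows."""
    return m**zeta / (m**zeta + (1.0 - m)**zeta)
```

For m = 0.5 the response probability is 0.5 regardless of ζ; for any other belief, larger ζ pushes the probability toward 0 or 1, which is why ζ → ∞ yields deterministic, step-like responding.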
Figure 5
Model inversion. Maximum-a-posteriori parameter estimates are μ2(0) = 0.87, σ2(0) = 1.20, μ3(0) = −0.65, σ3(0) = 0.88, κ = 1.32, ω = −0.71, and ϑ = 0.0023. These parameter values correspond to the following trajectories: (A) Posterior expectation μ3 of log-volatility x3. (B) Precision weight ψ3 := π̂2/π3, which modulates the impact of prediction error δ2 on log-volatility updates. (C) Volatility prediction error δ2. (D) Posterior expectation μ2 of tendency x2. (E) Precision weight ψ2 := 1/π2 (in green), which modulates the impact of input prediction error δ1 on μ2. Since μ2 is in logit space, the function of σ2 as a dynamic learning rate is more easily visible after transformation to x1-space; this results in the red line labeled q(ψ2) := s′(μ2)·ψ2. (F) Prediction error δ1 about input u. (In Iglesias et al. (2013), Figures S1 and S2, δ1 is defined as an outcome prediction error, which corresponds to the absolute value of δ1 as defined here.) (G) Black: true probability of input 1. Red: posterior expectation μ̂1 that input u = 1; this corresponds to a sigmoid transformation of μ2 in (D). Green: sensory input. Orange: subject's observed decisions.
Figure 6
One-armed bandit task. Participants were engaged in a simple decision-making task. Each trial consisted of four phases. (i) Cue phase. Two cards and their costs were displayed. (ii) Decision phase. Once the subject had made a decision, the chosen card was highlighted. (iii) Outcome phase. The outcome of a decision was displayed, and added to the score bar only if the chosen card was rewarded. (iv) Inter-trial interval (ITI). The screen only showed the score bar, until the beginning of the next trial. Our experimental paradigm consisted of a number of phases with different reward structures. Different phase lengths induced both a phase of low volatility (trials 1 through 90) and a phase of high volatility (trials 91 through 160).
Figure 7
Estimation of coupling κ by four methods at different noise levels ζ. A range of κ from 0.5 to 3.5 was chosen based on the range of estimates observed in the analysis of experimental data. Decision noise levels were chosen in a range from very high (ζ = 0.5) to very low (ζ = 24). The remaining model parameters were held constant (ω = −4, ϑ = 0.0025). For each point of the resulting two-dimensional grid, 1000 task runs with 320 decisions each were simulated. Given the fixed sequence of inputs and the simulated sequence of decisions, we then attempted to recover the model parameters, including κ and ζ, with four estimation methods: (1) the Nelder-Mead simplex algorithm (NMSA), (2) Gaussian process-based global optimization (GPGO), (3) variational Bayes (VB), and (4) Markov chain Monte Carlo sampling (MCMC). The figure shows boxplots of the distributions of the maximum-a-posteriori (MAP) point estimates for the four methods at each grid point. Boxplots consist of boxes spanning the range from the 25th to the 75th percentile, circles at the median, and whiskers spanning the rest of the estimate range. Horizontal shifts within ζ levels are for readability only. Black bars indicate ground truth.
Figure 8
Estimation of noise level ζ at different levels of coupling κ. ζ is estimated and displayed here at the logarithmic scale because it has a natural lower bound at 0. See Figure 7 for key to legend. The figure shows boxplots of the distributions of the maximum-a-posteriori (MAP) point estimates for the four methods at each point of the simulation grid. Horizontal shifts within κ levels are for readability. Black bars indicate ground truth.
Figure 9
Quantitative assessment of parameter estimation. (A) Root mean squared error of MAP estimates by noise level ζ for all four estimation methods (see Figure 7 for key to legends). (A1) Estimates for κ improve with decreasing noise and do not differ substantially between methods, although NMSA is somewhat better at very high noise. (A2) As in Figure 8, estimates for ζ were assessed on the logarithmic scale. (B) Confidence of VB and MCMC. (B1) Both methods are realistically confident about their inference on κ across noise levels, with a slight tendency toward overconfidence at higher noise. (B2) This tendency is more pronounced for estimates of ζ.
Figure 10
Posterior mean update equation. Updates are precision-weighted prediction errors. This general feature of Bayesian updating is concretized by the HGF for volatility predictions in a hierarchical setting.
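This general feature can be illustrated with the conjugate Gaussian case, where the exact posterior mean is the prior mean plus a precision-weighted prediction error. This is a generic textbook update, not the HGF's specific volatility equations:

```python
def gaussian_update(mu_prior, pi_prior, u, pi_obs):
    """Conjugate Gaussian update for an observation u with precision pi_obs:
    precisions add, and the posterior mean moves from the prior mean by a
    precision-weighted prediction error (pi_obs / pi_post) * (u - mu_prior)."""
    pi_post = pi_prior + pi_obs
    mu_post = mu_prior + (pi_obs / pi_post) * (u - mu_prior)
    return mu_post, pi_post
```

The weight pi_obs / pi_post lies between 0 and 1: a precise observation relative to the prior yields a large update, an imprecise one a small update, which is exactly the intuition the figure summarizes.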

References

    1. Adams R. A., Stephan K. E., Brown H. R., Friston K. J. (2013). The computational anatomy of psychosis. Front. Psychiatry 4:47. 10.3389/fpsyt.2013.00047
    2. Behrens T. E. J., Hunt L. T., Woolrich M. W., Rushworth M. F. S. (2008). Associative learning of social value. Nature 456, 245–249. 10.1038/nature07538
    3. Behrens T. E. J., Woolrich M. W., Walton M. E., Rushworth M. F. S. (2007). Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221. 10.1038/nn1954
    4. Bishop C. M. (2006). Pattern Recognition and Machine Learning. New York, NY: Springer.
    5. Bland A. R., Schaefer A. (2012). Different varieties of uncertainty in human decision-making. Front. Neurosci. 6:85. 10.3389/fnins.2012.00085
    6. Daunizeau J., Adam V., Rigoux L. (2014). VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data. PLoS Comput. Biol. 10:e1003441. 10.1371/journal.pcbi.1003441
    7. Daunizeau J., den Ouden H. E. M., Pessiglione M., Kiebel S. J., Friston K. J., Stephan K. E. (2010a). Observing the Observer (II): deciding when to decide. PLoS ONE 5:e15555. 10.1371/journal.pone.0015555
    8. Daunizeau J., den Ouden H. E. M., Pessiglione M., Kiebel S. J., Stephan K. E., Friston K. J. (2010b). Observing the Observer (I): meta-Bayesian models of learning and decision-making. PLoS ONE 5:e15554. 10.1371/journal.pone.0015554
    9. Daunizeau J., Friston K. J., Kiebel S. J. (2009). Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Phys. Nonlinear Phenom. 238, 2089–2118. 10.1016/j.physd.2009.08.002
    10. Daw N. D., Doya K. (2006). The computational neurobiology of learning and reward. Curr. Opin. Neurobiol. 16, 199–204. 10.1016/j.conb.2006.03.006
    11. Daw N. D., O'Doherty J. P., Dayan P., Seymour B., Dolan R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature 441, 876–879. 10.1038/nature04766
    12. Dayan P., Hinton G. E., Neal R. M., Zemel R. S. (1995). The Helmholtz machine. Neural Comput. 7, 889–904. 10.1162/neco.1995.7.5.889
    13. Doya K., Ishii S., Pouget A., Rao R. P. N. (2011). Bayesian Brain: Probabilistic Approaches to Neural Coding. Cambridge, MA: MIT Press.
    14. Evans L. C. (2010). Partial Differential Equations. Providence, RI: American Mathematical Society.
    15. Faisal A. A., Selen L. P. J., Wolpert D. M. (2008). Noise in the nervous system. Nat. Rev. Neurosci. 9, 292–303. 10.1038/nrn2258
    16. Friston K. (2009). The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. 13, 293–301. 10.1016/j.tics.2009.04.005
    17. Friston K. J., Dolan R. J. (2010). Computational and dynamic models in neuroimaging. Neuroimage 52, 752–765. 10.1016/j.neuroimage.2009.12.068
    18. Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. (2007). Variational free energy and the Laplace approximation. Neuroimage 34, 220–234. 10.1016/j.neuroimage.2006.08.035
    19. Friston K., Schwartenbeck P., FitzGerald T., Moutoussis M., Behrens T., Dolan R. J. (2013). The anatomy of choice: active inference and agency. Front. Hum. Neurosci. 7:598. 10.3389/fnhum.2013.00598
    20. Gelman A., Carlin J. B., Stern H. S., Rubin D. B. (2003). Bayesian Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
    21. Helmholtz H. (1860). Handbuch der Physiologischen Optik. English translation (1962): J. P. C. Southall. New York, NY: Dover.
    22. Iglesias S., Mathys C., Brodersen K. H., Kasper L., Piccirelli M., den Ouden H. E. M., et al. (2013). Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron 80, 519–530. 10.1016/j.neuron.2013.09.009
    23. Joffily M., Coricelli G. (2013). Emotional valence and the free-energy principle. PLoS Comput. Biol. 9:e1003094. 10.1371/journal.pcbi.1003094
    24. Kalman R. E. (1960). A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45.
    25. Knill D. C., Pouget A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719. 10.1016/j.tins.2004.10.007
    26. Körding K. P., Wolpert D. M. (2006). Bayesian decision theory in sensorimotor control. Trends Cogn. Sci. 10, 319–326. 10.1016/j.tics.2006.05.003
    27. Lomakina E., Vezhnevets A., Mathys C., Brodersen K. H., Stephan K. E., Buhmann J. (2012). Bayesian Global Optimization for Model-Based Neuroimaging. HBM E-Poster. Available online at:
    28. Macready W. G., Wolpert D. H. (1998). Bandit problems and the exploration/exploitation tradeoff. IEEE Trans. Evol. Comput. 2, 2–22. 10.1109/4235.728210
    29. Mathys C., Daunizeau J., Friston K. J., Stephan K. E. (2011). A Bayesian foundation for individual learning under uncertainty. Front. Hum. Neurosci. 5:39. 10.3389/fnhum.2011.00039
    30. Nassar M. R., Wilson R. C., Heasly B., Gold J. I. (2010). An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment. J. Neurosci. 30, 12366–12378. 10.1523/JNEUROSCI.0822-10.2010
    31. Nelder J. A., Mead R. (1965). A simplex method for function minimization. Comput. J. 7, 308–313. 10.1093/comjnl/7.4.308
    32. O'Doherty J. P., Hampton A., Kim H. (2007). Model-based fMRI and its application to reward learning and decision making. Ann. N.Y. Acad. Sci. 1104, 35–53. 10.1196/annals.1390.022
    33. Paliwal S., Petzschner F., Schmitz A. K., Tittgemeyer M., Stephan K. E. (2014). A model-based analysis of impulsivity using a slot-machine gambling paradigm. Front. Hum. Neurosci. 8:428. 10.3389/fnhum.2014.00428
    34. Payzan-LeNestour E., Bossaerts P. (2011). Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Comput. Biol. 7:e1001048. 10.1371/journal.pcbi.1001048
    35. Rasmussen C. E., Williams C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
    36. Rescorla R. A., Wagner A. R. (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement, in Classical Conditioning II: Current Research and Theory, eds Black A. H., Prokasy W. F. (New York, NY: Appleton-Century-Crofts), 64–99.
    37. Schwartenbeck P., FitzGerald T., Dolan R. J., Friston K. (2013). Exploration, novelty, surprise, and free energy minimization. Front. Psychol. 4:710. 10.3389/fpsyg.2013.00710
    38. Stephan K. E., Tittgemeyer M., Knösche T. R., Moran R. J., Friston K. J. (2009). Tractography-based priors for dynamic causal models. Neuroimage 47, 1628–1638. 10.1016/j.neuroimage.2009.05.096
    39. Sutton R. (1992). Gain adaptation beats least squares? in Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems (New Haven, CT), 161–166.
    40. Sutton R. S., Barto A. G. (1998). Reinforcement Learning: an Introduction. Cambridge, MA: MIT Press.
    41. Vossel S., Mathys C., Daunizeau J., Bauer M., Driver J., Friston K. J., et al. (2013). Spatial attention, precision, and Bayesian inference: a study of saccadic response speed. Cereb. Cortex 24, 1436–1450. 10.1093/cercor/bhs418
    42. Wilson R. C., Nassar M. R., Gold J. I. (2013). A mixture of delta-rules approximation to Bayesian inference in change-point problems. PLoS Comput. Biol. 9:e1003150. 10.1371/journal.pcbi.1003150
    43. Yu A., Dayan P. (2003). Expected and Unexpected Uncertainty: ACh and NE in the Neocortex. Available online at: (Accessed: July 29, 2013).
    44. Yu A. J., Dayan P. (2005). Uncertainty, neuromodulation, and attention. Neuron 46, 681–692. 10.1016/j.neuron.2005.04.026

Source: PubMed
