Entorhinal and ventromedial prefrontal cortices abstract and generalize the structure of reinforcement learning problems

Alon Boaz Baram, Timothy Howard Muller, Hamed Nili, Mona Maria Garvert, Timothy Edward John Behrens

Abstract

Knowledge of the structure of a problem, such as relationships between stimuli, enables rapid learning and flexible inference. Humans and other animals can abstract this structural knowledge and generalize it to solve new problems. For example, in spatial reasoning, shortest-path inferences are immediate in new environments. Spatial structural transfer is mediated by cells in entorhinal and (in humans) medial prefrontal cortices, which maintain their co-activation structure across different environments and behavioral states. Here, using fMRI, we show that entorhinal and ventromedial prefrontal cortex (vmPFC) representations perform a much broader role in generalizing the structure of problems. We introduce a task-remapping paradigm, where subjects solve multiple reinforcement learning (RL) problems differing in structural or sensory properties. We show that, as with space, entorhinal representations are preserved across different RL problems only if task structure is preserved. In vmPFC and ventral striatum, representations of prediction error also depend on task structure.
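The abstract's central computational claim is that knowing how stimuli are related lets feedback about one stimulus inform estimates for another. The following is a minimal delta-rule (Rescorla-Wagner-style) sketch of that idea under stated assumptions. It is not the authors' STRUCT model. The class name `StructLearner`, the learning rate `alpha`, the coding of +Corr/−Corr pairs as a `+1`/`-1` sign, and the rule that a −Corr partner is updated toward the complementary outcome are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StructLearner:
    """Hypothetical delta-rule learner that shares outcomes between related stimuli."""
    alpha: float = 0.3                            # assumed learning rate
    values: dict = field(default_factory=dict)    # estimated P(good outcome) per stimulus
    partner: dict = field(default_factory=dict)   # stimulus -> (related stimulus, +1 or -1)

    def estimate(self, stim):
        # Unseen stimuli start at an uninformative 0.5
        return self.values.get(stim, 0.5)

    def update(self, stim, outcome):
        """outcome is 1 (good) or 0 (bad); returns the prediction error for stim."""
        pe = outcome - self.estimate(stim)
        self.values[stim] = self.estimate(stim) + self.alpha * pe
        if stim in self.partner:
            other, sign = self.partner[stim]
            # Assumption: +Corr partner shares the outcome, -Corr partner gets its complement
            target = outcome if sign > 0 else 1 - outcome
            self.values[other] = self.estimate(other) + self.alpha * (target - self.estimate(other))
        return pe
```

Leaving `partner` empty turns the same class into a structure-blind (NAÏVE-like) learner, whose estimate for a related stimulus does not move when its partner is rewarded; that contrast is what the task manipulates.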

Keywords: RL; cognitive map; entorhinal cortex; generalization; grid cells; hippocampal formation; reinforcement learning; spatial cognition; structure learning; vmPFC.

Conflict of interest statement

The authors declare no competing interests.

Copyright © 2020 The Author(s). Published by Elsevier Inc. All rights reserved.

Figures

Figure 1
Task design. (A) Possible progressions of a single trial. (B) Experimental design and neural predictions for structure-encoding brain regions: 2×2 factorial design of stimulus set × relational structure. (C) Example of the reward schedule for one subject in the four block-types. Solid gray lines and the dashed black line show the probabilities of a good outcome for the related stimuli and the control stimulus, respectively. Xs mark the stimuli (color) and actual binary outcomes (y axis: 0.1 and 0.9 are bad and good outcomes, respectively) in each trial. For visualization purposes, the two 30-trial blocks of each of the four block-types were concatenated. Although related stimuli in +Corr blocks (right panels) are associated with exactly the same probability, their light and dark gray lines are slightly offset for visibility.
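Panel (C) of Figure 1 describes a reward schedule in which related stimuli share a probability of a good outcome in +Corr blocks, alongside an independent control stimulus. The sketch below generates one such block. Everything beyond "30 trials per block" and "+Corr stimuli share the same probability" is an assumption: the slow random walk for the probabilities, the [0.1, 0.9] bounds, modeling −Corr as complementary probabilities (`1 - p`), and the function name `make_block` are hypothetical. ASCII `"-Corr"` stands in for the caption's −Corr.

```python
import numpy as np


def make_block(structure, n_trials=30, seed=None):
    """Hypothetical reward schedule for one block.

    structure: "+Corr" (related stimuli share P(good)) or "-Corr"
               (assumed complementary P(good)).
    Returns per-trial P(good) for stimuli A, B and an independent control C,
    plus the binary outcomes (1 = good, 0 = bad) each stimulus would yield.
    """
    if structure not in ("+Corr", "-Corr"):
        raise ValueError("structure must be '+Corr' or '-Corr'")
    rng = np.random.default_rng(seed)
    # Assumed slow random walk for P(good | A), clipped to [0.1, 0.9]
    p_a = np.clip(0.5 + np.cumsum(rng.normal(0.0, 0.05, n_trials)), 0.1, 0.9)
    p_b = p_a.copy() if structure == "+Corr" else 1.0 - p_a
    # Control stimulus drifts independently of the related pair
    p_c = np.clip(0.5 + np.cumsum(rng.normal(0.0, 0.05, n_trials)), 0.1, 0.9)
    probs = np.stack([p_a, p_b, p_c])                      # shape (3, n_trials)
    outcomes = (rng.random(probs.shape) < probs).astype(int)
    return probs, outcomes
```

Calling it twice per structure and concatenating the results mirrors how the figure joins the two 30-trial blocks of each block-type for display.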
Figure 2
Subjects use the correlation structure correctly. (A) Negative log likelihoods for STRUCT (left) and NAÏVE (right) models (same scale for both matrices). Pink elements: STRUCT models, cross-validated within structure. Green elements: STRUCT models, cross-validated across structures. Gray elements: NAÏVE models, trained and tested on the same data. (B) Histograms of the estimated outcome probabilities for trials in which subjects accepted (blue) or rejected (orange). Left: STRUCT models trained on data with the same structure but a different stimulus set (pink elements in A). Right: NAÏVE models, trained and tested on the same data (gray elements in A). Histograms include only trials in which the two models make different predictions. (C) Fitted cross-terms for pairs of stimuli in all −Corr (top) and +Corr (bottom) blocks. The red central line is the median, the box edges are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted as red circles. (D) Effect of the chosen-action value estimates from the STRUCT model in a GLM where they compete with estimates from the NAÏVE model (replication of Hampton et al., 2006).
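Panel (A) of Figure 2 compares models by the negative log likelihood of subjects' accept/reject choices, with STRUCT models fit on one data set and scored on another. A minimal sketch of that cross-validation logic follows. The logistic choice rule centered at 0.5, the inverse temperature `beta`, its grid-search fit, and the function names are assumptions for illustration; the paper's actual choice model is not described in this record.

```python
import numpy as np


def choice_nll(p_good, accepted, beta):
    """Negative log likelihood of accept/reject choices under an assumed logistic rule.

    p_good:   model-estimated P(good outcome) on each trial
    accepted: 1 if the subject accepted, 0 if rejected
    beta:     inverse temperature (hypothetical free parameter)
    """
    p_good = np.asarray(p_good, dtype=float)
    accepted = np.asarray(accepted, dtype=float)
    p_accept = 1.0 / (1.0 + np.exp(-beta * (p_good - 0.5)))
    p_accept = np.clip(p_accept, 1e-9, 1 - 1e-9)  # guard against log(0)
    return -np.sum(accepted * np.log(p_accept) + (1 - accepted) * np.log(1 - p_accept))


def cross_validated_nll(train, test, betas=np.linspace(0.1, 20.0, 200)):
    """Fit beta on one (p_good, accepted) data set and score it on another.

    For example: train on one stimulus set, test on the other stimulus set
    with the same (within-structure) or a different (across-structure) structure.
    """
    best_beta = min(betas, key=lambda b: choice_nll(*train, b))
    return choice_nll(*test, best_beta)
```

A model whose estimates carry no information (all 0.5) scores the same on every trial no matter how `beta` is fit, so lower held-out values indicate that a model's probability estimates actually predict choices.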
Figure 3
The relational structure of the task is represented in the entorhinal cortex. Top: relational structure effect, peaking in EC. Bottom: stimulus identity effect, peaking in LOC. (A) Model RDMs. Black elements should be similar; white elements should be dissimilar. Pairs of stimuli with purple and orange rectangles around them are −Corr and +Corr, respectively. (B) Visualization of the data RDM from the peak vertex of the effect, marked with an arrow in (D). (C) Visualization of the paired mean difference effects between same (black RDM elements in A) and different (white elements in A) pairs of conditions from the peak vertex of the effects. Both groups are plotted on the left axes as a slope graph: each paired set of observations for one subject is connected by a line. The paired mean difference is plotted on a floating axis on the right, aligned to the mean of the same group. The mean difference is depicted by a dashed line (consequently aligned to the mean of the different group). Error bars indicate the 95% confidence interval obtained by a bootstrap procedure. (D) Whole-surface results, right hemisphere. Clusters surviving FWE correction across the whole surface at a cluster-forming threshold of p
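The representational similarity logic in Figure 3 (a model RDM splitting condition pairs into "same" and "different", then a per-subject paired difference with a bootstrap 95% confidence interval) can be sketched as follows. The sign convention (different minus same, so a positive value means the data follow the model), the percentile bootstrap, `n_boot`, and both function names are assumptions; the caption only states that "a bootstrap procedure" was used.

```python
import numpy as np


def same_minus_different(data_rdm, model_rdm):
    """Mean dissimilarity of 'different' pairs minus 'same' pairs (upper triangle only).

    model_rdm: 0 where conditions should be similar (black), 1 where dissimilar (white).
    data_rdm:  measured dissimilarities for the same conditions (e.g. at one vertex).
    """
    iu = np.triu_indices_from(model_rdm, k=1)  # each condition pair once, no diagonal
    d, m = np.asarray(data_rdm)[iu], np.asarray(model_rdm)[iu]
    return d[m == 1].mean() - d[m == 0].mean()


def bootstrap_ci(per_subject_effects, n_boot=5000, seed=0):
    """Mean paired difference across subjects and its percentile-bootstrap 95% CI."""
    rng = np.random.default_rng(seed)
    x = np.asarray(per_subject_effects, dtype=float)
    # Resample subjects with replacement and recompute the mean each time
    boots = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    return x.mean(), np.percentile(boots, [2.5, 97.5])
```

In use, `same_minus_different` would be computed once per subject at a given vertex, and the resulting list passed to `bootstrap_ci` to obtain the effect and interval plotted on the floating axis.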

Figure 4
Prediction error signals in vmPFC and ventral striatum depend on the current relational structure of the task. (A) Visualization of whole-surface results of the multivariate prediction error × relational structure interaction effect, medial left hemisphere. (B) Interaction effect at the left-hemisphere vmPFC peak of the univariate prediction error effect (MNI: [−4,44,−20]). (C) Interaction effect at the right-hemisphere vmPFC peak of the univariate prediction error effect (MNI: [8,44,−11]). (D) Interaction effect at the ventral striatum peak of the univariate prediction error effect (MNI: [−10,8,−12]). Brain images in the insets of (B), (C), and (D) show the univariate prediction error effect (projected on the surface in B and C). The legend for (B), (C), and (D) is the same as in Figure 3C.

References

    1. Baldassano C., Hasson U., Norman K.A. Representation of Real-World Event Schemas during Narrative Perception. J. Neurosci. 2018;38:9689–9699.
    1. Banino A., Barry C., Uria B., Blundell C., Lillicrap T., Mirowski P., Pritzel A., Chadwick M.J., Degris T., Modayil J. Vector-based navigation using grid-like representations in artificial agents. Nature. 2018;557:429–433.
    1. Bao X., Gjorgieva E., Shanahan L.K., Howard J.D., Kahnt T., Gottfried J.A. Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron. 2019;102:1066–1075.e5.
    1. Barron H.C., Dolan R.J., Behrens T.E.J. Online evaluation of novel choices by simultaneous representation of multiple memories. Nat. Neurosci. 2013;16:1492–1498.
    1. Barry C., Ginzberg L.L., O’Keefe J., Burgess N. Grid cell firing patterns signal environmental novelty by expansion. Proc. Natl. Acad. Sci. USA. 2012;109:17687–17692.
    1. Behrens T.E.J., Muller T.H., Whittington J.C.R., Mark S., Baram A.B., Stachenfeld K.L., Kurth-Nelson Z. What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior. Neuron. 2018;100:490–509.
    1. Boccara C.N., Nardin M., Stella F., O’Neill J., Csicsvari J. The entorhinal cognitive map is attracted to goals. Science. 2019;363:1443–1447.
    1. Bostock E., Muller R.U., Kubie J.L. Experience-dependent modifications of hippocampal place cell firing. Hippocampus. 1991;1:193–205.
    1. Bowman C.R., Zeithamova D. Abstract memory representations in the ventromedial prefrontal cortex and hippocampus support concept generalization. J. Neurosci. 2018;38:2605–2614.
    1. Bush D., Barry C., Manson D., Burgess N. Using Grid Cells for Navigation. Neuron. 2015;87:507–520.
    1. Butler W.N., Hardcastle K., Giocomo L.M. Remembered reward locations restructure entorhinal spatial maps. Science. 2019;363:1447–1452.
    1. Cohen N.J., Eichenbaum H. Memory, Amnesia, and the Hippocampal System. MIT Press; Cambridge: 1993.
    1. Constantinescu A.O., O’Reilly J.X., Behrens T.E.J. Organizing conceptual knowledge in humans with a gridlike code. Science. 2016;352:1464–1468.
    1. Dale A.M., Fischl B., Sereno M.I. Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage. 1999;9:179–194.
    1. Delgado M.R., Nystrom L.E., Fissell C., Noll D.C., Fiez J.A. Tracking the hemodynamic responses to reward and punishment in the striatum. J. Neurophysiol. 2000;84:3072–3077.
    1. Desikan R.S., Ségonne F., Fischl B., Quinn B.T., Dickerson B.C., Blacker D., Buckner R.L., Dale A.M., Maguire R.P., Hyman B.T. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31:968–980.
    1. Doeller C.F., Barry C., Burgess N. Evidence for grid cells in a human memory network. Nature. 2010;463:657–661.
    1. Dordek Y., Soudry D., Meir R., Derdikman D. Extracting grid cell characteristics from place cell inputs using non-negative principal component analysis. eLife. 2016;5:e10094.
    1. Fischl B. FreeSurfer. Neuroimage. 2012;62:774–781.
    1. Fischl B., Sereno M.I., Dale A.M. Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. Neuroimage. 1999;9:195–207.
    1. Fischl B., Liu A., Dale A.M. Automated manifold surgery: constructing geometrically accurate and topologically correct models of the human cerebral cortex. IEEE Trans. Med. Imaging. 2001;20:70–80.
    1. Fischl B., Salat D.H., Busa E., Albert M., Dieterich M., Haselgrove C., van der Kouwe A., Killiany R., Kennedy D., Klaveness S. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33:341–355.
    1. Fischl B., Stevens A.A., Rajendran N., Yeo B.T.T., Greve D.N., Van Leemput K., Polimeni J.R., Kakunoori S., Buckner R.L., Pacheco J. Predicting the location of entorhinal cortex from MRI. Neuroimage. 2009;47:8–17.
    1. Fyhn M., Hafting T., Treves A., Moser M.-B., Moser E.I. Hippocampal remapping and grid realignment in entorhinal cortex. Nature. 2007;446:190–194.
    1. Gardner R.J., Lu L., Wernle T., Moser M.-B., Moser E.I. Correlation structure of grid cells is preserved during sleep. Nat. Neurosci. 2019;22:598–608.
    1. Garvert M.M., Dolan R.J., Behrens T.E.J. A map of abstract relational knowledge in the human hippocampal-entorhinal cortex. eLife. 2017;6:1–20.
    1. Gershman S.J., Niv Y. Learning latent structure: carving nature at its joints. Curr. Opin. Neurobiol. 2010;20:251–256.
    1. Gershman S.J., Norman K.A., Niv Y. Discovering latent causes in reinforcement learning. Curr. Opin. Behav. Sci. 2015;5:43–50.
    1. Gil M., Ancau M., Schlesiger M.I., Neitz A., Allen K., De Marco R.J., Monyer H. Impaired path integration in mice with disrupted grid cell firing. Nat. Neurosci. 2018;21:81–91.
    1. Gilboa A., Marlatte H. Neurobiology of Schemas and Schema-Mediated Memory. Trends Cogn. Sci. 2017;21:618–631.
    1. Hafting T., Fyhn M., Molden S., Moser M.-B., Moser E.I. Microstructure of a spatial map in the entorhinal cortex. Nature. 2005;436:801–806.
    1. Hampton A.N., Bossaerts P., O’Doherty J.P. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J. Neurosci. 2006;26:8360–8367.
    1. Harlow H.F. The formation of learning sets. Psychol. Rev. 1949;56:51–65.
    1. Ho J., Tumkaya T., Aryal S., Choi H., Claridge-Chang A. Moving beyond P values: data analysis with estimation graphics. Nat. Methods. 2019;16:565–566.
    1. Høydal Ø.A., Skytøen E.R., Andersson S.O., Moser M.B., Moser E.I. Object-vector coding in the medial entorhinal cortex. Nature. 2019;568:400–404.
    1. Jacobs J., Weidemann C.T., Miller J.F., Solway A., Burke J.F., Wei X.-X., Suthana N., Sperling M.R., Sharan A.D., Fried I., Kahana M.J. Direct recordings of grid-like neuronal activity in human spatial navigation. Nat. Neurosci. 2013;16:1188–1190.
    1. Jenkinson M., Smith S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 2001;5:143–156.
    1. Jenkinson M., Bannister P., Brady M., Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage. 2002;17:825–841.
    1. Jenkinson M., Beckmann C.F., Behrens T.E.J., Woolrich M.W., Smith S.M. FSL. Neuroimage. 2012;62:782–790.
    1. Jocham G., Brodersen K.H., Constantinescu A.O., Kahn M.C., Ianni A.M., Walton M.E., Rushworth M.F.S., Behrens T.E.J. Reward-Guided Learning with and without Causal Attribution. Neuron. 2016;90:177–190.
    1. Kleiner M., Brainard D., Pelli D., Ingling A., Murray R., Broussard C. What's new in psychtoolbox-3. Perception. 2007;36:1–16.
    1. Kriegeskorte N., Goebel R., Bandettini P. Information-based functional brain mapping. Proc. Natl. Acad. Sci. USA. 2006;103:3863–3868.
    1. Kriegeskorte N., Mur M., Bandettini P. Representational similarity analysis - connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2008;2:4.
    1. Lever C., Burton S., Jeewajee A., O’Keefe J., Burgess N. Boundary vector cells in the subiculum of the hippocampal formation. J. Neurosci. 2009;29:9771–9777.
    1. Mathis A., Stemmler M.B., Herz A.V.M. Probable nature of higher-dimensional symmetries underlying mammalian grid-cell activity patterns. eLife. 2015;4:1–29.
    1. Miller K.J., Botvinick M.M., Brody C.D. Dorsal hippocampus contributes to model-based planning. Nat. Neurosci. 2017;20:1269–1276.
    1. Morrissey M.D., Insel N., Takehara-Nishiuchi K. Generalizable knowledge outweighs incidental details in prefrontal ensemble code over time. eLife. 2017;6:1–20.
    1. Nakahara H., Itoh H., Kawagoe R., Takikawa Y., Hikosaka O. Dopamine neurons can represent context-dependent prediction error. Neuron. 2004;41:269–280.
    1. Nichols T.E., Holmes A.P. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum. Brain Mapp. 2002;15:1–25.
    1. Nili H., Wingfield C., Walther A., Su L., Marslen-Wilson W., Kriegeskorte N. A toolbox for representational similarity analysis. PLoS Comput. Biol. 2014;10:e1003553.
    1. Niv Y. Learning task-state representations. Nat. Neurosci. 2019;22:1544–1553.
    1. O’Doherty J.P., Dayan P., Friston K., Critchley H., Dolan R.J. Temporal difference models and reward-related learning in the human brain. Neuron. 2003;38:329–337.
    1. O’Keefe J., Dostrovsky J. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 1971;34:171–175.
    1. Pelletier G., Fellows L.K. A critical role for human ventromedial frontal lobe in value comparison of complex objects based on attribute configuration. J. Neurosci. 2019;39:4124–4132.
    1. Penny W.D., Friston K.J., Ashburner J.T., Kiebel S.J., Nichols T.E. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier; 2007.
    1. Preston A.R., Eichenbaum H. Interplay of hippocampus and prefrontal cortex in memory. Curr. Biol. 2013;23:R764–R773.
    1. Ramnani N., Elliott R., Athwal B.S., Passingham R.E. Prediction error for free monetary reward in the human prefrontal cortex. Neuroimage. 2004;23:777–786.
    1. Rescorla R.A., Wagner A.R. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Black A.H., Prokasy W.F., editors. Classical Conditioning II: Current Research and Theory. Appleton Century Crofts; 1972. pp. 64–99.
    1. Reuter M., Rosas H.D., Fischl B. Highly accurate inverse consistent registration: a robust approach. Neuroimage. 2010;53:1181–1196.
    1. Rudebeck P.H., Saunders R.C., Lundgren D.A., Murray E.A. Specialized Representations of Value in the Orbital and Ventrolateral Prefrontal Cortex: Desirability versus Availability of Outcomes. Neuron. 2017;95:1208–1220.e5.
    1. Rushworth M.F.S., Noonan M.P., Boorman E.D., Walton M.E., Behrens T.E.J. Frontal cortex and reward-guided learning and decision-making. Neuron. 2011;70:1054–1069.
    1. Savelli F., Yoganarasimha D., Knierim J.J. Influence of boundary removal on the spatial representations of the medial entorhinal cortex. Hippocampus. 2008;18:1270–1282.
    1. Schuck N.W., Cai M.B., Wilson R.C., Niv Y. Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron. 2016;91:1402–1412.
    1. Ségonne F., Dale A.M., Busa E., Glessner M., Salat D., Hahn H.K., Fischl B. A hybrid approach to the skull stripping problem in MRI. Neuroimage. 2004;22:1060–1075.
    1. Smith S.M. Fast robust automated brain extraction. Hum. Brain Mapp. 2002;17:143–155.
    1. Solstad T., Boccara C.N., Kropff E., Moser M.-B., Moser E.I. Representation of geometric borders in the entorhinal cortex. Science. 2008;322:1865–1868.
    1. Sommer T. The Emergence of Knowledge and How it Supports the Memory for Novel Related Information. Cereb. Cortex. 2017;27:1906–1921.
    1. Stachenfeld K.L., Botvinick M.M., Gershman S.J. The hippocampus as a predictive map. Nat. Neurosci. 2017;20:1643–1653.
    1. Tolman E.C. Cognitive maps in rats and men. Psychol. Rev. 1948;55:189–208.
    1. Trettel S.G., Trimper J.B., Hwaun E., Fiete I.R., Colgin L.L. Grid cell co-activity patterns during sleep reflect spatial overlap of grid fields during active behaviors. Nat. Neurosci. 2019;22:609–617.
    1. Tse D., Takeuchi T., Kakeyama M., Kajii Y., Okuno H., Tohyama C., Bito H., Morris R.G.M. Schema-dependent gene activation and memory encoding in neocortex. Science. 2011;333:891–895.
    1. Vaidya A.R., Jones H.M., Castillo J., Badre D. Neural representation of abstract task structure during generalization. bioRxiv. 2020 doi: 10.1101/2020.07.21.213009. Published online July 21, 2020.
    1. Vikbladh O.M., Meager M.R., King J., Blackmon K., Devinsky O., Shohamy D., Burgess N., Daw N.D. Hippocampal Contributions to Model-Based Planning and Spatial Memory. Neuron. 2019;102:683–693.e4.
    1. Walton M.E., Behrens T.E.J., Buckley M.J., Rudebeck P.H., Rushworth M.F.S. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron. 2010;65:927–939.
    1. Wang J.X., Kurth-Nelson Z., Kumaran D., Tirumala D., Soyer H., Leibo J.Z., Hassabis D., Botvinick M. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 2018;21:860–868.
    1. Wang F., Schoenbaum G., Kahnt T. Interactions between human orbitofrontal cortex and hippocampus support model-based inference. PLoS Biol. 2020;18:e3000578.
    1. Whittington J.C.R., Muller T.H., Mark S., Chen G., Barry C., Burgess N., Behrens T.E.J. The Tolman-Eichenbaum Machine: Unifying space and relational memory through generalization in the hippocampal formation. Cell. 2020;183:1249–1263.e23.
    1. Wilson R.C., Takahashi Y.K., Schoenbaum G., Niv Y. Orbitofrontal cortex as a cognitive map of task space. Neuron. 2014;81:267–279.
    1. Wimmer G.E., Daw N.D., Shohamy D. Generalization of value in reinforcement learning by humans. Eur. J. Neurosci. 2012;35:1092–1104.
    1. Winkler A.M., Ridgway G.R., Webster M.A., Smith S.M., Nichols T.E. Permutation inference for the general linear model. Neuroimage. 2014;92:381–397.
    1. Xie J., Padoa-Schioppa C. Neuronal remapping and circuit persistence in economic decisions. Nat. Neurosci. 2016;19:855–861.
    1. Yoon K., Buice M.A., Barry C., Hayman R., Burgess N., Fiete I.R. Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nat. Neurosci. 2013;16:1077–1084.
    1. Zeithamova D., Preston A.R. Flexible memories: differential roles for medial temporal lobe and prefrontal cortex in cross-episode binding. J. Neurosci. 2010;30:14676–14684.
    1. Zhou J., Gardner M.P.H., Stalnaker T.A., Ramus S.J., Wikenheiser A.M., Niv Y., Schoenbaum G. Rat Orbitofrontal Ensemble Activity Contains Multiplexed but Dissociable Representations of Value and Task Structure in an Odor Sequence Task. Curr. Biol. 2019;29:897–907.
    1. Zhou J., Montesinos-Cartagena M., Wikenheiser A.M., Gardner M.P.H., Niv Y., Schoenbaum G. Complementary Task Structure Representations in Hippocampus and Orbitofrontal Cortex during an Odor Sequence Task. Curr. Biol. 2019;29:3402–3409.e3.

Source: PubMed
