Generalization of value in reinforcement learning by humans

G Elliott Wimmer, Nathaniel D Daw, Daphna Shohamy

Abstract

Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus-reward or stimulus-response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus-stimulus relations, typically dissociated from the role of the striatum in stimulus-response learning. Here, we used functional magnetic resonance imaging and computational model-based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level-dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. 
Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants' choice. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus.
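The contrast between the basic and augmented models can be illustrated with a minimal sketch. The abstract does not give the model equations, so everything below is an assumption made for illustration: the delta-rule form, the `PARTNER` pairing, the parameter names `alpha` and `generalization`, and the choice to pass feedback to the paired option at a scaled learning rate. Setting `generalization` to 0 recovers a basic model that updates only the chosen option.

```python
from typing import List, Tuple

# Hypothetical pairing: options 0 and 1 share a reward probability, as do 2 and 3.
PARTNER = {0: 1, 1: 0, 2: 3, 3: 2}


def update_values(
    values: List[float],
    choice: int,
    reward: float,
    alpha: float = 0.3,           # assumed learning rate
    generalization: float = 0.5,  # assumed weight; 0 = basic model, 1 = full generalization
) -> Tuple[List[float], float]:
    """Return the updated option values and the chosen option's prediction error."""
    new_values = list(values)

    # Standard delta-rule update for the chosen option.
    delta = reward - values[choice]
    new_values[choice] += alpha * delta

    # Illustrative generalization step: the same outcome also updates the
    # paired option, scaled down by the generalization weight.
    partner = PARTNER[choice]
    new_values[partner] += generalization * alpha * (reward - values[partner])

    return new_values, delta


if __name__ == "__main__":
    start = [0.5, 0.5, 0.5, 0.5]
    basic, _ = update_values(start, choice=0, reward=1.0, generalization=0.0)
    augmented, _ = update_values(start, choice=0, reward=1.0, generalization=0.5)
    print("basic:    ", basic)
    print("augmented:", augmented)
```

In this sketch, only the augmented variant changes the value of the unchosen paired option, which is the kind of difference a model comparison on choices could detect.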

© 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.

Figures

Figure 1
Design of the reward equivalence paradigm. a) On each trial, participants chose one of four face options. After a delay, the outcome ($0.25 or $0.00) was revealed. In colored brackets, one example of option pairing is indicated. b) Drifting reward probability distribution defining the reward equivalence for one example pairing (left). Trial-by-trial reinforcement learning variables for 50 trials from an example participant: fMRI model regressors for prediction error (black) and prediction error difference due to generalization (red), and an illustration of a full generalization model prediction error (grey) (right).
Figure 2
Ventral striatum BOLD signals are best described by a model that incorporates generalization knowledge. a) Prediction error (left) and prediction error difference due to value generalization (middle). b) Conjunction of prediction error and prediction error difference due to generalization (p < .05, SVC; all images shown at an uncorrected threshold for visualization).

Figure 3
Hippocampal activation correlated with chosen value during the choice period of the reward equivalence task (p < .05, SVC; shown at an uncorrected threshold for visualization).

Figure 4
Psychophysiological interaction (PPI) between task and ventral striatal activity is predicted by the degree to which a participant’s choice behavior is better fit by the generalization RL model. (n=20; FWE SVC in the MTL at p

Source: PubMed
