Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices

Gerhard Jocham, Tilmann A Klein, Markus Ullsperger

Abstract

A large body of evidence exists on the role of dopamine in reinforcement learning. Less is known about how dopamine shapes the relative impact of positive and negative outcomes to guide value-based choices. We combined administration of the dopamine D2 receptor antagonist amisulpride with functional magnetic resonance imaging in healthy human volunteers. Amisulpride did not affect initial reinforcement learning. However, in a later transfer phase that involved novel choice situations requiring decisions between two symbols based on their previously learned values, amisulpride improved participants' ability to select the better of two highly rewarding options, while it had no effect on choices between two very poor options. During the learning phase, activity in the striatum encoded a reward prediction error. In the transfer phase, in the absence of any outcome, ventromedial prefrontal cortex (vmPFC) continually tracked the learned value of the available options on each trial. Both striatal prediction error coding and tracking of learned value in the vmPFC were predictive of subjects' choice performance in the transfer phase, and both were enhanced under amisulpride. These findings show that dopamine-dependent mechanisms enhance reinforcement learning signals in the striatum and sharpen representations of associative values in prefrontal cortex that are used to guide reinforcement-based decisions.
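
The two quantities named above, the reward prediction error and the learned (action) value of each symbol, can be illustrated with a minimal sketch of a standard delta-rule learner with softmax choice. This is not necessarily the model fitted in the study: the learning rate `alpha`, the inverse temperature `beta`, the initial values of 0.5, the number of trials, and the reward probabilities in `REWARD_PROB` are all assumptions chosen for illustration. Only the symbol labels A to F, the three learning pairs, and the win–win and lose–lose pairings follow the figure captions.

```python
import math
import random

# Hypothetical reward probabilities per symbol (assumed, not from the source).
REWARD_PROB = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3, "E": 0.6, "F": 0.4}
LEARNING_PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def prediction_error(reward, value):
    """Reward prediction error: obtained outcome minus expected value."""
    return reward - value


def update_value(value, reward, alpha):
    """Delta rule: move the value toward the outcome by alpha times the error."""
    return value + alpha * prediction_error(reward, value)


def choice_probability(value_first, value_second, beta):
    """Softmax (logistic) probability of choosing the first of two symbols."""
    return 1.0 / (1.0 + math.exp(-beta * (value_first - value_second)))


def simulate_learning(n_trials_per_pair=100, alpha=0.1, beta=5.0, seed=0):
    """Simulate the learning phase and return the final action values."""
    rng = random.Random(seed)
    values = {symbol: 0.5 for symbol in REWARD_PROB}  # assumed starting point
    for _ in range(n_trials_per_pair):
        for first, second in LEARNING_PAIRS:
            p_first = choice_probability(values[first], values[second], beta)
            chosen = first if rng.random() < p_first else second
            reward = 1.0 if rng.random() < REWARD_PROB[chosen] else 0.0
            values[chosen] = update_value(values[chosen], reward, alpha)
    return values


if __name__ == "__main__":
    learned = simulate_learning()
    # Transfer phase: no outcome is shown, so choices rest on learned values alone.
    for first, second in [("A", "C"), ("A", "E"), ("C", "E"),   # win-win
                          ("B", "D"), ("B", "F"), ("D", "F")]:  # lose-lose
        p = choice_probability(learned[first], learned[second], beta=5.0)
        print(f"{first}{second}: P(choose {first}) = {p:.2f}")
```

In this sketch the transfer-phase choice probabilities depend only on the difference between two learned values, which is why sharper value representations would be expected to help most when two options are close in value, as on win–win trials.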

Figures

Figure 1.
Sequence of stimulus events within a trial of the reinforcement learning and choice task. A, Following selection of one of the two stimuli, the choice was visualized to the subject by a white frame (presented for 300 ms) around the corresponding symbol. This was immediately followed by positive or negative feedback, according to the task schedule. B, In the subsequent choice phase, symbols were rearranged to yield 12 novel combinations of symbols. In addition, the three pairs from the learning phase were also presented. Trials were identical to those from the learning phase, with the exception that no outcome was presented. Of particular interest in this phase were so-called win–win trials (highlighted in blue) and lose–lose trials (highlighted in red) in which two symbols associated with a very high or very low probability of reinforcement, respectively, were combined.
Figure 2.
A, Performance on probe trials of the transfer phase. The three symbol pairs previously presented in the learning phase were also administered in the transfer phase, but in the absence of an outcome. This served as a measure of how well the initial discrimination had been learned. B, Action values for the six symbols at the end of the learning phase were estimated by the reinforcement learning algorithm.
Figure 3.
Percentage of correct choices of the better symbol on win–win (AC, AE, and CE) and lose–lose trials (BD, BF, and DF) in the transfer phase. *p < 0.05, paired t-test against placebo. AMI, Amisulpride; PLA, placebo.
Figure 4.
Signal change related to the receipt of a reward (top) and to reward prediction errors (bottom). Signal change in the striatum and ventromedial prefrontal cortex was found in both the placebo (PLA; left) and amisulpride condition (AMI; middle). Amisulpride increased both reward-related signal change in the ventromedial prefrontal cortex and prediction error-related signal change in the striatum compared with placebo (right). Images are thresholded at z > 3.09.
Figure 5.
Signal change related to choose A versus avoid B trials (top) and to the learned value of the symbols on each trial (bottom). Amisulpride (AMI) increased the effect of choose A versus avoid B trials (right, top) and the activity related to the symbols' learned value (right, bottom) in the ventromedial prefrontal cortex. Images are thresholded at z > 2.3 for display purposes. The green crosshairs are positioned at x = 0, y = 48 for comparison with the effect of amisulpride on reward processing shown in the upper right-hand panel of Figure 4. PLA, Placebo.
Figure 6.
Percentage of signal change in response to rewards and nonrewards (left) and to choose A and avoid B trials (right) in the ventromedial prefrontal cortex during the learning and transfer phases, respectively. Signal change was extracted from the peak coordinate of the difference between amisulpride and placebo in the respective contrast. The analysis shows that the drug-induced difference in reward responses is primarily due to a stronger signal decrease to nonrewarding outcomes in the amisulpride condition. The increased response in the choose A versus avoid B contrast is driven by a nonsignificant enhancement of both the signal increase to choose A trials and the signal decrease to avoid B trials.
Figure 7.
Signal change related to reward prediction errors in the striatum during the learning phase (left) and to the symbols' learned values in the ventromedial prefrontal cortex during the transfer phase (right) correlated with correct choices on win–win trials. White, Placebo; black, amisulpride.

Source: PubMed
