A role for dopamine in temporal decision making and reward maximization in parkinsonism

Ahmed A Moustafa, Michael X Cohen, Scott J Sherman, Michael J Frank

Abstract

Converging evidence implicates striatal dopamine (DA) in reinforcement learning, such that DA increases enhance "Go learning" to pursue actions with rewarding outcomes, whereas DA decreases enhance "NoGo learning" to avoid non-rewarding actions. Here we test whether these effects apply to the response time domain. We employ a novel paradigm which requires the adjustment of response times to a single response. Reward probability varies as a function of response time, whereas reward magnitude changes in the opposite direction. In the control condition, these factors exactly cancel, such that the expected value across time is constant (CEV). In two other conditions, expected value increases (IEV) or decreases (DEV), such that reward maximization requires either speeding up (Go learning) or slowing down (NoGo learning) relative to the CEV condition. We tested patients with Parkinson's disease (depleted striatal DA levels) on and off dopaminergic medication, compared with age-matched controls. While medicated, patients were better at speeding up in the DEV relative to CEV conditions. Conversely, nonmedicated patients were better at slowing down to maximize reward in the IEV condition. These effects of DA manipulation on cumulative Go/NoGo response time adaptation were captured with our a priori computational model of the basal ganglia, previously applied only to forced-choice tasks. There were also robust trial-to-trial changes in response time, but these single trial adaptations were not affected by disease or medication and are posited to rely on extrastriatal, possibly prefrontal, structures.

Figures

Figure 1.
Task conditions: DEV, CEV, IEV, and CEVR. The x-axis in all plots corresponds to the time after onset of the clock stimulus at which the response is made. The functions are designed such that the expected value at the beginning of DEV is approximately equal to that at the end of IEV, so that optimally performing subjects would obtain the same average reward in both conditions. a, Example clock-face stimulus; b, probability of reward occurring as a function of response time; c, reward magnitude (contingent on a); d, expected value across trials for each time point. Note that CEV and CEVR have the same EV, so the black line represents EV for both conditions.
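The logic of the three main conditions can be illustrated with a small sketch. The exact probability and magnitude functions used in the task are not given in this excerpt; the forms below are hypothetical, chosen only so that expected value EV(t) = p(t) × m(t) is constant (CEV), decreasing (DEV), or increasing (IEV) over response time t.

```python
# Hypothetical reward functions illustrating the CEV/DEV/IEV structure.
# These are NOT the functions from the actual task; they only reproduce
# the qualitative relationship described in the caption.

def prob(t):
    """Reward probability falls with response time t (seconds), clipped to [0, 1]."""
    return min(1.0, 1.0 / t)

def magnitude(t, condition):
    """Reward magnitude grows with response time; the growth rate sets the condition."""
    if condition == "CEV":   # growth exactly cancels the falling probability
        return 10.0 * t
    if condition == "DEV":   # flat magnitude -> EV decreases, so speeding up pays
        return 10.0
    if condition == "IEV":   # supralinear growth -> EV increases, so slowing down pays
        return 10.0 * t ** 1.5
    raise ValueError(condition)

def expected_value(t, condition):
    return prob(t) * magnitude(t, condition)
```

With these forms, EV at t = 1 s versus t = 5 s is 10 vs 10 for CEV, 10 vs 2 for DEV, and 10 vs roughly 22.4 for IEV, matching the reward-maximizing strategies the abstract describes (speed up in DEV, slow down in IEV).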
Figure 2.
a, Functional architecture of the model of the basal ganglia. The direct ("Go") pathway disinhibits the thalamus via the internal segment of the globus pallidus (GPi) and facilitates the execution of an action represented in the cortex. The indirect ("NoGo") pathway has an opposing effect of inhibiting the thalamus and suppressing the execution of the action. These pathways are modulated by the activity of the substantia nigra pars compacta (SNc), which has dopaminergic projections to the striatum. Go neurons express excitatory D1 receptors, whereas NoGo neurons express inhibitory D2 receptors. b, The Frank (2006) computational model of the BG. Cylinders represent neurons; height and color represent normalized activity. The input neurons project directly to the pre-SMA, in which a response is executed via excitatory projections to the output (M1) neurons. A given cortical response is facilitated by bottom-up activity from thalamus, which is only possible once a Go signal from striatum disinhibits the thalamus. The left half of the striatum contains the Go neurons and the right half the NoGo neurons, each with separate columns for responses R1 and R2. The relative difference between summed Go and NoGo population activity for a particular response determines the probability and speed at which that response is selected. Dopaminergic projections from the SNc modulate Go and NoGo activity by exciting the Go neurons (D1) and inhibiting the NoGo neurons (D2) in the striatum, and also drive learning during phasic DA bursts and dips. Connections with the subthalamic nucleus (STN) are included here for consistency, and modulate the overall decision threshold (Frank, 2006), but are not relevant for the current study.
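The selection and learning principles in this caption can be sketched in a few lines. This is a minimal abstraction of the Go/NoGo opponency, not the full Frank (2006) network; the names `go`, `nogo`, and `da`, and the specific multiplicative DA modulation, are illustrative assumptions.

```python
import math

def select(go, nogo, da=1.0, temp=1.0):
    """Softmax over the DA-modulated Go-minus-NoGo difference per response.
    DA excites Go activity (D1) and inhibits NoGo activity (D2), so higher
    `da` biases selection toward responses with strong Go weights."""
    drive = [da * g - (1.0 / da) * n for g, n in zip(go, nogo)]
    exps = [math.exp(d / temp) for d in drive]
    z = sum(exps)
    return [e / z for e in exps]

def learn(go, nogo, choice, rewarded, lr=0.1):
    """Phasic DA burst (reward) strengthens Go for the chosen response;
    a DA dip (no reward) strengthens NoGo for it."""
    if rewarded:
        go[choice] += lr
    else:
        nogo[choice] += lr
```

Under this sketch, elevated DA (as on medication) both biases choice toward Go-dominant responses and, through `learn`, favors Go over NoGo learning, mirroring the Go/NoGo asymmetry described in the abstract.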
Figure 3.
a–c, Response times as a function of trial number, smoothed with a 10 trial kernel, in healthy seniors (a), patients off medication (b), and patients on medication (c).
Figure 4.
a, Relative within-subject biases to speed RTs in DEV compared with CEV (Go learning) and to slow RTs in IEV compared with CEV (NoGo learning). Values represent mean (SE) in the last block of 12 trials in each condition. b, Similar pattern of results from the neural network model of the basal ganglia. c, d, Raw response times are shown for each condition in the participants in this study (c) and the neural model (d) (see Materials and Methods, Model methods for the current study, for quantification of model RTs).
Figure 5.
Temporal difference model results. Control task 1, Control task showing that the TD implementation can successfully speed responses for stimuli that have a greater probability of being followed by a reward with constant delay (i.e., showing Pavlovian to instrumental transfer) across a range of parameters. Control task 2, Similar results for increasing expected value. Experimental task, The same TD model fails to differentially modulate RTs across conditions within our experimental task. See Results, Temporal difference simulation results.
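For readers unfamiliar with the class of model being tested here, a minimal tabular TD(0) critic can be sketched as below. The simulations behind Figure 5 are more elaborate; this sketch only shows the core prediction-error update that TD models share.

```python
# Minimal tabular TD(0) value update (a simplification, not the exact
# implementation simulated in the paper). The prediction error `delta`
# is the quantity commonly linked to phasic dopamine signals.

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference step: delta = r + gamma * V(s') - V(s)."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta
```

Repeatedly rewarding a cue state drives its value estimate toward the reward, which is the mechanism that lets such models speed responses to reward-predictive stimuli in the control tasks, even though, per the caption, the same mechanism fails to separate the conditions of the experimental task.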
Figure 6.
Relative within-subject biases to prefer high probability over high magnitude, controlling for equal expected value (CEVR − CEV). Senior controls and patients off medication showed risk aversion, whereas those on medication did not. Values represent mean (SE) in the last block of trials in each condition.
Figure 7.
a–d, Trial-to-trial adjustments in RT from previous to current trial, conditionalized according to whether the last trial was rewarded (Win) or not in senior controls (a), PD patients off medication (b), and patients on medication (c). d, Trial-to-trial adjustments across all conditions after faster and slower than average responses. Note difference in scale. Values represent mean (SE).

Source: PubMed
