Learning to minimize efforts versus maximizing rewards: computational principles and neural correlates

Vasilisa Skvortsova, Stefano Palminteri, Mathias Pessiglione

Abstract

The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension (cost or benefit) that has to be learned.

Keywords: computational modeling; effort; reinforcement learning; reward; ventromedial prefrontal cortex.

Copyright © 2014 the authors 0270-6474/14/3415621-10$15.00/0.

Figures

Figure 1.
Behavioral task and results. A, Trial structure. Each trial started with a fixation cross followed by one of four abstract visual cues. The subject then had to make a choice by slightly squeezing the left or right hand grip. Each choice was associated with two outcomes: a monetary reward and a physical effort. Rewards were represented by a coin (10 or 20¢) that the subject received after exerting the required amount of effort, indicated by the height of the horizontal bar in the thermometer. The low and high bars corresponded respectively to 20% and 80% of a subject's maximal force. Effort could only start once the command GO! appeared on the screen. The subject had to squeeze the handgrip until the mercury reached the horizontal bar. In the illustrated example, the subject made a left-hand choice and produced an 80% force to win 20¢. The last screen informed the subject about the gain added to cumulative payoff. B, Probabilistic contingencies. There were four different contingency sets cued by four different symbols in the task. With cues A and B, reward probabilities (orange bars) differed between left and right (75%/25% and 25%/75%, respectively, chance of big reward), while effort probabilities (blue bars) were identical (100%/100% and 0%/0%, respectively, chance of big effort). The opposite was true for cues C and D: left and right options differed in effort probability (75%/25% and 25%/75%, respectively) but not in reward probability (100%/100% and 0%/0%, respectively). The illustration only applies to one task session. Contingencies were fully counterbalanced across the four sessions. C, Learning curves. Circles represent, trial by trial, the percentage of correct responses averaged across hands, sessions, and subjects for reward learning (left, cues A and B) and effort learning (right, cues C and D). Shaded intervals are intersubject SEM. 
Lines show the learning curves generated by the best computational model (QL with linear discount and different learning rates for reward and effort) identified by Bayesian model selection. D, Model fit. Scatter plots show intersubject correlations between estimated and observed responses for reward learning (left) and effort learning (right). Each dot represents one subject. Shaded areas indicate 95% confidence intervals on linear regression estimates.
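The winning model described above (Q-learning with a linear effort discount and separate learning rates for reward and effort) can be sketched in a few lines. The sketch below is illustrative only: parameter values (learning rates, discount weight, softmax temperature) and the simulated contingencies are assumptions for demonstration, not the fitted estimates from the paper.

```python
import numpy as np

# Illustrative sketch of Q-learning with linear effort discount and
# dimension-specific learning rates. All parameter values are assumed.
rng = np.random.default_rng(0)

alpha_r, alpha_e = 0.3, 0.2  # learning rates for reward and effort (assumed)
k = 1.0                      # linear effort discount weight (assumed)
beta = 1.0                   # softmax inverse temperature (assumed)

n_trials = 120
# Cue-A-like contingencies: left option gives the big reward with p = 0.75,
# right with p = 0.25; effort probabilities are identical (always low here).
p_big_reward = np.array([0.75, 0.25])
p_big_effort = np.array([0.0, 0.0])
reward_levels = (10, 20)     # small / big reward (cents)
effort_levels = (20, 80)     # low / high effort (% of maximal force)

Q_r = np.zeros(2)  # expected reward for left/right options
Q_e = np.zeros(2)  # expected effort for left/right options

choices = []
for t in range(n_trials):
    # Net option values: reward expectation minus linearly discounted effort
    v = Q_r - k * Q_e
    p_left = 1.0 / (1.0 + np.exp(-beta * (v[0] - v[1])))  # softmax choice
    c = 0 if rng.random() < p_left else 1
    choices.append(c)

    # Sample the outcomes of the chosen option
    reward = reward_levels[1] if rng.random() < p_big_reward[c] else reward_levels[0]
    effort = effort_levels[1] if rng.random() < p_big_effort[c] else effort_levels[0]

    # Same delta rule for both dimensions, with separate learning rates
    Q_r[c] += alpha_r * (reward - Q_r[c])
    Q_e[c] += alpha_e * (effort - Q_e[c])

# The left option should dominate once contingencies are learned
print(np.mean(np.array(choices[n_trials // 2 :]) == 0))
```

Because efforts are identical across options in this cue pair, choices are driven by the reward prediction errors alone; swapping the reward and effort probabilities (a cue-C-like pair) would make the effort delta rule drive learning instead.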
Figure 2.
Neural underpinnings of effort and reward learning. A, B, Statistical parametric maps show brain regions where activity at cue onset significantly correlated with expected reward (A) and with the difference between expected effort and reward (B) in a random-effects group analysis (p < 0.05, FWE cluster corrected). Axial and sagittal slices were taken at global maxima of interest indicated by red pointers on glass brains, and were superimposed on structural scans. [x y z] coordinates of the maxima refer to the Montreal Neurological Institute space. Plots show regression estimates for reward (orange) and effort (blue) predictions and prediction errors in each ROI. No statistical test was performed on the β-estimates of predictions, as they served to identify the ROIs. p values were obtained using paired two-tailed t tests. Error bars indicate intersubject SEM. ns, Nonsignificant.

Source: PubMed
