Musical reward prediction errors engage the nucleus accumbens and motivate learning

Benjamin P Gold, Ernest Mas-Herrero, Yashar Zeighami, Mitchel Benovoy, Alain Dagher, Robert J Zatorre

Abstract

Enjoying music reliably ranks among life's greatest pleasures. Like many hedonic experiences, it engages several reward-related brain areas, with activity in the nucleus accumbens (NAc) most consistently reflecting the listener's subjective response. Converging evidence suggests that this activity arises from musical "reward prediction errors" (RPEs) that signal the difference between expected and perceived musical events, but this hypothesis has not been directly tested. In the present fMRI experiment, we assessed whether music could elicit formally modeled RPEs in the NAc by applying a well-established decision-making protocol designed and validated for studying RPEs. In the scanner, participants chose between arbitrary cues that probabilistically led to dissonant or consonant music, and learned to make choices associated with the consonance, which they preferred. We modeled regressors of trial-by-trial RPEs, finding that NAc activity tracked musically elicited RPEs, to an extent that explained variance in the individual learning rates. These results demonstrate that music can act as a reward, driving learning and eliciting RPEs in the NAc, a hub of reward- and music enjoyment-related activity.

Keywords: abstract reward; fMRI; music; nucleus accumbens; reward prediction errors.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Probabilistic musical decision-making task. Participants first chose between two arbitrary colors to initiate the playing of a Bach chorale: The color choice determined its timbre. As the chorale reached halfway, a cue prompted participants to choose between two arbitrary directions: This choice determined whether the chorale ended consonantly or dissonantly. Each choice was associated with a specific outcome, but these outcomes were probabilistic to elicit prediction errors. Each color led to its associated timbre 70% of the time. Within one timbre context, each direction choice led to its ending 85% of the time; in the other timbre, it was 70%. The associations were randomized across participants, who were not made explicitly aware of the contingencies but were simply told to try to make optimal choices to hear the music they wanted. The optimal choices for the case shown in the figure were thus yellow, to select its associated harp timbre, and then left if the chorale played in that timbre or right if not. The task had 60 trials in two runs of two equal blocks each.
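The contingencies in this caption can be illustrated with a small simulation. The following is only a minimal sketch, not the authors' model: it assumes a simple delta-rule learner with a softmax choice rule, codes consonant endings as reward 1 and dissonant endings as 0, and uses hypothetical names and parameter values (`simulate`, `alpha`, `beta`, `CONSONANT_DIR`). Only the 70%, 85%, and 60-trial figures come from the caption.

```python
import math
import random

P_TIMBRE = 0.70                  # each color leads to its associated timbre (from caption)
P_ENDING = {0: 0.85, 1: 0.70}    # reliability of the direction choice, per timbre (from caption)
CONSONANT_DIR = {0: 0, 1: 1}     # assumption: direction tied to consonance differs by timbre
N_TRIALS = 60                    # from caption


def softmax_choice(q, beta, rng):
    """Pick option 0 or 1 with a two-option softmax over the values in q."""
    p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if rng.random() < p0 else 1


def simulate(alpha=0.3, beta=3.0, seed=0):
    """Run one simulated session; return the trial-by-trial outcome RPEs."""
    rng = random.Random(seed)
    q_color = [0.0, 0.0]                       # learned values of the two colors
    q_dir = {0: [0.0, 0.0], 1: [0.0, 0.0]}     # learned values of directions, per timbre
    rpes = []
    for _ in range(N_TRIALS):
        # Step 1: color choice, which probabilistically sets the timbre.
        color = softmax_choice(q_color, beta, rng)
        timbre = color if rng.random() < P_TIMBRE else 1 - color
        # Step 2: direction choice, which probabilistically sets the ending.
        direction = softmax_choice(q_dir[timbre], beta, rng)
        intended_consonant = direction == CONSONANT_DIR[timbre]
        as_intended = rng.random() < P_ENDING[timbre]
        consonant = intended_consonant == as_intended
        reward = 1.0 if consonant else 0.0     # assumed reward coding
        # Reward prediction error: obtained outcome minus expected value.
        rpe = reward - q_dir[timbre][direction]
        q_dir[timbre][direction] += alpha * rpe
        # Assumed first-step update: move toward the best value in the reached timbre.
        q_color[color] += alpha * (max(q_dir[timbre]) - q_color[color])
        rpes.append(rpe)
    return rpes
```

Calling `simulate()` returns 60 RPE values, one per trial. In an analysis of the kind the abstract describes, a series like this would serve as a trial-by-trial parametric regressor, although the authors' exact model and fitting procedure are not given here.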
Fig. 2.
Task behavior and model fit. (A) Overall accuracy. Considering color and direction choices independently, participants made optimal choices significantly more often than chance [t(18) = 2.40, P = 0.03]. (B) Two-step accuracy. Participants also made both optimal decisions within a trial significantly more often than chance, suggesting learning of the task’s two-step pathway [t(18) = 2.67, P = 0.02]. (C) Overall accuracy improved throughout the task for the group as a whole (β̂ = 0.02, P = 0.04) and for 12 of the 18 participants tested (all β̂s > 0.02, P < 0.05). (D) Two-step accuracy did not improve throughout the task for the group as a whole (β̂ = 0.01, P = 0.23), but it did for eight of the 18 participants tested (all β̂s > 0.02, P < 0.05). (E) Reinforcement-learning models fit the group’s choices significantly better than corresponding null models [t(19) = 3.93, P < 0.01]. Red lines indicate means, boxes indicate 1 SD from the mean, and blue lines show the chance level references of the statistical tests.
Fig. 3.
Musically elicited RPEs reflected in NAc activity. (A) Within the bilateral NAc region of interest (purple), computationally modeled RPEs significantly correlated with activity in a right-hemisphere cluster after family-wise error correction (peak voxel: 12, 8, −8; z = 3.53; PFWE = 0.01; orange scale). (B) Conjunction of three analyses supports the original finding. Blue indicates consonant endings > dissonant endings (outcome contrast), red indicates RPEs for consonant endings (parametric effect of rewards), green indicates RPEs for dissonant endings (parametric effect of punishments), and yellow indicates conjunction. All images are thresholded at z ≥ 2. In coronal slices, y = 8. In sagittal slices, x = 12. L, left; R, right.
Fig. 4.
Correlates of RPE-related activity in the right NAc. (A) Parameter estimates (param. estim.) of RPE signaling in the right NAc (shown in pink at x = 12) significantly correlate with the standardized learning slopes of overall accuracy [F(1,16) = 6.03, P = 0.03, R² = 0.27, Adj. R² = 0.23]. (B) BMRQ scores also correlate with RPE-like activity in the right NAc. *P < 0.05.
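The regression statistics reported for panel A can be cross-checked against one another. The sketch below assumes an ordinary least-squares regression with an intercept and a single predictor, so that the coefficient of determination follows from the F statistic and its degrees of freedom. The function names are hypothetical, and nothing here is taken from the authors' analysis code.

```python
def r2_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from an overall F test: R^2 = F*df1 / (F*df1 + df2)."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def adjusted_r2(r2, df_model, df_resid):
    """Adjusted R^2 for a model with an intercept: n = df1 + df2 + 1."""
    n = df_model + df_resid + 1
    return 1.0 - (1.0 - r2) * (n - 1) / df_resid


r2 = r2_from_f(6.03, 1, 16)       # about 0.27
adj = adjusted_r2(r2, 1, 16)      # about 0.23
```

Under these assumptions, F(1,16) = 6.03 implies values that round to 0.27 and 0.23, matching the caption.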

Source: PubMed
