Selective reinforcement learning deficits in schizophrenia support predictions from computational models of striatal-cortical dysfunction

James A Waltz, Michael J Frank, Benjamin M Robinson, James M Gold

Abstract

Background: Rewards and punishments may make distinct contributions to learning via separate striatal-cortical pathways. We investigated whether fronto-striatal dysfunction in schizophrenia (SZ) is characterized by selective impairment in either reward- (Go) or punishment-driven (NoGo) learning.

Methods: We administered two versions of a probabilistic selection task to 40 schizophrenia patients and 31 control subjects, using difficult-to-verbalize stimuli (Experiment 1) and nameable objects (Experiment 2). In an acquisition phase, participants learned to choose between three different stimulus pairs (AB, CD, EF), presented in random order, based on probabilistic feedback (80%, 70%, 60%). We used analyses of variance (ANOVAs) to assess the effects of group and reinforcement probability on two measures of contingency learning. To characterize subjects' preferences for choosing the most rewarded stimulus and avoiding the most punished stimulus, we subsequently tested participants with novel pairs of stimuli involving either A or B, providing no feedback.
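To make the acquisition-phase contingency structure concrete, the following is a minimal Python sketch of the feedback schedule described above. It is illustrative only; the pair labels and probabilities come from the text, but all names (PAIRS, feedback, proportion_correct) are ours, not the authors' task code.

```python
import random

# Each training pair maps to the probability that choosing its more
# frequently reinforced stimulus (the first letter) yields positive feedback.
PAIRS = {"AB": 0.80, "CD": 0.70, "EF": 0.60}

def feedback(pair: str, chose_better: bool) -> bool:
    """Return True if this trial ends with positive feedback."""
    p = PAIRS[pair]
    # The better stimulus is rewarded with probability p; the worse
    # stimulus is rewarded on the remaining trials, with probability 1 - p.
    return random.random() < (p if chose_better else 1.0 - p)

def proportion_correct(chose_better_flags):
    """Proportion of trials on which the more frequently reinforced
    stimulus was chosen (the acquisition measure used in the analyses)."""
    return sum(chose_better_flags) / len(chose_better_flags)
```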

Results: Control subjects demonstrated superior performance during the first 40 acquisition trials in each of the 80% and 70% conditions versus the 60% condition; patients showed similarly impaired (<60%) performance in all three conditions. In novel test pairs, patients showed decreased preference for the most rewarded stimulus (A; t = 2.674; p = .01). Patients were unimpaired at avoiding the most negative stimulus (B; t = .737).

Conclusions: The results of these experiments provide additional evidence for the presence of deficits in reinforcement learning in SZ, suggesting that reward-driven learning may be more profoundly impaired than punishment-driven learning.

Figures

Fig. 1
The Probabilistic Stimulus Selection (PSS) Task. The task consists of two phases. During an “acquisition phase,” subjects are presented with three training pairs and instructed to identify, and choose as often as possible, the stimulus in each pair that is more frequently reinforced. In AB trials, for example, a choice of stimulus A leads to positive feedback on 80% of trials, whereas a choice of B is reinforced on the remaining 20%. Identifying the more frequently rewarded stimulus in each pair can be accomplished by learning that one stimulus leads to positive feedback, by learning that the other leads to negative feedback, or both. Once subjects reach criterion on all three training pairs, or complete 360 total trials, they proceed to a “post-acquisition test phase,” during which they are presented with four trials of each of the three training pairs, along with 12 new pairs created from all unused combinations of the training stimuli. The eight new stimulus pairs involving A or B are called the “transfer pairs” and are used to gauge “Go” and “NoGo” learning: if subjects learned more from positive feedback, they should reliably choose stimulus A in every novel test pair in which it appears; if they learned more from negative feedback, they should reliably avoid stimulus B.
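As an illustration of how “Choose A” and “Avoid B” accuracy can be scored from the transfer pairs described in this caption, here is a brief Python sketch. The data layout (a list of pair/choice tuples) and the function name are our assumptions, not the authors' analysis code.

```python
def transfer_scores(trials):
    """trials: list of (pair, choice) tuples, e.g. ("AC", "A") or ("BD", "D").
    Training pairs (AB, CD, EF) are excluded from transfer scoring."""
    training = {"AB", "CD", "EF"}
    # Novel pairs containing A test "Go" learning; those containing B test "NoGo".
    choose_a = [(p, c) for p, c in trials if "A" in p and p not in training]
    avoid_b = [(p, c) for p, c in trials if "B" in p and p not in training]
    go = sum(c == "A" for _, c in choose_a) / len(choose_a) if choose_a else float("nan")
    nogo = sum(c != "B" for _, c in avoid_b) / len(avoid_b) if avoid_b else float("nan")
    return go, nogo
```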
Fig. 2
The cortico-striato-thalamo-cortical loops, including the direct and indirect pathways of the basal ganglia. The cells of the striatum are divided into two sub-classes based on differences in biochemistry and efferent projections. The “Go” cells project directly to the GPi/SNr, and their activity disinhibits the thalamus, thereby facilitating the execution of a cortical response. The “NoGo” cells are part of the indirect pathway to the GPi/SNr and have an opposing effect, suppressing the execution of actions. Dopamine from the SNc projects to the dorsal striatum, differentially modulating activity in the direct and indirect pathways by activating different receptors: the Go cells express the D1 receptor, and the NoGo cells express the D2 receptor. The orbitofrontal cortex (OFC) is thought to maintain reinforcement-related information in working memory and to provide top-down biasing of the more primitive basal ganglia system, in addition to directly influencing response-selection processes in premotor cortex. The OFC receives information about the relative magnitudes of reinforcement values from the ABL, which it can also maintain in working memory. Dopamine from the VTA projects to the ventral striatum (not shown) and orbitofrontal cortex. GPi: internal segment of globus pallidus; GPe: external segment of globus pallidus; SNc: substantia nigra pars compacta; SNr: substantia nigra pars reticulata; VTA: ventral tegmental area; ABL: basolateral amygdala.
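To make the Go/NoGo logic of the figure concrete, below is a deliberately simplified Python sketch in the spirit of this account (it is not the published neural-network model): each stimulus has separate Go (direct-pathway) and NoGo (indirect-pathway) weights, which positive and negative feedback push in opposite directions, standing in for dopamine bursts and dips. The class name, learning rates, and update rule are our assumptions.

```python
class GoNoGoActor:
    """Toy Go/NoGo learner: one facilitating and one suppressing weight per stimulus."""

    def __init__(self, stimuli, alpha_go=0.1, alpha_nogo=0.1):
        self.go = {s: 0.0 for s in stimuli}     # direct-pathway ("Go") weight
        self.nogo = {s: 0.0 for s in stimuli}   # indirect-pathway ("NoGo") weight
        self.alpha_go, self.alpha_nogo = alpha_go, alpha_nogo

    def value(self, s):
        # Net tendency to select s: facilitation minus suppression.
        return self.go[s] - self.nogo[s]

    def update(self, s, rewarded: bool):
        if rewarded:   # dopamine burst: strengthen Go, weaken NoGo
            self.go[s] += self.alpha_go * (1.0 - self.go[s])
            self.nogo[s] -= self.alpha_nogo * self.nogo[s]
        else:          # dopamine dip: strengthen NoGo, weaken Go
            self.nogo[s] += self.alpha_nogo * (1.0 - self.nogo[s])
            self.go[s] -= self.alpha_go * self.go[s]
```

Under a scheme like this, a blunted response to positive feedback (e.g., a reduced alpha_go) selectively weakens choose-A performance at transfer while leaving avoid-B performance relatively intact, which parallels the pattern the authors report for patients.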
Fig. 3
Acquisition of probabilistic contingencies by patients (SZs) and controls (NCs) in Experiment 2. (A) Performance during acquisition blocks 1 and 2. (B) Performance on the training pairs at the post-acquisition test. The proportion of correct responses was defined as the proportion of trials on which the most frequently reinforced stimulus was chosen. In both panels, black bars = control subjects, white bars = patients.
Fig. 4
Performance of subjects on two measures of feedback-driven learning from Experiment 2. In both plots, black bars = control subjects, white bars = patients. (A) Impact of trial-by-trial task feedback on subsequent choices in a given condition during the first acquisition block (20 trials in each stimulus condition). “Win-stay” scores reflect the proportion of repeated stimulus selections in a given condition following reinforced choices; “lose-shift” scores reflect the proportion of switched stimulus selections in a given condition following non-reinforced choices. Total win-stay and lose-shift scores were generated by averaging the condition scores for each measure. (B) Performance of the 24 controls and 32 patients who qualified for the transfer analysis in the post-acquisition test phase. This analysis included only subjects who demonstrated acquisition of the 80:20 contingency by choosing A on at least 75% of AB test trials; thus, the groups showed similar performance on the AB (80:20) test pair. “Go” learning was assessed using novel pairs involving the 80%-reinforced stimulus (Choose A v. Novel), as choosing A depends on having learned from positive feedback. “NoGo” learning was assessed using novel pairs involving the 20%-reinforced stimulus (Avoid B v. Novel), as avoiding B depends on having learned from negative feedback.
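For clarity, the win-stay/lose-shift measures in panel A can be computed trial by trial within a stimulus condition as in the sketch below. The data layout and function name are illustrative assumptions, not the authors' analysis code.

```python
def win_stay_lose_shift(choices, rewarded):
    """choices: chosen stimulus on each trial of one condition (in order);
    rewarded: matching list of booleans indicating positive feedback."""
    stays_after_win = shifts_after_loss = wins = losses = 0
    for t in range(1, len(choices)):
        if rewarded[t - 1]:
            wins += 1
            stays_after_win += choices[t] == choices[t - 1]   # win-stay
        else:
            losses += 1
            shifts_after_loss += choices[t] != choices[t - 1]  # lose-shift
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift
```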

Source: PubMed
