Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning

Anne Gabrielle Eva Collins, Michael Joshua Frank

Abstract

Often the world is structured such that distinct sensory contexts signify the same abstract rule set. Learning from feedback thus informs us not only about the value of stimulus-action associations but also about which rule set applies. Hierarchical clustering models suggest that learners discover structure in the environment, clustering distinct sensory events into a single latent rule set. Such structure enables a learner to transfer any newly acquired information to other contexts linked to the same rule set, and facilitates re-use of learned knowledge in novel contexts. Here, we show that humans exhibit this transfer, generalization and clustering during learning. Trial-by-trial model-based analysis of EEG signals revealed that subjects' reward expectations incorporated this hierarchical structure; these structured neural signals were predictive of behavioral transfer and clustering. These results further our understanding of how humans learn and generalize flexibly by building abstract, behaviorally relevant representations of the complex, high-dimensional sensory environment.
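The clustering idea in the abstract can be illustrated with a minimal sketch. It assumes a Chinese-restaurant-process-style prior in which a new context is assigned to an existing rule set in proportion to the number of contexts already linked to it. The abstract does not specify the model's prior, so the function name `task_set_prior` and the concentration parameter `alpha` are hypothetical; the context-to-task-set assignments follow phase A as described in the Figure 1 caption.

```python
from collections import Counter


def task_set_prior(context_assignments, alpha=1.0):
    """Prior over task-sets for a novel context (illustrative assumption).

    context_assignments: dict mapping each known context to its task-set label.
    alpha: hypothetical concentration parameter controlling how readily
           a brand-new task-set is created.
    """
    # Count how many contexts point to each task-set ("popularity").
    counts = Counter(context_assignments.values())
    total = sum(counts.values()) + alpha
    # Existing task-sets are weighted by their context counts.
    prior = {ts: n / total for ts, n in counts.items()}
    # Remaining probability mass goes to creating a new task-set.
    prior["new"] = alpha / total
    return prior


# Phase A: C0 and C1 both cue TS1, C2 cues TS2.
phase_a = {"C0": "TS1", "C1": "TS1", "C2": "TS2"}
prior = task_set_prior(phase_a, alpha=1.0)
print(prior)
```

Under this assumed prior, TS1 receives twice the weight of TS2 for a novel context, which matches the direction of the first-trial bias towards TS1 reported in the figure captions.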

Keywords: Clustering; EEG; Prefrontal cortex; Structure-learning; Transfer.

Copyright © 2016 Elsevier B.V. All rights reserved.

Figures

Figure 1. Experimental protocol
A): Single trial structure for the behavioral experiment. B): Single trial structure for the EEG experiment. C) The table indicates correct (rewarded) action (A) contingencies for each context-stimulus (C-S) pair in initial phase A, and transfer phases B and C. “TS” indicates a task-set of stimulus-action contingencies that can be selected in a given context. If subjects learn that C0 and C1 cue the same TS1 in phase A, then they should more easily acquire new S-A associations that are shared across those contexts in phase B. In phase C, TSold indicates that the valid TS in new context C3 corresponds to one of the previously learned TS1 or TS2, whereas TSnew denotes that a new set of S-A associations needs to be learned for context C4.
Figure 2. Initial learning phase A and transfer phase B
A): Hierarchically-structured representation of phase A (light arrows) and new S-A associations to be learned in phase B (bold arrows). Subjects can learn in phase A to cluster C0 and C1 together to indicate the same abstract latent rule TS1. They can also expand the content of that TS (shared across contexts) in phase B to append new S-A mappings to it. B) Example of stimuli presented in initial phase A and transfer phase B of the experiments. Note that red and grey shapes are half as frequent as yellow shapes, such that TS1 and TS2 are equally frequent. C–D): Learning curves for initial phase A and transfer phase B, plotting the proportion of correct trials as a function of the number of encounters of a given colored shape, averaged over C0/C1 colored shapes (purple) and C2 colored shapes (yellow). Within-cluster transfer is evident in the faster learning of new S-A associations for C0/C1 than for C2 in phase B, despite slower initial learning in phase A.
Figure 3. Transfer to novel contexts and clustering priors
A): Hierarchically structured representation of phases B and C. If subjects applied structure to learning in phases A/B, they can then recognize that C3 points to one of the previously learned latent rules (either TS1 or TS2, dotted arrows) and hence generalize their learned S-A mappings. In contrast, they would need to create a new TS3 for context C4. B) Learning curves for transfer phase C. Learning is faster for task-sets that were previously valid in old contexts. This effect is particularly evident for those subjects for whom the old TS was the more popular TS1 (clustered across two contexts; middle graph) compared to the less popular TS2 (right). C, D) Summary measures over the first 3 trials for each condition (TSold or TSnew): mean performance (C), slope (D). E) Action choice for the first trial in phase C: proportion of subjects who chose the action prescribed by TS1 for that stimulus, by TS2, or either of the other two actions. There is a strong bias towards TS1, prior to any information in the new phase, despite equal TS and action frequencies.
Figure 4
Model simulations from the structure-learning model, with parameters fitted to individual subjects’ behavior in the EEG experiment. Learning curves show mean and standard error (error bars) across subjects, and represent the proportion of correct trials for the xth presentation of each individual input pattern. A–B): Phase A/B simulations account for the empirically observed transfer, with greater performance in C0/C1 than C2 in phase B, and the opposite, counter-intuitive pattern in phase A. C) Phase C shows transfer of an old task-set to a new context. D) Proportion of TS1, TS2 or other actions chosen in the first 2 iterations of every input pattern of phase C shows a generalization bias to select previous TS1 more than TS2 (“context popularity-based clustering”), which was in turn more likely than other actions.
Figure 5. EEG effect of prediction error
Top: scalp maps, at representative time points, of the t-statistic of βFPE across subjects, corresponding to the three cluster-groups identified as ROIs. Bold black dots mark, for visualization purposes, effects significant at corrected p<0.05. Bottom: average across subjects of the flat prediction error regressor βFPE for electrodes FCz, Cz and POz. Circles indicate significance against 0 at p<0.05 (cluster-based permutation test).
Figure 6. SPE effects in EEG
A): Average regression weight for unique structure RL variance in each group ROI shows that SPE accounts for additional variance beyond flat PE (error bars indicate standard error). B) The early + medium SPE effect predicts behavior. Subjects were separated into “high” and “low” SPE effect groups by median split. Left: the “high” group showed stronger “within cluster” transfer, as indicated by an increase in the TS1 vs. TS2 performance difference between phases A and B. Middle: the “high” group showed a stronger bias to select the previously more clustered action (TS1 action). Right: the “high” group showed significantly more generalization of old task-sets to a new context in phase C.
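The median split mentioned in the Figure 6 caption can be sketched as follows. This is only an illustration: the subject identifiers and effect values are made up, the function name `median_split` is hypothetical, and the handling of values exactly at the median (assigned here to the "low" group) is an assumption, since the caption does not describe it.

```python
from statistics import median


def median_split(effects):
    """Split subjects into 'high' and 'low' groups around the median effect.

    effects: dict mapping subject id -> per-subject effect size.
    Values equal to the median go to the 'low' group (an assumption).
    """
    m = median(effects.values())
    high = sorted(s for s, v in effects.items() if v > m)
    low = sorted(s for s, v in effects.items() if v <= m)
    return high, low


# Made-up per-subject effect sizes, for illustration only.
spe_effect = {"s01": 0.12, "s02": -0.03, "s03": 0.30, "s04": 0.05}
high, low = median_split(spe_effect)
print(high, low)
```

Behavioral measures such as transfer or generalization could then be compared between the two resulting groups.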

Source: PubMed
