Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning

Anna C Schapiro, Nicholas B Turk-Browne, Matthew M Botvinick, Kenneth A Norman

Abstract

A growing literature suggests that the hippocampus is critical for the rapid extraction of regularities from the environment. Although this fits with the known role of the hippocampus in rapid learning, it seems at odds with the idea that the hippocampus specializes in memorizing individual episodes. In particular, the Complementary Learning Systems theory argues that there is a computational trade-off between learning the specifics of individual experiences and regularities that hold across those experiences. We asked whether it is possible for the hippocampus to handle both statistical learning and memorization of individual episodes. We exposed a neural network model that instantiates known properties of hippocampal projections and subfields to sequences of items with temporal regularities. We found that the monosynaptic pathway (the pathway connecting entorhinal cortex directly to region CA1) was able to support statistical learning, while the trisynaptic pathway (connecting entorhinal cortex to CA1 through dentate gyrus and CA3) learned individual episodes, with apparent representations of regularities resulting from associative reactivation through recurrence. Thus, in paradigms involving rapid learning, the computational trade-off between learning episodes and regularities may be handled by separate anatomical pathways within the hippocampus itself. This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'.

Keywords: associative inference; community structure; medial temporal lobe; neural network model; temporal regularities.

© 2016 The Author(s).

Figures

Figure 1.
Model architecture. ECin serves as input and ECout as output for the network. The network is trained to reproduce the pattern of activity in ECin on ECout. Three hidden layers—DG, CA3, and CA1—learn representations to support this mapping, with activity flow governed by the projections indicated by the arrows. Blue arrows make up the TSP and green arrows make up the MSP. This snapshot shows network activity during pair structure learning, where pair AB is presented to the network and successfully reproduced in ECout. The height and yellowness of a unit both index its activity level.
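The caption above lays out the model's connectivity. As a rough illustration, the sketch below wires the two pathways named in the abstract (TSP: ECin to DG to CA3 to CA1; MSP: ECin directly to CA1) into a single feedforward sweep in Python/NumPy. The layer sizes, random weights, activation function, and the omission of recurrence and learning are all simplifying assumptions for illustration, not the published model parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (placeholders, not the published parameters).
sizes = {"ECin": 72, "DG": 400, "CA3": 80, "CA1": 100, "ECout": 72}

def weights(src, dst):
    # Random weights standing in for a learned projection (an arrow in figure 1).
    return rng.normal(scale=0.1, size=(sizes[src], sizes[dst]))

w_ec_dg   = weights("ECin", "DG")
w_dg_ca3  = weights("DG", "CA3")
w_ca3_ca1 = weights("CA3", "CA1")   # trisynaptic pathway (TSP)
w_ec_ca1  = weights("ECin", "CA1")  # monosynaptic pathway (MSP)
w_ca1_ec  = weights("CA1", "ECout")

def act(x):
    # Stand-in for the model's sparse, competitive activation dynamics.
    return np.clip(x, 0.0, 1.0)

def forward(ec_in):
    """One feedforward sweep through both pathways.

    Recurrence and the other projections described in the paper are
    omitted here for brevity.
    """
    dg = act(ec_in @ w_ec_dg)
    ca3 = act(dg @ w_dg_ca3)                       # TSP: ECin -> DG -> CA3
    ca1 = act(ca3 @ w_ca3_ca1 + ec_in @ w_ec_ca1)  # TSP and MSP converge on CA1
    ec_out = act(ca1 @ w_ca1_ec)                   # CA1 drives the ECout output
    return {"DG": dg, "CA3": ca3, "CA1": ca1, "ECout": ec_out}

item_pattern = (rng.random(sizes["ECin"]) < 0.1).astype(float)  # a sparse input item
layer_activity = forward(item_pattern)
```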
Figure 2.
Pair structure. (a) Average representational similarity across networks in each of the three hidden layers of the model, after training on episodic sequences that did not require statistical learning (SL). In the heatmaps, each of the eight test items appears in the rows and columns, the diagonals correspond to patterns correlated with themselves, and the off-diagonals are symmetric. (b) Average representational similarity by pair type, for the initial and settled response. ‘Shuffled’ pairs are items paired with all other items that were not the trained pairmate (e.g. AC, DA), including both viewed pairings (e.g. DA) and unviewed pairings (e.g. AC). (c) Average probability of activating a particular item on the output given a particular item on the input, over training. For example, AB is the probability of activating the second member of a pair above threshold given the first. ‘Incorrect’ is the probability of producing an item that is not the current item or its pairmate. Each input was presented once per epoch in permuted order. (d-f) Same as above, for sequences that required SL. Each pair was presented approximately five times per epoch (with 80 total inputs per epoch). For all subplots, values are means across 500 random network initializations. Error bars denote ±1 s.e.m. across network initializations. Some error bars are too small to be visible.
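Panels (a), (b), (d), and (e) summarize representational similarity as correlations between hidden-layer activity patterns for the test items. Below is a minimal sketch of that kind of analysis, assuming item-by-unit activity arrays have already been recorded; the item count, unit count, and pair assignments are placeholders, and the paper's exact analysis pipeline may differ.

```python
import numpy as np

def similarity_matrix(patterns):
    """Pearson correlation between every pair of item activity vectors.

    patterns: array of shape (n_items, n_units), e.g. CA1 activity for
    each of the eight test items.
    """
    return np.corrcoef(patterns)

def pairmate_vs_shuffled(sim, pairs):
    """Mean similarity for trained pairmates vs all other ('shuffled') pairings."""
    n = sim.shape[0]
    pair_set = {tuple(sorted(p)) for p in pairs}
    pair_vals, shuffled_vals = [], []
    for i in range(n):
        for j in range(i + 1, n):
            (pair_vals if (i, j) in pair_set else shuffled_vals).append(sim[i, j])
    return float(np.mean(pair_vals)), float(np.mean(shuffled_vals))

# Example with random stand-in data: 8 items, 100 hypothetical CA1 units.
rng = np.random.default_rng(1)
ca1_patterns = rng.random((8, 100))
sim = similarity_matrix(ca1_patterns)
trained_pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]  # AB, CD, EF, GH
print(pairmate_vs_shuffled(sim, trained_pairs))
```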
Figure 3.
Community structure. (a) Average representational similarity after training, with items arranged by community (black boxes; grey bars mark boundary nodes). (b) Graph with three communities of nodes. Each node on the graph represents a particular item, and the edges indicate which transitions were allowed (bidirectionally). (c) Average probability of activating units from the same community given an internal item (black node) or boundary item (grey node) as input, over the course of training. Values are lower than chance (0.33) prior to training due to units not reaching the 0.5 activity threshold. (d) Difference between the settled and initial heatmaps in CA1. (e) Average representational similarity between two neighbouring nodes from the same community (within internal), the two boundary nodes from the same community (within boundary), two adjacent boundary nodes from different communities (across boundary), and all other pairs of items from different communities (across other).
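Panel (b) defines the training sequences implicitly: items are sampled by walking the graph, so only within-community and boundary transitions occur. Below is a small sketch of generating such sequences with a random walk; the node count, edge layout, and boundary wiring are one hypothetical way to produce three communities joined by boundary nodes, not necessarily the graph used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 15-node graph: three communities of five nodes each.
# Within a community, every node connects to every other node except that the
# two boundary nodes are linked to a neighbouring community instead of to
# each other (one illustrative way to get the structure in panel b).
n_per, n_comm = 5, 3
adj = {i: set() for i in range(n_per * n_comm)}

for c in range(n_comm):
    nodes = list(range(c * n_per, (c + 1) * n_per))
    boundary_pair = (nodes[0], nodes[-1])        # first/last node = boundary nodes
    for i in nodes:
        for j in nodes:
            if i < j and (i, j) != boundary_pair:  # boundary pair not connected
                adj[i].add(j)
                adj[j].add(i)
    # Link this community's last boundary node to the next community's first.
    nxt = ((c + 1) % n_comm) * n_per
    adj[nodes[-1]].add(nxt)
    adj[nxt].add(nodes[-1])

def random_walk(adj, length, start=0):
    """Sequence of items whose transitions respect the graph's edges."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(int(rng.choice(sorted(adj[seq[-1]]))))
    return seq

sequence = random_walk(adj, length=100)
```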
Figure 4.
Associative inference. (a) Average representational similarity after training, with items arranged by triad. The settled response is shown with and without recurrence allowed. The initial response was very similar for the two variants and is shown with recurrence. (b) Similarity structure after one training trial with each of the direct pairs. The pattern correlations between members of direct AB pairs and between members of transitive AC pairs are shown, subtracting the correlation for shuffled pairs as a baseline. This is shown for the initial and settled response, in both cases in networks with recurrence allowed (though only the settled response is affected by recurrence). (c) The probability of producing the direct pairmate (B given A) and transitive pairmate (C given A), subtracting the probability of producing other items as a baseline, with and without recurrence. We allowed any above-zero activity in ECout units to count as ‘producing the item’, simulating the sensitive forced choice test used in associative inference studies [14]. (d,e) Same as (b,c), for the fully trained network.
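Panel (c)'s measure counts an item as 'produced' whenever any of its ECout units is active above zero, then contrasts production of the direct and transitive pairmates against production of other items. Below is a hedged sketch of that scoring, assuming each item maps to a known set of ECout units; the data structures and trial format here are hypothetical.

```python
import numpy as np

def items_produced(ec_out, item_units, threshold=0.0):
    """Items counted as 'produced': any of their ECout units is above threshold."""
    return {item for item, idx in item_units.items()
            if np.any(ec_out[list(idx)] > threshold)}

def inference_scores(trials, item_units):
    """Direct (B given A) and transitive (C given A) production, minus a baseline.

    trials: list of (ec_out, cue, direct, transitive) tuples, e.g. the settled
    ECout activity when item A is presented alone, with its A/B/C labels.
    The baseline is the mean production rate of all other items.
    """
    direct_hits, transitive_hits, baseline = [], [], []
    for ec_out, cue, direct, transitive in trials:
        produced = items_produced(ec_out, item_units)
        others = [it for it in item_units if it not in (cue, direct, transitive)]
        direct_hits.append(direct in produced)
        transitive_hits.append(transitive in produced)
        baseline.append(np.mean([it in produced for it in others]))
    base = float(np.mean(baseline))
    return float(np.mean(direct_hits)) - base, float(np.mean(transitive_hits)) - base
```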

Source: PubMed
