Learning predictive statistics from temporal sequences: Dynamics and strategies

Rui Wang, Yuan Shen, Peter Tino, Andrew E Welchman, Zoe Kourtzi

Abstract

Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics; that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.

Figures

Figure 1
Trial and sequence design. (a) Eight to 14 symbols were presented one at a time in a continuous stream followed by a cue and the test display. (b) Sequence design. For the zero-order model (Level 0): Different states (A, B, C, D) are assigned to four symbols with different probabilities. For first- (Level 1) and second- (Level 2) order models, diagrams indicate states (circles) and conditional probabilities (red arrow: high; gray arrow: low). Transitional probabilities were arranged in a 4 × 4 (Level 1) or 4 × 6 (Level 2) conditional-probability matrix.
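The sequence design in this caption can be illustrated with a minimal Python sketch. It is not the authors' generator. The Level 0 probabilities (0.72, 0.18, 0.05, 0.05) and the Level 1 high/low values (0.8, 0.2) are taken from the Figure 2 caption, but the assignment of probabilities to symbols, the specific transitions in LEVEL1, and all function names are assumptions made for illustration. Level 2 would follow the same pattern with a two-symbol context as the dictionary key.

```python
import random

SYMBOLS = ["A", "B", "C", "D"]

# Level 0 (zero-order): each symbol has a fixed probability.
# Which symbol receives which probability is an assumption.
LEVEL0 = {"A": 0.72, "B": 0.18, "C": 0.05, "D": 0.05}

# Level 1 (first-order): hypothetical conditional probabilities P(next | current),
# with one high (0.8) and one low (0.2) transition per context.
LEVEL1 = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"D": 0.8, "A": 0.2},
    "C": {"A": 0.8, "D": 0.2},
    "D": {"C": 0.8, "B": 0.2},
}

def sample(dist, rng):
    """Draw one symbol from a {symbol: probability} dictionary."""
    symbols = list(dist)
    weights = [dist[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=1)[0]

def generate_level0(length, rng):
    """Each symbol is drawn independently of what came before."""
    return [sample(LEVEL0, rng) for _ in range(length)]

def generate_level1(length, rng):
    """Each symbol depends on the immediately preceding symbol."""
    if length <= 0:
        return []
    seq = [rng.choice(SYMBOLS)]  # arbitrary start state
    while len(seq) < length:
        seq.append(sample(LEVEL1[seq[-1]], rng))
    return seq

rng = random.Random(0)
print("".join(generate_level0(12, rng)))
print("".join(generate_level1(12, rng)))
```

In the Level 0 stream the best prediction ignores context, whereas in the Level 1 stream the most probable next symbol changes with the preceding symbol.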
Figure 2
Experiment 1: Behavioral performance. (a) Performance index for Group 0 (n = 19) across training (solid circles) blocks, pretraining test (Pre: open squares), and posttraining test (Post: open squares). The performance index expresses the absolute distance (proportion overlap) between the distribution of participant responses and the distribution of presented targets. Overall performance index is calculated as the weighted average across context probabilities. Data are fitted for participants who improved during training (black circles). Data are also shown for one participant who did not improve during training (Level 2, gray symbols). Error bars show standard error of the mean. (b) Response probabilities for individual targets (Level 0) or conditional probabilities of context–target contingencies (Levels 1 and 2) across training blocks. Red lines indicate targets or context–target contingencies with the highest (conditional) probability (i.e., 0.72 for Level 0 and 0.8 for Levels 1 and 2), blue lines indicate the second-highest (conditional) probabilities (i.e., 0.18 for Level 0 and 0.2 for Levels 1 and 2), and green lines indicate targets or context–target contingencies that appear rarely (i.e., 0.05) or not at all. For Level 2, first- and second-order contexts are presented separately (dashed vs. solid lines).
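One way to read the performance index described in this caption is sketched below in Python. The caption does not give the formula, so this sketch assumes that "proportion overlap" means the sum, over symbols, of the smaller of the response probability and the target probability, and that the overall index weights each context by its probability of occurrence. The function names and example numbers are hypothetical.

```python
def overlap(p_resp, p_target):
    """Proportion overlap of two distributions: sum of symbol-wise minima.

    Returns 1.0 for identical distributions and 0.0 for disjoint ones.
    """
    symbols = set(p_resp) | set(p_target)
    return sum(min(p_resp.get(s, 0.0), p_target.get(s, 0.0)) for s in symbols)

def performance_index(resp_by_context, target_by_context, context_prob):
    """Overlap per context, averaged with context probabilities as weights."""
    return sum(
        context_prob[c] * overlap(resp_by_context[c], target_by_context[c])
        for c in context_prob
    )

# Hypothetical example with two equally likely contexts.
targets = {"A": {"B": 0.8, "C": 0.2}, "B": {"D": 0.8, "A": 0.2}}
responses = {"A": {"B": 0.8, "C": 0.2}, "B": {"D": 1.0}}
weights = {"A": 0.5, "B": 0.5}
print(performance_index(responses, targets, weights))
```

Under this assumed definition, a participant who reproduces the target distribution exactly scores 1.0 in that context, while a participant who always picks the most probable target scores the probability of that target (0.8 in the example).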
Figure 3
Experiment 1: Response tracking. (a) Functional clustering analysis (Group 0) showed two data clusters, indicated in red (Level 0: n = 13, Level 1: n = 14, Level 2: n = 11) versus blue (Level 0: n = 6, Level 1: n = 5, Level 2: n = 6). Mixture coefficient curves are shown for each individual participant; bold curves indicate sigmoid fits to each cluster. Data are also shown for two participants (black lines) who showed less than a 25% probability of extracting the correct context length at the end of training. (b) Learning predictive probabilities. ΔKL curves between the predictive mixture model for each level and baseline models across training blocks. ΔKL values above zero indicate that the participant responses approximated the Markov model that generated the sequences. Average data are shown per participant cluster (i.e., red vs. blue). Note: The smaller ΔKL values and error bars for Level 2 reflect small differences between Level 1 and Level 2 models; yet fast learners show higher values than zero, indicating that they are able to learn second-order context–target contingencies. Error bars show the standard error of the mean. (c) Strategy choice, as indicated by comparing (ΔKL) matching versus maximization for each participant per cluster (i.e., red vs. blue).
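The matching versus maximization comparison in panel (c) can be illustrated with a small Python sketch based on Kullback-Leibler divergence. This is not the authors' ΔKL computation, which is not specified in this section. The sketch assumes that the matching model equals the generating conditional probabilities, that the maximization model puts all probability on the most probable target, and that the response distribution is floored at a small epsilon to avoid taking the logarithm of zero. The name strategy_contrast and its sign convention are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) in nats; q is floored at eps and renormalised."""
    symbols = set(p) | set(q)
    q_floor = {s: max(q.get(s, 0.0), eps) for s in symbols}
    total = sum(q_floor.values())
    return sum(
        p[s] * math.log(p[s] / (q_floor[s] / total))
        for s in symbols
        if p.get(s, 0.0) > 0.0
    )

def maximization_model(target):
    """All probability mass on the single most probable target."""
    best = max(target, key=target.get)
    return {s: (1.0 if s == best else 0.0) for s in target}

def strategy_contrast(response, target):
    """Positive: responses closer to maximization; negative: closer to matching."""
    d_matching = kl_divergence(target, response)
    d_maximizing = kl_divergence(maximization_model(target), response)
    return d_matching - d_maximizing

target = {"A": 0.8, "B": 0.2}
print(strategy_contrast({"A": 0.8, "B": 0.2}, target))  # negative: matching
print(strategy_contrast({"A": 1.0}, target))            # positive: maximization
```

The contrast is computed for a single context here; extending it across contexts would require a weighting choice that this section does not describe.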
Figure 4
Experiment 2: Behavioral performance. Data for Group 1 (n = 8; Levels 1 and 2) and Group 2 (n = 12; Level 2). Performance index is shown across training (solid circles) blocks, pretraining test (Pre: open squares), and posttraining test (Post: open squares). Fitted data are shown for participants who improved during training (black circles). Data are also shown for participants (n = 4) in Group 2 who did not improve during training (Level 2, gray symbols). Error bars show standard error of the mean.
Figure 5
Strategies for learning context-based statistics. (a) Correlations of individual strategy index and learning rate for participants who improved at both Levels 1 and 2 during training in Group 0 and Group 1. (b) Correlation of individual strategy index between Level 1 and Level 2 for participants trained in Group 0 and Group 1. Negative strategy-index values indicate a strategy closer to matching, while positive values indicate a strategy closer to maximization.

Source: PubMed