Compositional clustering in task structure learning
Nicholas T Franklin, Michael J Frank
Abstract
Humans are remarkably adept at generalizing knowledge between experiences in a way that can be difficult for computers. Often, this entails generalizing constituent pieces of experience that do not fully overlap with, but nonetheless share useful similarities with, previously acquired knowledge. However, it is often unclear how knowledge gained in one context should generalize to another. Previous computational models and data suggest that rather than learning about each individual context, humans build latent abstract structures and learn to link these structures to arbitrary contexts, facilitating generalization. In these models, task structures that are more popular across contexts are more likely to be revisited in new contexts. However, these models can only re-use policies as a whole and are unable to transfer knowledge about the transition structure of the environment even if only the goal has changed (or vice versa). This contrasts with ecological settings, where some aspects of task structure, such as the transition function, may be shared between contexts independently of other aspects, such as the reward function. Here, we develop a novel non-parametric Bayesian agent that forms independent latent clusters for transition and reward functions, affording separable transfer of their constituent parts across contexts. We show that the relative performance of this agent, compared to an agent that jointly clusters reward and transition functions, depends on the statistics of the task environment: the mutual information between transition and reward functions and the stochasticity of the observations. We formalize our analysis through an information-theoretic account of the priors, and propose a meta-learning agent that dynamically arbitrates between strategies across task domains to optimize a statistical tradeoff.
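The popularity-based prior described in the abstract is typically formalized as a Chinese Restaurant Process (CRP): a new context joins an existing cluster with probability proportional to how many contexts already use it, or spawns a new cluster with probability proportional to a concentration parameter. The sketch below (illustrative, not the authors' implementation; the cluster counts and alpha value are made up) contrasts the two strategies the abstract compares: a joint prior over whole (transition, reward) structures versus independent priors that factorize over transition clusters and reward clusters.

```python
def crp_prior(counts, alpha):
    """Chinese Restaurant Process prior over cluster assignments.

    counts: number of contexts already assigned to each existing cluster.
    Returns probabilities for joining each existing cluster, with the
    final entry being the probability of creating a new cluster.
    """
    weights = list(counts) + [alpha]
    total = sum(weights)
    return [w / total for w in weights]

# Joint clustering: one CRP over whole (transition, reward) structures.
# Suppose three contexts so far all share a single joint structure.
joint = crp_prior([3], alpha=1.0)      # [reuse structure, new structure]

# Independent clustering: separate CRPs for transition and reward
# functions, so the prior over a (T, R) combination factorizes. This
# lets the agent reuse a familiar transition function while positing
# a novel reward function (impossible under joint clustering).
p_T = crp_prior([3], alpha=1.0)        # one transition cluster so far
p_R = crp_prior([2, 1], alpha=1.0)     # two reward clusters so far
p_reuse_T_new_R = p_T[0] * p_R[-1]
```

Under the joint prior, a context with a familiar transition function but a novel goal must be assigned a wholly new structure; under the independent prior, the transition knowledge transfers while the reward cluster is created fresh, which is the separable transfer the abstract describes.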
Conflict of interest statement
The authors have declared that no competing interests exist.
Source: PubMed