Model-based learning protects against forming habits
Claire M Gillan, A Ross Otto, Elizabeth A Phelps, Nathaniel D Daw
Abstract
Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational-learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.
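The dissociation between model-based and model-free learning that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' fitted model: the two-action, two-state layout, the 0.7/0.3 transition probabilities, the learning rate, and every name below (`TRANSITIONS`, `ALPHA`, `update_after_trial`, `model_based_values`) are assumptions chosen for illustration, and the model-free update is deliberately simplified to credit the chosen first-stage action directly with the reward.

```python
# Hypothetical two-step task: first-stage action a leads to second-stage
# state a with probability 0.7 (common) and to the other state otherwise (rare).
TRANSITIONS = [[0.7, 0.3], [0.3, 0.7]]  # TRANSITIONS[action][state], assumed values
ALPHA = 0.5  # learning rate, assumed value


def update_after_trial(q_mf, v_state, action, state, reward, alpha=ALPHA):
    """Return updated copies of (q_mf, v_state) after one trial.

    q_mf: model-free values of the two first-stage actions.
    v_state: learned values of the two second-stage states.
    """
    q_mf = list(q_mf)
    v_state = list(v_state)
    # Model-free: credit the chosen action directly, ignoring which state occurred.
    q_mf[action] += alpha * (reward - q_mf[action])
    # Second-stage state value, which the model-based learner plans over.
    v_state[state] += alpha * (reward - v_state[state])
    return q_mf, v_state


def model_based_values(v_state, transitions=TRANSITIONS):
    """Expected second-stage value of each action under the transition model."""
    return [sum(p * v for p, v in zip(row, v_state)) for row in transitions]


# One rewarded trial after a rare transition: action 0 led to state 1.
q_mf, v_state = update_after_trial([0.0, 0.0], [0.0, 0.0], action=0, state=1, reward=1.0)
q_mb = model_based_values(v_state)
```

After this single rewarded rare transition, the model-free values favor repeating action 0, whereas the model-based values favor switching to action 1, because action 1 is the more reliable route to the rewarded state. Sequential decision tasks of this kind use that divergence to separate the two learning mechanisms.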
Source: PubMed