A causal role for right frontopolar cortex in directed, but not random, exploration

Wojciech K Zajkowski, Malgorzata Kossut, Robert C Wilson

Abstract

The explore-exploit dilemma occurs anytime we must choose between exploring unknown options for information and exploiting known resources for reward. Previous work suggests that people use two different strategies to solve the explore-exploit dilemma: directed exploration, driven by information seeking, and random exploration, driven by decision noise. Here, we show that these two strategies rely on different neural systems. Using transcranial magnetic stimulation to inhibit the right frontopolar cortex, we were able to selectively inhibit directed exploration while leaving random exploration intact. This suggests a causal role for right frontopolar cortex in directed, but not random, exploration and that directed and random exploration rely on (at least partially) dissociable neural systems.

Keywords: decision making; directed exploration; explore-exploit; frontal pole; human; neuroscience; random exploration; reinforcement learning.

Conflict of interest statement

No competing interests declared.

Figures

Figure 1. The horizon task.
Participants make a series of decisions between two one-armed bandits that pay out probabilistic rewards with unknown means. At the start of each game, ‘forced-choice’ trials give participants partial information about the mean of each option. We use the forced-choice trials to set up one of two information conditions: (A) an unequal (or [1 3]) condition in which participants see 1 play from one option and 3 plays from the other and (B) an equal (or [2 2]) condition in which participants see 2 plays from both options. A model-free measure of directed exploration is then defined as the change in information seeking with horizon in the unequal condition (A). Likewise, a model-free measure of random exploration is defined as the change with horizon in choosing the low mean option in the equal condition (B).
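As a rough illustration of the two model-free measures defined in this caption, the following Python sketch computes them from first free-choice trials. The trial records, their field names (`condition`, `horizon`, `chose_high_info`, `chose_low_mean`) and the toy data are assumptions made for this example; they are not the authors' data format or analysis code.

```python
from statistics import mean


def choice_rate(trials, condition, horizon, key):
    """Mean of a 0/1 choice flag over the trials in one condition/horizon cell."""
    flags = [t[key] for t in trials
             if t["condition"] == condition and t["horizon"] == horizon]
    return mean(flags)


def model_free_measures(trials, short=1, long=6):
    """Horizon-related change in p(high info) and in p(low mean)."""
    # Directed exploration: change in information seeking in the unequal [1 3] condition.
    directed = (choice_rate(trials, "unequal", long, "chose_high_info")
                - choice_rate(trials, "unequal", short, "chose_high_info"))
    # Random exploration: change in low-mean choices in the equal [2 2] condition.
    random_expl = (choice_rate(trials, "equal", long, "chose_low_mean")
                   - choice_rate(trials, "equal", short, "chose_low_mean"))
    return {"directed": directed, "random": random_expl}


def make_trials(condition, horizon, key, flags):
    """Build hypothetical trial records from a list of 0/1 choice flags."""
    return [{"condition": condition, "horizon": horizon, key: f} for f in flags]


# Toy data (hypothetical): four first free-choice trials per cell.
toy_trials = (
    make_trials("unequal", 1, "chose_high_info", [1, 0, 0, 0])
    + make_trials("unequal", 6, "chose_high_info", [1, 1, 1, 0])
    + make_trials("equal", 1, "chose_low_mean", [0, 0, 0, 1])
    + make_trials("equal", 6, "chose_low_mean", [1, 1, 0, 0])
)

measures = model_free_measures(toy_trials)
print(measures)
```

In this toy data both measures are positive, which is the pattern the caption associates with horizon-dependent directed and random exploration.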
Figure 2. The reward-information confound.
The y-axis corresponds to the correlation between the sign of the difference in mean between options, sgn(μ_left − μ_right), and the sign of the difference in the number of times each option has been played, sgn(n_left − n_right). The forced trials are chosen such that the correlation is approximately zero on the first free-choice trial. After the first trial, however, a positive correlation quickly emerges as participants choose the more rewarding options more frequently. This strong confound between reward and information makes it difficult to dissociate directed and random exploration on later trials.
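The correlation described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each game is a hypothetical record with fields `mu_left`, `mu_right`, `n_left` and `n_right`, the toy values are invented, and a plain Pearson correlation is used because the caption does not specify the estimator.

```python
from math import sqrt


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def reward_info_correlation(games):
    """Correlate sgn(mu_left - mu_right) with sgn(n_left - n_right) across games."""
    s_mean = [sign(g["mu_left"] - g["mu_right"]) for g in games]
    s_info = [sign(g["n_left"] - g["n_right"]) for g in games]
    return pearson(s_mean, s_info)


def game(mu_left, mu_right, n_left, n_right):
    """Build one hypothetical game record."""
    return {"mu_left": mu_left, "mu_right": mu_right,
            "n_left": n_left, "n_right": n_right}


# Toy first free-choice trial: play counts are balanced against the means.
balanced = [game(60, 40, 3, 1), game(60, 40, 1, 3),
            game(40, 60, 3, 1), game(40, 60, 1, 3)]

# Toy later trial: the better option has always been played more often.
confounded = [game(60, 40, 5, 2), game(40, 60, 2, 5),
              game(60, 40, 4, 3), game(40, 60, 1, 6)]

r_balanced = reward_info_correlation(balanced)
r_confounded = reward_info_correlation(confounded)
print(r_balanced, r_confounded)
```

The toy sets are deliberately extreme: the balanced set gives a correlation of zero, and the confounded set gives a correlation of one, so the two ends of the effect are easy to see.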
Figure 3. Model-free analysis of the first free-choice trial shows that RFPC stimulation affects directed, but not random, exploration.
(A) In the control (vertex) condition, information seeking increases with horizon, consistent with directed exploration. When RFPC is stimulated, directed exploration is reduced, an effect that is entirely driven by changes in horizon 6 (* denotes p<0.02 and ** denotes p<0.005; error bars are ± s.e.m.). (B) Random exploration increases with horizon but is not affected by RFPC stimulation.
Figure 4. Model-based analysis of the first free-choice trial showing the effect of RFPC stimulation on each of the 13 parameters.
Left column: Posterior distributions over each parameter value for RFPC and vertex stimulation condition. Right column: posterior distributions over the change in each parameter between stimulation conditions. Note that, because information bonus, decision noise and spatial bias are all in units of points, we plot them on the same scale to facilitate comparison of effect size.
Figure 5. Correlation between TMS-induced changes in information bonus, A, and TMS-induced changes in the prior mean, R0.
(A, B) Samples from the posterior distributions over the TMS-related change in prior mean, R0, and the TMS-related change in information bonus in horizon 1 (A) and horizon 6 (B). In both cases we see a negative correlation between the change in R0 and the change in A, consistent with a tradeoff between these variables in the model. (C) Samples from the posterior over the effect of TMS stimulation on the horizon-related change in information bonus, ΔA = A(h=6) − A(h=1), plotted against samples from the TMS-related change in prior mean. Here we see no correlation between variables and the majority of ΔA(vertex)−ΔA(RFPC) samples below zero, consistent with an effect of RFPC stimulation on directed exploration.
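The horizon-related change ΔA = A(h=6) − A(h=1) in panel C can be computed sample by sample from paired posterior draws. The following is a minimal sketch, assuming the draws for each stimulation condition are available as paired lists. The function names, the neutral condition labels (`cond_x`, `cond_y`) and the toy numbers are hypothetical, and the code does not fix which condition is subtracted from which; the caller chooses the order.

```python
def horizon_change(a_h1, a_h6):
    """Per-sample horizon-related change: A(h=6) - A(h=1)."""
    return [a6 - a1 for a1, a6 in zip(a_h1, a_h6)]


def difference(first, second):
    """Per-sample difference between two conditions, first minus second."""
    return [f - s for f, s in zip(first, second)]


def fraction_below_zero(samples):
    """Share of samples that are strictly negative."""
    return sum(s < 0 for s in samples) / len(samples)


# Toy posterior draws (hypothetical values, in points).
cond_x = {"a_h1": [2, 3, 4, 5], "a_h6": [6, 8, 9, 11]}
cond_y = {"a_h1": [2, 3, 4, 5], "a_h6": [7, 7, 11, 12]}

delta_x = horizon_change(cond_x["a_h1"], cond_x["a_h6"])  # [4, 5, 5, 6]
delta_y = horizon_change(cond_y["a_h1"], cond_y["a_h6"])  # [5, 4, 7, 7]

effect = difference(delta_x, delta_y)  # [-1, 1, -2, -1]
share_negative = fraction_below_zero(effect)
print(effect, share_negative)
```

Working with paired draws keeps any dependence between parameters within a sample, so the share of negative samples summarizes the posterior over the difference directly.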
Figure 6. Model-free analysis of all trials.
(A, B) Model-free measures of directed (A) and random (B) exploration as a function of trial number suggest a reduction in both directed and random exploration over the course of the game. (C, D) TMS-induced change in measures of directed and random exploration as a function of trial number. This suggests that the reduction in directed exploration on the first free-choice trial persists into the second trial of the game.
Figure 7. Correlation between individual differences in the levels of directed and random exploration in a sample of 277 people performing the Horizon Task.
Figure 8. Graphical representation of the model.
Each variable is represented by a node, with edges denoting the dependence between variables. Shaded nodes correspond to observed variables, that is, the free choices c_{τshug}, the forced-trial rewards r_{τshug} and the forced-trial choices a_{τshug}. Unshaded nodes correspond to unobserved variables whose values are inferred by the model.
Author response image 1. No difference in effects between original and replication experiments.
In each panel we plot the model-free measures of directed and random exploration and how they change between stimulation conditions. For example, in Panel A, we plot p(high info) in horizon 1 for vertex stimulation (x-axis) and RFPC stimulation (y-axis). Each point in this plot is a single subject and the diagonal line represents equality. Participants below the diagonal line have a smaller value of p(high info) in the RFPC stimulation condition. From this we can clearly see that there is no effect of RFPC stimulation on directed exploration in horizon 1 (panel A), or random exploration in either horizon (B, D). However, there is a strong effect of RFPC stimulation on directed exploration in horizon 6 with the majority of points lying below the diagonal (C). Moreover, both the original and replication datasets point to the same conclusions in all four panels.
Author response image 2. Effect of TMS on information bonus in model with bonus proportional to uncertainty.
Author response image 3. Effect of TMS on decision noise in model in which bonus is a linear function of uncertainty.
Author response image 4.


Source: PubMed
