Learning from sensory and reward prediction errors during motor adaptation

Jun Izawa, Reza Shadmehr

Abstract

Voluntary motor commands produce two kinds of consequences. Initially, a sensory consequence is observed in terms of activity in our primary sensory organs (e.g., vision, proprioception). Subsequently, the brain evaluates the sensory feedback and produces a subjective measure of utility or usefulness of the motor commands (e.g., reward). As a result, comparisons between predicted and observed consequences of motor commands produce two forms of prediction error. How do these errors contribute to changes in motor commands? Here, we considered a reach adaptation protocol and found that when high-quality sensory feedback was available, adaptation of motor commands was driven almost exclusively by sensory prediction errors. This form of learning had a distinct signature: as motor commands adapted, the subjects altered their predictions regarding sensory consequences of motor commands, and generalized this learning broadly to neighboring motor commands. In contrast, as the quality of the sensory feedback degraded, adaptation of motor commands became more dependent on reward prediction errors. Reward prediction errors produced comparable changes in the motor commands, but produced no change in the predicted sensory consequences of motor commands, and generalized only locally. Because we found a within-subject correlation between generalization patterns and sensory remapping, it is plausible that an individual's relative reliance on sensory vs. reward prediction errors during adaptation could be inferred. We suggest that while motor commands change because of sensory and reward prediction errors, only sensory prediction errors produce a change in the neural system that predicts sensory consequences of motor commands.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Experimental setup.
(A) In the reaching task, subjects held the handle of a robotic arm and made ‘shooting’ movements to move a cursor through a target at 10 cm. The arm was covered by a screen. During adaptation, the cursor-hand relationship was perturbed so that the cursor position was rotated around the center of the start position. The coordinate system is drawn on the left side of the robot (invisible to the subject), where clockwise rotation around the start is positive. The cumulative score of each block was provided to the subject. In the localization task, subjects pointed with their left hand over the screen to the remembered location of their right hand as it crossed the (unseen) target area in the previous trial. In the localization task, the start box was not visible. (B) Experimental paradigms. In ERR, full visual feedback about the cursor position was provided, as well as the animation and sound of a target explosion indicating success or failure of the task. In EPE, while the cursor was unseen during the shooting movement, it was presented for 200 ms as the hand crossed an imaginary circle with a radius equal to the target distance, providing endpoint error with respect to the target. The reward signal was also provided as in the ERR condition. In RWD, no visual feedback about the cursor was provided. The only information available to subjects was the success or failure of the task. (C) Reach angles of three representative subjects during the adaptation phase. The yellow line in the ERR group is the ideal reach angle, which shifted gradually up to 8 degrees with the visual rotation. The gray area indicates the reward region, which shifted with the same schedule in the three groups. (D) Reach variability in the final 100 trials for each group. There are significant differences between ERR and EPE (t-test, p<0.003) as well as between EPE and RWD (t-test, p<0.001). (E) Results of the localization task for the three subjects. The reach trajectory is plotted for the POST condition. The red line is for the RWD subject, the blue line is for the ERR subject, and the green line is for the EPE subject. The circle around the reach trajectory is the averaged pointing location in the localization trial.
Figure 2. The sensory remapping and the generalization function.
(A) The average estimated localization of hand position in PRE and POST conditions. Error bars are SEM. (B) Generalization of adaptation from the learned target direction (at 0°) to neighboring target directions. (C) Illusion index (change in estimated location of the hand from PRE to POST adaptation) as a function of generalization index for subjects in the EPE condition. Each dot indicates an individual subject's data. There is a significant negative correlation between these two indices (R = −0.68, p = 0.02).
Figure 3. The theoretical problem of learning motor control.
(A) A generative model of the motor adaptation task. Motor commands are corrupted by a perturbation, which results in a hand position that is sensed via a cursor and may also result in reward. The objective of the learner is to find the motor commands that maximize reward. White circles are hidden variables and gray circles are observed variables. Arrows indicate conditional probabilities. (B) Model of the optimal learner. The learning system is composed of two compensatory mechanisms: an action selector and an internal forward model. At trial k, the action selector outputs the motor command that moves the state of the body and task from its current value to its value on the next trial. The state variable includes three elements: hand position h, perturbation p, and the position t. The brain observes part of the state of the body. At the same time, the learner predicts the transition of the body state from the efference copy of the motor command. Kalman filtering corrects the prediction to minimize the sensory prediction error, yielding the updated state. The action selector selects the optimal action as a function of the updated state at the next trial. (C) Sample disturbance and the response of the model. The task is to control the reach angle. The clockwise (CW) direction is positive and the target is at 0°. The uncertainty of the visual feedback was controlled to modulate the Kalman gain. The simulations predict a remapping of the estimated hand position that is modulated by the level of visual uncertainty.
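The link between visual uncertainty and Kalman gain described in this caption can be illustrated with a one-dimensional sketch. This is not the authors' model: the scalar state, the random-walk assumption for the perturbation, the noise-free observations, the ramp schedule, and all parameter values (`q`, `r`, the initial variance) are assumptions chosen only for illustration.

```python
def kalman_step(p_hat, var, u, y, q, r):
    """One trial of a scalar Kalman filter tracking a perturbation p.

    Assumed observation model: cursor angle y = u + p + sensory noise (variance r).
    Assumed state model: p drifts as a random walk (variance q per trial).
    """
    var_prior = var + q                    # uncertainty grows between trials
    gain = var_prior / (var_prior + r)     # Kalman gain shrinks as sensory noise r grows
    error = y - (u + p_hat)                # sensory prediction error
    p_new = p_hat + gain * error           # corrected estimate of the perturbation
    var_new = (1.0 - gain) * var_prior     # posterior uncertainty
    return p_new, var_new, gain


def simulate(perturbation, q=0.01, r=1.0):
    """Run the filter over a perturbation schedule (degrees, CW positive)."""
    p_hat, var = 0.0, 1.0                  # hypothetical initial belief
    commands, gains = [], []
    for p in perturbation:
        u = -p_hat                         # action selector: cancel the estimated perturbation
        y = u + p                          # observed cursor angle (noise omitted for determinism)
        p_hat, var, gain = kalman_step(p_hat, var, u, y, q, r)
        commands.append(u)
        gains.append(gain)
    return commands, gains


# Hypothetical schedule: gradual ramp to 8 degrees, then a plateau.
ramp = [min(8.0, 0.1 * k) for k in range(200)]
cmd_sharp, gain_sharp = simulate(ramp, r=0.5)    # reliable visual feedback
cmd_blur, gain_blur = simulate(ramp, r=50.0)     # degraded visual feedback
```

In this sketch, the run with the larger assumed sensory variance `r` ends with a smaller gain and a reach angle that lags further behind the full compensation of −8°, which is the qualitative dependence on visual uncertainty that the caption describes.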
Figure 4. Estimated contribution of reward and sensory prediction errors to change in motor output during adaptation.
When subjects experienced the ERR and EPE conditions, we assumed that the motor commands were produced by the sum of two memories, one updated by the sensory prediction error and the other updated by the reward prediction error. The best-fit parameters predict the update of the two memories. The thick black line is the subjects' averaged reach angle during the adaptation period. The gray shading is SEM. The superimposed purple line is the estimated reach angle from the model, which is a combination of the two memories (red and blue). In the RWD condition, the motor commands are updated only by the reward prediction error.
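A minimal sketch of the two-memory idea in this caption follows. It is not the authors' fitted model: the specific update rules (a delta rule on a forward-model estimate for the sensory memory, and a noise-correlated reinforcement rule for the reward memory), the names `w_s`, `w_r`, `p_hat`, `v`, and every parameter value are assumptions made for illustration.

```python
import random


def simulate_two_memories(rotation, a_s=0.1, a_r=0.5, a_v=0.1,
                          noise_sd=1.0, half_width=2.0,
                          cursor_visible=True, seed=0):
    """Reach angle as the sum of two memories (all update rules are assumed).

    w_s: memory updated by the sensory prediction error (needs a visible cursor).
    w_r: memory updated by the reward prediction error.
    """
    rng = random.Random(seed)
    w_s, w_r = 0.0, 0.0
    p_hat = 0.0                                  # forward-model estimate of the rotation
    v = 0.0                                      # predicted reward
    history = []
    for p in rotation:
        n = rng.gauss(0.0, noise_sd)             # exploratory motor noise
        u = w_s + w_r + n                        # motor command (reach angle)
        cursor = u + p                           # rotated cursor angle; target at 0
        reward = 1.0 if abs(cursor) <= half_width else 0.0

        delta = reward - v                       # reward prediction error
        w_r += a_r * delta * n                   # keep exploration that beat expectation
        v += a_v * delta                         # update the reward prediction

        if cursor_visible:
            spe = cursor - (u + p_hat)           # sensory prediction error
            p_hat += a_s * spe                   # forward model learns the rotation
            w_s = -p_hat                         # command component that cancels it
        history.append((u, w_s, w_r))
    return history


# Hypothetical schedule: gradual ramp to 8 degrees, then a plateau.
schedule = [min(8.0, 0.1 * k) for k in range(200)]
with_cursor = simulate_two_memories(schedule, cursor_visible=True)    # ERR-like
reward_only = simulate_two_memories(schedule, cursor_visible=False)   # RWD-like
```

With the cursor hidden, only `w_r` changes, mirroring the RWD condition in which the caption says motor commands are updated only by the reward prediction error.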

Source: PubMed
