We developed a model of the basal ganglia (Piron et al., 2016) that introduces an action selection mechanism based on the competition between positive feedback through the direct pathway and negative feedback through the hyperdirect pathway. The model also exploits the parallel organization of circuits between the basal ganglia and the cortex using segregated loops: one loop selects the cue and the other makes the actual motor selection. Learning occurs between the cognitive cortex and the cognitive striatum using a simple reinforcement learning rule in which the values of the different cues are updated after each decision. As in most computational models of the basal ganglia, this model relies on an actor-critic architecture where the dopamine signal encodes the temporal difference prediction error in the critic (Joel et al., 2002; Khamassi et al., 2005). However, this algorithm is rudimentary and its implementation is not biologically plausible, since values are stored outside the model.

The objective of this postdoc is thus to review and re-implement (in Python) the main actor-critic models of the literature in order to compare them on a common set of decision tasks (a two-armed bandit task, for example) in terms of biological plausibility and performance. Special attention will be given to the "Primary Value and Learned Value Pavlovian Learning Algorithm" (PVLV) model (O'Reilly et al., 2007) and the AGREL model (Roelfsema & van Ooyen, 2005). From these replications (which will be published in ReScience), the most plausible and compatible mechanisms will be implemented in our own model of the basal ganglia in order to replace the current reinforcement learning algorithm (Guthrie et al., 2013; Piron et al., 2016).
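To make the kind of benchmark described above concrete, here is a minimal sketch of a generic actor-critic learner on a two-armed bandit task. It is not the learning rule of our model, nor PVLV or AGREL: the function name run_bandit, the reward probabilities, the learning rates and the softmax policy are all assumptions chosen for illustration. The critic holds a single value estimate, and its prediction error (the quantity usually associated with the dopamine signal) updates both the critic and the actor.

```python
import numpy as np


def run_bandit(reward_probs=(0.25, 0.75), n_trials=500,
               alpha_critic=0.1, alpha_actor=0.1, seed=0):
    """Generic actor-critic on a two-armed bandit (illustrative sketch only).

    All names and default values are hypothetical, not taken from the
    models cited in the text.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(reward_probs)
    value = 0.0                   # critic: expected reward (single-state task)
    prefs = np.zeros(n_arms)      # actor: one preference per arm
    choices = np.empty(n_trials, dtype=int)

    for t in range(n_trials):
        # Softmax policy over the actor preferences (numerically stabilized).
        exp_prefs = np.exp(prefs - prefs.max())
        policy = exp_prefs / exp_prefs.sum()
        action = rng.choice(n_arms, p=policy)

        # Bernoulli reward drawn with the probability of the chosen arm.
        reward = float(rng.random() < reward_probs[action])

        # Prediction error: with a single state and no successor state,
        # the temporal difference error reduces to reward minus value.
        delta = reward - value

        value += alpha_critic * delta         # critic update
        prefs[action] += alpha_actor * delta  # actor update
        choices[t] = action

    return prefs, value, choices


if __name__ == "__main__":
    prefs, value, choices = run_bandit()
    print("actor preferences:", prefs)
    print("critic value:", value)
    print("fraction of choices for arm 1:", choices.mean())
```

Running each candidate model through the same task loop and recording the same quantities (choices, prediction errors, learned values) is one way to make the comparison across models uniform.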

Profiles

References
• Joel, D., Niv, Y., & Ruppin, E. (2002). Actor–critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15(4-6), 535–547.
• Khamassi, M., Lachèze, L., Girard, B., Berthoz, A., & Guillot, A. (2005). Actor-critic models of reinforcement learning in the basal ganglia: From natural to artificial rats. Adaptive Behavior, 13(2).
• Guthrie, M., Leblois, A., Garenne, A., & Boraud, T. (2013). Interaction between cognitive and motor cortico-basal ganglia loops during decision making: a computational study. Journal of Neurophysiology, 109(12).
• Piron, C., Kase, D., Topalidou, M., Goillandeau, M., Orignac, H., N'Guyen, T., Rougier, N. P., & Boraud, T. (2016). The globus pallidus pars interna in goal oriented and habitual behavior. Resolving an old standing paradox. Movement Disorders, to appear.
• O'Reilly, R. C., Frank, M. J., Hazy, T. E., & Watz, B. (2007). PVLV: The Primary Value and Learned Value Pavlovian Learning Algorithm. Behavioral Neuroscience, 121(1).
• Roelfsema, P. R., & van Ooyen, A. (2005). Attention-gated reinforcement learning of internal representations for classification. Neural Computation.