LOCEN Research Topic: Dynamic movement primitives and reinforcement learning in robots: generalisation and compositionality

 

Research topic

The performance of flexible behaviour to accomplish multiple goals requires a hierarchical organisation of actions. Indeed, each action consists in a sensorimotor mapping that associates a flow of motor commands to the flow of sensory inputs, and the mappings related to different actions can be substantially different. When this happens, different actions have to be encoded in distinctive portions of the architecture of the control system to avoid cross-talk or catastrophic interference between them (French, 1999; McCloskey and Cohen, 1989). At the same time, when such sensorimotor mappings are very similar, their encoding in common structures facilitates generalisation and the reuse of knowledge for the accomplishment of different tasks (Meunier et al., 2010; Singh, 1992). The hierarchical organisation of behaviour also allows the chunking of pieces of behaviour, in that it allows the representation of an action in a specific portion of the whole controller and the abstract reference to it through labels or pointers (Bakker and Schmidhuber, 2004; Balleine and Dickinson, 1998). This opens up the possibility of processing and reasoning about actions (e.g., as in goal-directed behaviour and planning) in more efficient ways.

Hierarchical architectures are becoming very important in robotics, in particular when robots are required to solve not only one task but multiple ones, and not only in one condition but in multiple ones (Schembri et al., 2007). Hierarchical architectures are now seen by many as the necessary condition for scaling robots up to realistic real-life problems involving multiple tasks (e.g., Demiris and Khadhouri, 2006; Yamashita and Tani, 2008), or for allowing robots to undergo cumulative autonomous development (Baldassarre and Mirolli, 2010; Baldassarre and Mirolli, 2013; Baldassarre and Mirolli, 2013b). Indeed, hierarchical architectures are seen as the means through which robots can decompose the overall control problem they face into smaller tractable problems (Hart and Grupen, 2011), thus implementing the powerful 'divide-and-conquer' golden principle of engineering at the architectural level. State-of-the-art robotics applies this principle to solve multiple tasks, to facilitate the re-use of acquired knowledge in solving other tasks, to facilitate human-to-robot transfer of knowledge, to avoid interference, and so on.

The goal of this research will be mainly technological, i.e. directed to building architectures that allow a robot to learn multiple skills in a cumulative fashion. However, this research will also be inspired and informed by knowledge obtained by neuroscience on how the brains of real organisms succeed in solving this problem. Indeed, the behaviour and brains of animals are also organised in a sophisticated hierarchical fashion (Meunier et al., 2010; Baldassarre, 2002; Baldassarre et al., 2013). Animals' brains exploit hierarchy to chunk pieces of behaviour (Graybiel, 1998) so as to reuse them in new tasks, to easily recall them at later times as in goal-directed behaviour (i.e., to pursue goals associated with them; Redgrave and Gurney, 2006), to avoid cross-talk, and to exploit the compositionality allowed by a modular organisation of information (Graziano, 2006). Recent research on the brain is revealing that it is organised at multiple interdependent levels both within cortical (Miller and Cohen, 2001) and sub-cortical regions (Yin and Knowlton, 2006). In this respect, it can be said that much of the behavioural flexibility exhibited by real organisms depends on the fine hierarchical organisation of the underlying brain structure (Meunier et al., 2010). This offers the opportunity to "copy nature's principles" to produce robotic architectures having the same flexibility as organisms.

Hierarchical reinforcement learning, in particular the option framework, was the first classic approach used to tackle in an effective way the problem of how to build a hierarchical architecture capable of learning to solve multiple tasks and to re-use "skills" (there called "options"; Barto and Mahadevan, 2003). In general, reinforcement learning is still the field of machine learning that is giving the most important contributions to building hierarchical controllers for robots. The approach based on the option framework, however, has encountered important limitations in solving tasks involving continuous sensory and motor representations such as those involved in robotic tasks. For this reason, in recent years there has been an important "revolution" in the field involving both the way skills are represented and the algorithms used to train them. In terms of skill representations, the classic policies of reinforcement learning have been replaced by dynamic movement primitives (DMPs; Ijspeert, Nakanishi and Schaal, 2002; Ciancio et al., 2013), which are parameterised dynamical systems capable of producing whole, potentially useful dynamic movements of the robots. These movements can be either discrete, i.e. point-to-point and following a given trajectory, or rhythmic, i.e. following a given trajectory repeatedly over time. The use of DMPs has raised the level of abstraction of the motor output (skills), thus facilitating learning. In terms of algorithms, new algorithms have been proposed that are better suited to searching the parameter space of DMPs, such as policy gradient (Peters and Schaal, 2008) and policy search (Kober and Peters, 2011) reinforcement learning methods. These algorithms speed up the learning processes of robots (and allow one to inject knowledge into the initial DMP parameters through techniques to "imitate" human demonstrators' movements; see Schaal, Ijspeert and Billard, 2003), thus making it possible to solve tasks directly on real robots.
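To make the DMP idea concrete, the following is a minimal sketch of a discrete DMP in the style of Ijspeert, Nakanishi and Schaal (2002): a critically damped point attractor towards a goal, modulated by a learnable forcing term over Gaussian basis functions of an exponentially decaying phase. Parameter values, the basis-width heuristic, and the Euler integration are simplifying assumptions for illustration, not the formulation used in any specific paper cited above.

```python
import numpy as np

def dmp_rollout(w, y0=0.0, g=1.0, tau=1.0, dt=0.001, T=1.0,
                alpha_z=25.0, beta_z=25.0 / 4.0, alpha_x=8.0):
    """Roll out one degree of freedom of a discrete DMP with basis weights w."""
    n = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))  # basis centres along the phase
    h = n / c ** 2                                   # basis widths (common heuristic)
    x, y, z = 1.0, y0, 0.0                           # phase, position, scaled velocity
    traj = [y]
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        # Forcing term: weighted basis activation, gated by phase and amplitude.
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)               # canonical system decays to 0
        traj.append(y)
    return np.array(traj)
```

With zero weights the forcing term vanishes and the trajectory is a smooth point-to-point reach from y0 to g; learning (e.g. by policy search) then shapes the weights w to bend that trajectory without losing convergence to the goal.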

Research specific problems
  • What specific architectural principles can be used to allow a robot to learn to solve multiple tasks in a cumulative fashion? 
  • Specifically, how can we exploit dynamic movement primitives (DMPs) to such purpose?
  • How can we organise those DMPs in a hierarchical fashion, e.g. on the basis of a higher-level "selector" selecting the DMPs depending on the task and context?
  • What learning algorithms can we use to train the dynamic movement primitives and the selector?
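The selector idea raised by these questions can be sketched, under strong simplifying assumptions (discrete contexts, a bandit-style return per trial, a tabular softmax policy), as a REINFORCE-style chooser over black-box primitives. All names and the toy reward structure below are illustrative, not the group's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class Selector:
    """Tabular softmax selector choosing among K primitives per context.

    Trained with a REINFORCE-style update on the return obtained by the
    executed primitive; the primitives (e.g. DMPs) are black boxes here.
    """
    def __init__(self, n_contexts, n_primitives, lr=0.1):
        self.prefs = np.zeros((n_contexts, n_primitives))
        self.lr = lr

    def policy(self, ctx):
        p = np.exp(self.prefs[ctx] - self.prefs[ctx].max())
        return p / p.sum()

    def select(self, ctx):
        return rng.choice(self.prefs.shape[1], p=self.policy(ctx))

    def update(self, ctx, action, reward):
        # Gradient of log softmax: indicator of chosen action minus policy.
        grad = -self.policy(ctx)
        grad[action] += 1.0
        self.prefs[ctx] += self.lr * reward * grad

# Toy world: in context c, primitive c is the correct one (reward 1).
sel = Selector(n_contexts=3, n_primitives=3)
for _ in range(2000):
    c = int(rng.integers(3))
    a = sel.select(c)
    sel.update(c, a, reward=1.0 if a == c else 0.0)
```

After training, the selector's policy concentrates on the correct primitive in each context; in a full architecture the "reward" would instead be the return of a DMP rollout, and the selector itself could be a function approximator over continuous context features.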
Research method
  • Machine learning and robotics.
Examples of research of this type carried out by the group

(see full references below; the pdf files of the papers are retrievable from here)

  • Baldassarre (2002).
  • Schembri et al. (2007).
  • Tommasino et al. (2012).
  • Baldassarre and Mirolli (2013).
  • Ciancio et al. (2013).
  • Caligiore et al. (2014).
Requested motivations of the candidate
  • Strong interest in the topic and motivation to carry out research on it (very important)
  • Desire to acquire the knowledge and methods of the group
  • Professionalism and reliability.
Requested knowledge of the candidate
  • Basics of mathematics, in particular calculus and linear algebra.
  • Possibly: machine learning, in particular neural networks and their learning algorithms.
  • Ideal: experience with robot controllers.
Requested skills of the candidate
  • Capacity to program in C++
  • Potential to learn to develop robotic systems and control architectures
  • Capacity to read and understand scientific papers in English
  • Capacity to contribute to write reports in English
References
  • Bakker, B., Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In Groen, F., Amato, N., Bonarini, A., Yoshida, E., and Kruse, B., editors, Proceedings of the 8-th Conference on Intelligent Autonomous Systems (IAS-8), pages 438–445.
  • Baldassarre, G. (2002). A modular neural-network model of the basal ganglia's role in learning and selecting motor behaviours. Journal of Cognitive Systems Research, 3 (2), 5-13.
  • Baldassarre, G. (2011). What are intrinsic motivations? A biological perspective. In IEEE ICDL 2011, e1-8.
  • Baldassarre, G., Caligiore, D., Mannella, F. (2013). The hierarchical organisation of cortical and basal-ganglia systems: a computationally-informed review and integrated hypothesis. In Baldassarre, G., Mirolli, M. (ed.), Computational and Robotic Models of the Hierarchical Organisation of Behaviour. 237-270. Berlin: Springer-Verlag.
  • Baldassarre, G., Mirolli, M. (eds.)(2013). Computational and Robotic Models of the Hierarchical Organisation of Behaviour. Berlin: Springer-Verlag.
  • Baldassarre, G., Mirolli, M. (eds.)(2013b). Intrinsically motivated learning in natural and artificial systems. Berlin: Springer.
  • Baldassarre, G., Mirolli, M. (2010). What are the key open challenges for understanding the autonomous cumulative learning of skills? The Newsletters of the Autonomous Mental Development Technical committee (IEEE CIS AMD Newsletters). Vol. 7 N. 1, pp. 11.
  • Balleine, B. W., Dickinson, A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology, 37(4-5):407–419.
  • Barto, A. G., Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13 (4), 341-379.
  • Caligiore, D., Tommasino, P., Sperati, V., Baldassarre, G. (2014). Modular and hierarchical brain organization to understand assimilation, accommodation and their relation to autism in reaching tasks: a developmental robotics hypothesis. Adaptive Behavior, e1-26.
  • Ciancio, A. L., Zollo, L., Baldassarre, G., Caligiore, D., Guglielmelli, E. (2013). The Role of Learning and Kinematic Features in Dexterous Manipulation: a Comparative Study with Two Robotic Hands. International Journal of Advanced Robotic Systems, 10, e1-21.
  • Demiris, Y., Khadhouri, B. (2006). Hierarchical attentive multiple models for execution and recognition of actions. Robotics and autonomous systems, 54(5):361–369.
  • French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends Cogn Sci, 3(4):128–135.
  • Graybiel, A. M. (1998). The basal ganglia and chunking of action repertoires. Neurobiol Learn Mem, 70(1-2):119–136.
  • Graziano, M. (2006). The organization of behavioral repertoire in motor cortex. Annu Rev Neurosci, 29:105–134.
  • Hart, S., Grupen, R. (2011). Learning generalizable control programs. IEEE Transactions on Autonomous Mental Development, 3(1):216–231.
  • Ijspeert, A., Nakanishi, J., Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. Advances in neural information processing systems, 15 1523-1530.
  • Kober, J., Peters, J. (2011). Policy search for motor primitives in robotics. Machine Learning, 84 (1-2), 171-203.
  • McCloskey, M., Cohen, N. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In Bower, G. H., editor, The psychology of learning and motivation, volume 24, pages 109–165. Academic Press, San Diego, CA.
  • Meunier, D., Lambiotte, R., and Bullmore, E. T. (2010). Modular and hierarchically modular organization of brain networks. Front Neurosci, 4:200.
  • Miller, E. K., Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annu Rev Neurosci, 24:167–202.
  • Peters, J., Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71 (7), 1180-1190.
  • Redgrave, P. and Gurney, K. (2006). The short-latency dopamine signal: a role in discovering novel actions? Nature Reviews Neuroscience, 7(12):967–975.
  • Schaal, S., Ijspeert, A., Billard, A. (2003). Computational approaches to motor learning by imitation. Philos Trans R Soc Lond B Biol Sci, 358 (1431), 537-547.
  • Schembri, M., Mirolli, M., Baldassarre, G. (2007). Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot. In Demiris, Y., Scassellati, B., Mareschal, D. (eds.), Proceedings of the 6th IEEE International Conference on Development and Learning (ICDL2007). 282-287. London: Imperial College.
  • Singh, S. (1992). Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning, 8(3):323–339.
  • Tommasino, P., Caligiore, D., Mirolli, M., Baldassarre, G. (2012). Reinforcement learning algorithms that assimilate and accommodate skills with multiple tasks. In Movellan, J., Schlesinger, M. (eds.), IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob2012). e1-8. Piscataway, NJ: IEEE. (San Diego, CA, 7-9 November 2012).
  • Yamashita, Y., Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS Computational Biology, 4(11): e1000220.
  • Yin, H. H., Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nat Rev Neurosci, 7(6):464–476.