A syntactic approach to robot imitation learning using probabilistic activity grammars

https://doi.org/10.1016/j.robot.2013.08.003

Highlights

  • We present a syntactic approach to robot imitation learning.

  • It captures reusable task structures in the form of probabilistic activity grammars.

  • We aim to learn with a reasonably small number of samples under noisy conditions.

  • We evaluate on synthetic data and on two real-world humanoid robot experiments.

  • Our method shows improved imitation learning performance compared with other methods.

Abstract

This paper describes a syntactic approach to imitation learning that captures important task structures in the form of probabilistic activity grammars from a reasonably small number of samples under noisy conditions. We show that these learned grammars can be recursively applied to help recognize unforeseen, more complicated tasks that share underlying structures. The grammars enforce an observation to be consistent with previously observed behaviors, which makes it possible to correct unexpected, out-of-context actions caused by errors of the observer and/or the demonstrator. To achieve this goal, our method (1) actively searches for frequently occurring action symbols that are subsets of input samples to uncover the hierarchical structure of the demonstration, and (2) considers the uncertainties of input symbols due to imperfect low-level detectors.

We evaluate the proposed method using both synthetic data and two sets of real-world humanoid robot experiments. In our Towers of Hanoi experiment, the robot learns the important constraints of the puzzle after observing demonstrators solving it. In our Dance Imitation experiment, the robot learns 3 types of dances from human demonstrations. The results suggest that, under a reasonable amount of noise, our method is capable of capturing reusable task structures and generalizing them to cope with recursion.

Introduction

Humans are capable of learning novel activity representations despite noisy sensory input by making use of previously acquired contextual knowledge, since many human activities share similar underlying structures. For example, when we observe a hand transferring an object to another place but the grasping action cannot be seen due to occlusion, we can still infer that a grasping action occurred before the object was lifted.

Similarly, in the process of language acquisition, a child learns more complex concepts and represents them using previously learned vocabulary. Analogously, the structure of an activity can be represented using a formal grammar, where symbols (or vocabularies) represent the smallest meaningful units of action components, i.e. primitive actions. We are interested in learning reusable action components to better understand more complicated tasks that share the same structures in noisy environments.

The learning of reusable action components is one of the crucial tools for robot imitation learning (also called robot programming by demonstration), which has become an important paradigm as it enables a robot to incrementally learn higher-level knowledge from human teachers. Our approach shares the concept of imitation learning presented in the Handbook of Robotics (Chapter 59) [1], as well as in [2], [3], [4], [5], where a robot learns a new task directly from human demonstration without the need for extensive reprogramming.

There are several important issues in imitation learning: what to imitate, how to imitate, who to imitate, when to imitate, and how to judge whether imitation was successful [6]. In this paper, we mainly focus on the issue of what to imitate, an actively investigated area in which a robot needs to understand the goal or intention of actions, as done similarly in [7], [8], [9], [10], [11]. It is also known that humans tend to interpret actions based on goals rather than motion trajectories [12], [13]. Another active research area, which addresses the problem of how to imitate, focuses on learning joint trajectories (e.g. [14], [15], [16], [17], [18], [19]). Although this is not our main focus, we address this issue in our Dance Imitation experiment (Section 5.3).

We are inspired by the work done in [20], which shares our motivation for hierarchical learning. In that work, the authors designed a set of primitive actions which are then used as building blocks, i.e. basic vocabularies, to represent higher-level activities. However, it does not deal with more complex concepts such as recursion, which we address here. In this respect, we choose Stochastic Context-Free Grammars (SCFGs) as our representation framework due to (1) their robustness to noise as a result of their probabilistic nature, (2) their compactness in representing hierarchical and recursive structures, and (3) their generation of human-readable output which can be intuitively interpreted by users even without deep technical knowledge. It is worth noting that “context-free” in SCFG is used in contrast to “context-sensitive”, another class of grammars; it does not mean that the grammar lacks contextual knowledge. Although some other commonly used techniques such as Hidden Markov Models (HMMs) have lower computational complexity, they are often less expressive and cannot easily represent structures with repetitions and recursions. For example, the recursive activity a^n b^n, where a = Push and b = Pull (an equal number of Push and Pull operations), cannot be represented using HMMs. SCFGs extend Context-Free Grammars by adding rule probabilities, a notion similar to state transition probabilities in HMMs. We are especially interested in real-world applications where noise cannot be avoided; hence, we consider symbol probabilities as well as rule probabilities.
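
To make the expressiveness argument concrete, the following minimal sketch (not the paper's implementation; it assumes the NLTK library is available and uses push/pull as terminal names) encodes the recursive a^n b^n activity as an SCFG and parses a sequence containing an equal number of Push and Pull actions, a constraint that a finite-state model such as an HMM cannot enforce for arbitrary n.

```python
# Minimal sketch (assumption: NLTK is installed; this is not the authors' code).
# An SCFG for the recursive activity a^n b^n with a = "push" and b = "pull".
from nltk import PCFG
from nltk.parse import ViterbiParser

# S either produces a single push/pull pair, or nests another S between
# a push and a pull; the bracketed numbers are rule probabilities.
grammar = PCFG.fromstring("""
    S -> 'push' S 'pull' [0.5]
    S -> 'push' 'pull' [0.5]
""")

parser = ViterbiParser(grammar)
observation = ['push', 'push', 'push', 'pull', 'pull', 'pull']  # a^3 b^3

# The Viterbi parser yields the most probable parse tree (if the sequence
# is grammatical) together with its probability under the grammar.
for tree in parser.parse(observation):
    print(tree)
    print('parse probability:', tree.prob())
```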

In this paper, we present a method for learning activity grammars from human demonstrations, which can then be used as priors to better recognize more complex, ambiguous tasks that share the same underlying components. We assume that (1) the system can detect meaningful atomic actions, which are not necessarily noise-free, and (2) extensive complete datasets are not always available, but numerous examples of smaller component elements can be found.

Section snippets

Related works

A large amount of effort has been spent to understand tasks using context-free grammars (CFGs). In  [21], Ryoo defines a game activity representation using CFGs which enables a system to recognize events and actively provide proper feedback to the human user when the user makes unexpected actions. In  [22], Ivanov defines SCFG rules to recognize more complicated actions, e.g. music conducting gestures, using HMM-based low-level action detectors. In  [23], a robot imitates human demonstrations

Stochastic context-free grammar induction

A context-free grammar (CFG) is defined by a 4-tuple G = {Σ, N, S, R}, where Σ is the set of terminals, N is the set of non-terminals, R is the set of production rules, and S is the start symbol. The production rules take the form X → λ, where X ∈ N and λ ∈ (N ∪ Σ)*. Non-terminals are denoted in uppercase letters while terminals are denoted in lowercase letters. In a Stochastic CFG (SCFG), also known as a Probabilistic CFG (PCFG), each production rule is assigned a continuous probability parameter.
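
As a concrete illustration of this 4-tuple definition (a sketch with our own naming, not the paper's induction code), an SCFG can be stored as Σ, N, S and R, with a probability attached to each production rule:

```python
# Sketch of an SCFG as the 4-tuple G = {Sigma, N, S, R}, where each rule
# X -> lambda also carries a probability. Names here are illustrative only.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    lhs: str        # X in N (non-terminals written in uppercase)
    rhs: tuple      # lambda in (N u Sigma)*, e.g. ('push', 'S', 'pull')
    prob: float     # rule probability; rules sharing an lhs sum to 1

@dataclass
class SCFG:
    terminals: set          # Sigma
    nonterminals: set       # N
    start: str              # S
    rules: list = field(default_factory=list)   # R, with probabilities

    def is_proper(self):
        """Check that rule probabilities sum to 1 for every left-hand side."""
        totals = {}
        for r in self.rules:
            totals[r.lhs] = totals.get(r.lhs, 0.0) + r.prob
        return all(abs(t - 1.0) < 1e-9 for t in totals.values())

# Example: the recursive Push/Pull grammar from the introduction.
g = SCFG(terminals={'push', 'pull'}, nonterminals={'S'}, start='S',
         rules=[Rule('S', ('push', 'S', 'pull'), 0.5),
                Rule('S', ('push', 'pull'), 0.5)])
assert g.is_proper()
```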

To induce an

Proposed method

We first explain our method of computing the rule probabilities, and then describe how input symbols with uncertainty values are taken into account.
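
For orientation only, the snippet below shows the standard relative-frequency (maximum-likelihood) way of turning rule usage counts into rule probabilities; it is a generic baseline, not the estimator proposed in this paper, which additionally accounts for symbol uncertainties.

```python
# Generic relative-frequency estimate of SCFG rule probabilities:
# P(X -> lambda) = count(X -> lambda) / sum over lambda' of count(X -> lambda').
# This is a standard baseline, not the paper's estimator.
from collections import Counter, defaultdict

def relative_frequency(rule_counts):
    """rule_counts maps (lhs, rhs) pairs to how often each rule was used
    when parsing the demonstrations."""
    totals = defaultdict(int)
    for (lhs, _), count in rule_counts.items():
        totals[lhs] += count
    return {rule: count / totals[rule[0]] for rule, count in rule_counts.items()}

counts = Counter({('S', ('push', 'S', 'pull')): 3,
                  ('S', ('push', 'pull')): 9})
print(relative_frequency(counts))
# -> {('S', ('push', 'S', 'pull')): 0.25, ('S', ('push', 'pull')): 0.75}
```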

Experiments and analyses

To test our framework, we first experiment on synthetic data with systematically varied levels of noise, followed by real-world data obtained from a camera. As MDL scores depend on the data samples, we compute the ratio of MDL scores between the learned grammar and the hand-made model grammar.
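
For readers unfamiliar with the metric, a common two-part MDL formulation scores a grammar by its own description length plus the cost of encoding the data given the grammar; the quantity reported here is the ratio of these scores between the learned grammar and the hand-made model grammar. The sketch below uses this generic formulation; the exact coding scheme used in the paper may differ.

```python
# Generic two-part MDL score and the learned-vs-model ratio used for comparison.
# Assumption: MDL(G, D) = DL(G) + DL(D | G), with DL(D | G) = -log2 P(D | G);
# the paper's exact coding scheme is not reproduced here.

def mdl(grammar_bits, data_bits_given_grammar):
    """Bits to encode the grammar plus bits to encode the data under it."""
    return grammar_bits + data_bits_given_grammar

def mdl_ratio(learned, model):
    """Ratio of MDL scores (learned grammar / hand-made model grammar);
    values close to 1 indicate the learned grammar explains the data
    about as compactly as the model grammar."""
    return mdl(*learned) / mdl(*model)

# Purely hypothetical numbers for illustration.
print(mdl_ratio(learned=(120.0, 340.0), model=(110.0, 335.0)))  # ~1.03
```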

Discussions and future directions

We have presented a robot imitation learning framework using probabilistic activity grammars. Our method aims to discover reusable common action components across multiple tasks from the input stream. We have shown in two non-trivial real-world experiments (Section 5.2, The Towers of Hanoi, and Section 5.3, Dance imitation learning) that our method is capable of learning reusable structures under a reasonable amount of noise, in addition to the synthetic dataset experiment for systematic analysis. In the Dance

Acknowledgments

This work was supported by the EU FP7 projects EFAA (FP7-ICT-270490) and ALIZ-E (FP7-ICT-248116).

References (47)

  • M. Asada et al., Imitation learning based on visuo-somatic mapping, Experimental Robotics IX (2006).
  • K. Dautenhahn et al., The agent-based perspective on imitation.
  • J. Demiris et al., Imitation as a dual-route process featuring predictive and learning components; a biologically plausible computational model.
  • S. Calinon, F. Guenter, A. Billard, Goal-directed imitation in a humanoid robot, in: IEEE International Conference on...
  • D.C. Bentivegna, C.G. Atkeson, G. Cheng, Learning similar tasks from observation and practice, in: IEEE/RSJ...
  • K. Lee, J. Lee, A. Thomaz, A. Bobick, Effective robot task learning by focusing on task-relevant objects, in: IEEE/RSJ...
  • C. Chao, M. Cakmak, A.L. Thomaz, Towards grounding concepts for transfer in goal learning from demonstration, in: IEEE...
  • A. Woodward et al., How infants make sense of intentional action.
  • T. Asfour et al., Imitation learning of dual-arm manipulation tasks in humanoid robots, International Journal of Humanoid Robotics (2008).
  • Y. Wu, Y. Demiris, Towards one shot learning by imitation for humanoid robots, in: IEEE International Conference on...
  • D. Nguyen-Tuong et al., Local Gaussian process regression for real time online model learning and control.
  • S. Gurbuz, T. Shimizu, G. Cheng, Real-time stereo facial feature tracking: mimicking human mouth movement on a humanoid...
  • H. Soh et al., Online spatio-temporal Gaussian process experts with application to tactile classification, IEEE/RSJ International Conference on Intelligent Robots and Systems (2012).

Kyuhwa Lee is a Ph.D. student and a research assistant in the Personal Robotics Lab at Imperial College London. His research interests focus upon structured human task learning and active learning for robots using syntactic approaches. He actively works in the field of robot learning by demonstration, with real-world applications on humanoid robots such as iCub and Simon.

Yanyu Su is a Ph.D. student at the State Key Laboratory of Robotics and System at Harbin Institute of Technology, and is currently visiting the Personal Robotics Lab at Imperial College London, working on biomimetic grasping mechanisms for complex humanoid hands and on robot learning by demonstration.

Tae-Kyun Kim has been a Lecturer in computer vision and learning at Imperial College London, UK, since 2010. He obtained his Ph.D. from the University of Cambridge in 2007 and was a research fellow of Sidney Sussex College, Cambridge, during 2007–2010. His research interests span various topics including object recognition, tracking, face recognition and surveillance, action/gesture recognition, and semantic image segmentation and reconstruction. He has co-authored over 40 journal and conference papers, 6 MPEG-7 standard documents and 17 international patents. A co-authored algorithm of his is part of the MPEG-7 ISO/IEC international standard for face image retrieval.

Yiannis Demiris is a Reader at Imperial College London. His research interests include assistive robotics, multi-robot systems, human–robot interaction and learning by demonstration. Dr. Demiris’ research is funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC), the Royal Society, BAE Systems, and the EU FP7 programme through the projects ALIZ-E and EFAA, both addressing novel machine learning approaches to human–robot interaction. He has guest edited special issues of the IEEE Transactions on SMC-B on Learning by Observation, Demonstration, and Imitation, and of the Adaptive Behavior journal on Developmental Robotics. He has organized six international workshops on Robot Learning, Bio-Inspired Machine Learning, Epigenetic Robotics, and Imitation in Animals and Artifacts (AISB), was the chair of the IEEE International Conference on Development and Learning (ICDL) in 2007, and the program chair of the ACM/IEEE International Conference on Human–Robot Interaction (HRI) in 2008. In 2012 he received the Rector’s award for Teaching Excellence and the Faculty of Engineering Award for Excellence in Engineering Education. He is a Senior Member of the IEEE and a member of the Institution of Engineering and Technology (IET).
