A syntactic approach to robot imitation learning using probabilistic activity grammars
Introduction
Humans are capable of learning novel activity representations despite noisy sensory input by making use of previously acquired contextual knowledge, since many human activities share similar underlying structures. For example, when we observe a hand transferring an object to another place, and the grasping action cannot be seen due to occlusion, we can still infer that a grasp occurred before the object was lifted.
Similarly, in the process of language acquisition, a child learns more complex concepts and represents them using previously learned vocabulary. Analogously, the structure of an activity can be represented using a formal grammar, where symbols (or vocabulary items) represent the smallest meaningful units of action, i.e. primitive actions. We are interested in learning reusable action components in order to better understand more complicated tasks that share the same structures, even in noisy environments.
Learning reusable action components is crucial for robot imitation learning (also called robot programming by demonstration), which has become an important paradigm because it enables a robot to incrementally learn higher-level knowledge from human teachers. Our approach shares the concept of imitation learning presented in the Handbook of Robotics (Chapter 59) [1], as well as in [2], [3], [4], [5], where a robot learns a new task directly from human demonstration without the need for extensive reprogramming.
There are several important issues in imitation learning: what to imitate, how to imitate, who to imitate, when to imitate, and how to judge whether imitation was successful [6]. In this paper, we mainly focus on the issue of what to imitate, an actively investigated area in which a robot needs to understand the goal or intention of actions, as in [7], [8], [9], [10], [11]. It is also known that humans tend to interpret actions based on goals rather than motion trajectories [12], [13]. Another active research area addresses the problem of how to imitate, focusing on learning the trajectories of joints (e.g. [14], [15], [16], [17], [18], [19]). Although this is not our main focus, we address this issue in our Dance Imitation experiment (Section 5.3).
We are inspired by the work in [20], which shares our motivation for hierarchical learning. In that work, the authors designed a set of primitive actions which are then used as building blocks, i.e. basic vocabulary, to represent higher-level activities. However, it does not deal with more complex concepts such as recursion, which we deal with here. In this respect, we choose Stochastic Context-Free Grammars (SCFGs) as our representation framework due to (1) their robustness to noise, a result of their probabilistic nature, (2) their compactness in representing hierarchical and recursive structures, and (3) their generation of human-readable output which can be interpreted intuitively even by users without deep technical knowledge. It is worth noting that "context-free" in SCFG is used in contrast to "context-sensitive", another class of grammar; it does not mean that the grammar lacks contextual knowledge. Although some other commonly used techniques such as Hidden Markov Models (HMMs) have lower computational complexity, they are less expressive and cannot easily represent structures with repetition and recursion. For example, the recursive activity Push^n Pull^n, where n ≥ 1 (an equal number of Push and Pull operations), cannot be represented using HMMs. SCFGs extend Context-Free Grammars by adding rule probabilities, a notion similar to state transition probabilities in HMMs. We are especially interested in real-world applications where noise cannot be avoided; hence, in our case we consider the symbol probabilities as well as the rule probabilities.
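The Push/Pull example can be made concrete with a short sketch (illustrative only, not the paper's implementation): the single recursive rule S → Push S Pull | Push Pull generates exactly the sequences with matching counts of Push and Pull, and membership can be checked with a counter, something a fixed-state HMM cannot do for unbounded n.

```python
import random

def generate(p_recurse=0.5, rng=None):
    """Sample a Push^n Pull^n sequence from the stochastic rule
    S -> Push S Pull (prob p_recurse) | Push Pull (prob 1 - p_recurse).
    A derivation with n levels has probability p_recurse^(n-1) * (1 - p_recurse)."""
    rng = rng or random.Random(0)
    if rng.random() < p_recurse:
        return ["Push"] + generate(p_recurse, rng) + ["Pull"]
    return ["Push", "Pull"]

def is_balanced(seq):
    """Check membership in {Push^n Pull^n : n >= 1}: first half all Push,
    second half all Pull, equal counts."""
    n = len(seq) // 2
    return n >= 1 and seq == ["Push"] * n + ["Pull"] * n

seq = generate()
assert is_balanced(seq)
assert not is_balanced(["Push", "Pull", "Push"])
```

A finite-state model would need a separate state for every possible nesting depth, whereas the recursive rule captures all depths at once.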
In this paper, we present a method for learning activity grammars from human demonstrations, which can be used as a prior to better recognize more complex tasks that share the same underlying components, even in the presence of ambiguity. We assume that (1) the system can detect meaningful atomic actions, which are not necessarily noise-free, and (2) extensive complete datasets are not always available, but numerous examples of smaller component elements can be found.
Section snippets
Related works
A large amount of effort has been devoted to understanding tasks using context-free grammars (CFGs). In [21], Ryoo defines a game activity representation using CFGs which enables a system to recognize events and actively provide proper feedback when the human user makes unexpected actions. In [22], Ivanov defines SCFG rules to recognize more complicated actions, e.g. music-conducting gestures, using HMM-based low-level action detectors. In [23], a robot imitates human demonstrations
Stochastic context-free grammar induction
A context-free grammar (CFG) is defined by a 4-tuple (Σ, N, R, S), where Σ is the set of terminals, N is the set of non-terminals, R is the set of production rules, and S is the start symbol. The production rules take the form A → λ, where A ∈ N and λ ∈ (Σ ∪ N)*. Non-terminals are denoted by uppercase letters while terminals are denoted by lowercase letters. In a Stochastic CFG (SCFG), also known as a Probabilistic CFG (PCFG), each production rule is assigned a continuous probability parameter.
To induce an
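As a minimal illustration of the definition above (the data-structure names are ours, not the paper's), an SCFG can be stored as a map from each non-terminal to its weighted productions, with the probabilities for each non-terminal summing to 1, and sequences can be sampled by expanding symbols left to right:

```python
import random

# Illustrative SCFG for the language {a^n b^n}: the rule probabilities
# attached to each non-terminal's productions must sum to 1.
scfg = {
    "S": [(["a", "S", "b"], 0.4), (["a", "b"], 0.6)],
}
terminals = {"a", "b"}

def sample(symbol, rng):
    """Expand a symbol left-to-right, choosing rules by their probability."""
    if symbol in terminals:
        return [symbol]
    r, acc = rng.random(), 0.0
    for rhs, p in scfg[symbol]:
        acc += p
        if r < acc:
            return [t for s in rhs for t in sample(s, rng)]
    # Guard against floating-point round-off: fall back to the last rule.
    return [t for s in scfg[symbol][-1][0] for t in sample(s, rng)]

out = sample("S", random.Random(1))
n = len(out) // 2
assert out == ["a"] * n + ["b"] * n  # always a member of {a^n b^n}
```

The probability of a full derivation is the product of the probabilities of the rules it uses, e.g. P(aabb) = 0.4 × 0.6 here.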
Proposed method
We first explain our method of computing the rule probabilities, and then extend it to handle symbols with uncertainty values.
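As a hedged sketch of how symbol uncertainty can enter the picture (an assumed formulation for illustration; the paper's exact equations are not reproduced in this excerpt): if the low-level detector reports each observed symbol with a confidence value, a derivation's likelihood can combine the rule probabilities with the per-symbol probabilities as a simple product.

```python
import math

def derivation_likelihood(rule_probs, symbol_probs):
    """Illustrative combination of rule probabilities (from the grammar)
    with per-symbol detection confidences (from the noisy detector):
    P(derivation) = prod(rule probs) * prod(symbol probs)."""
    return math.prod(rule_probs) * math.prod(symbol_probs)

# Example: a derivation using two rules, over two noisily detected symbols.
p = derivation_likelihood([0.4, 0.6], [0.9, 0.8])  # 0.4 * 0.6 * 0.9 * 0.8
```

Under this scheme a structurally likely parse can still be discounted when its constituent symbols were detected with low confidence, which is the behaviour one wants in noisy environments.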
Experiments and analyses
To test our framework, we first experiment on synthetic data with systematically varied levels of noise, followed by real-world data obtained from a camera. As MDL scores depend on the data samples, we compute the ratio of MDL scores between the learned grammar and the hand-made model grammar.
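The MDL ratio can be sketched as follows (an illustrative formulation; the paper's exact MDL definition is not given in this excerpt). A common two-part definition scores a grammar G on data D as MDL(G, D) = DL(G) + DL(D | G), where DL(G) is the number of bits needed to encode the rules and DL(D | G) is the negative log-likelihood of the data under the grammar; because the absolute score depends on the sample, grammars are compared by the ratio of their scores.

```python
import math

def grammar_cost(num_rule_symbols, symbol_space):
    """Bits to encode a grammar whose rules contain num_rule_symbols
    symbols drawn from an alphabet of size symbol_space."""
    return num_rule_symbols * math.log2(symbol_space)

def data_cost(log_likelihood):
    """Bits to encode the data given the grammar: -log2 P(D | G)."""
    return -log_likelihood / math.log(2)

def mdl_ratio(learned_score, model_score):
    """Ratio of MDL scores; values near 1 mean the learned grammar is
    about as compact and predictive as the hand-made model grammar."""
    return learned_score / model_score

learned = grammar_cost(12, 8) + data_cost(math.log(1e-6))
model = grammar_cost(10, 8) + data_cost(math.log(1e-5))
ratio = mdl_ratio(learned, model)
```

The numbers above are placeholders; only the ratio structure mirrors the evaluation described in the text.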
Discussions and future directions
We have presented a robot imitation learning framework using probabilistic activity grammars. Our method aims to discover reusable common action components across multiple tasks from input streams. We have shown in two non-trivial real-world experiments (Sections 5.2 The towers of Hanoi, 5.3 Dance imitation learning) that our method is capable of learning reusable structures under a reasonable amount of noise, in addition to the synthetic dataset experiment for systematic analysis. In the Dance
Acknowledgments
This work was supported by the EU FP7 projects EFAA (FP7-ICT-270490) and ALIZ-E (FP7-ICT-248116).
References (47)
Teaching and learning of robot tasks via observation of human performance, Robotics and Autonomous Systems (2004).
Is imitation learning the route to humanoid robots?, Trends in Cognitive Sciences (1999).
Discerning intentions in dynamic human action, Trends in Cognitive Sciences (2001).
Discriminative and adaptive imitation in uni-manual and bi-manual tasks, Robotics and Autonomous Systems (2006).
Hierarchical attentive multiple models for execution and recognition of actions, Robotics and Autonomous Systems (2006).
A bibliographical study of grammatical inference, Pattern Recognition (2005).
Content-based retrieval for human motion data, Journal of Visual Communication and Image Representation (2004).
STRIPS: a new approach to the application of theorem proving to problem solving, Artificial Intelligence (1972).
Robot Programming by Demonstration (Chapter 59) (2008).
Learning by watching: extracting reusable task knowledge from visual observation of human performance, IEEE Transactions on Robotics and Automation (1994).
Imitation learning based on visuo-somatic mapping, Experimental Robotics IX.
The agent-based perspective on imitation.
Imitation as a dual-route process featuring predictive and learning components; a biologically plausible computational model.
How infants make sense of intentional action.
Imitation learning of dual-arm manipulation tasks in humanoid robots, International Journal of Humanoid Robotics.
Local Gaussian process regression for real time online model learning and control.
Online spatio-temporal Gaussian process experts with application to tactile classification, IEEE/RSJ International Conference on Intelligent Robots and Systems.
Kyuhwa Lee is a Ph.D. student and a research assistant in the Personal Robotics Lab at Imperial College London. His research interests focus upon structured human task learning and active learning for robots using syntactic approaches. He actively works in the fields of robot learning by demonstration with the real world applications on humanoid robots such as iCub and Simon.
Yanyu Su is a Ph.D. student at the State Key Laboratory of Robotics and System at Harbin Institute of Technology, and is currently visiting the Personal Robotics Library at Imperial College London working on biomimetic grasping mechanisms for complex humanoid hands, and robot learning by demonstration.
Tae-Kyun Kim has been a Lecturer in computer vision and learning at Imperial College London, UK, since 2010. He obtained his Ph.D. from the University of Cambridge in 2007 and was a research fellow of Sidney Sussex College, Cambridge, during 2007–2010. His research interests span various topics including object recognition, tracking, face recognition and surveillance, action/gesture recognition, and semantic image segmentation and reconstruction. He has co-authored over 40 journal and conference papers, 6 MPEG-7 standard documents and 17 international patents. His co-authored algorithm is an international MPEG-7 ISO/IEC standard for face image retrieval.
Yiannis Demiris is a Reader at Imperial College London. His research interests include assistive robotics, multi-robot systems, human–robot interaction and learning by demonstration. Dr. Demiris’ research is funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC), the Royal Society, BAE Systems, and the EU FP7 programme through the projects ALIZ-E and EFAA, both addressing novel machine learning approaches to human–robot interaction. He has guest-edited special issues of the IEEE Transactions on SMC-B on Learning by Observation, Demonstration, and Imitation, and of the Adaptive Behavior Journal on Developmental Robotics. He has organized six international workshops on Robot Learning, BioInspired Machine Learning, Epigenetic Robotics, and Imitation in Animals and Artifacts (AISB), was the chair of the IEEE International Conference on Development and Learning (ICDL) 2007, and was the program chair of the ACM/IEEE International Conference on Human–Robot Interaction (HRI) 2008. In 2012 he received the Rector’s award for Teaching Excellence and the Faculty of Engineering Award for Excellence in Engineering Education. He is a Senior Member of the IEEE and a member of the Institution of Engineering and Technology (IET).