“Robot’s Delight – A Lyrical Exposition on Learning by Imitation from Human-Human Interaction” is a video submission that won Best Video at the 2017 ACM/IEEE International Conference on Human-Robot Interaction (HRI 2017). The team also provides an in-depth explanation of the techniques and robotics in the video.
Although social robots are growing in popularity and technical feasibility, it is still unclear how we can effectively program social behaviors. Programming social robots involves many difficulties: we need to design hundreds or thousands of dialogue rules, anticipate the situations the robot will face, handle common recognition errors, and program the robot to respond to many variations of human speech and behavior. Perhaps most challenging is that we often do not understand the reasoning behind our own behavior, so it is hard to program such implicit knowledge into robots.
In this video, we present two studies exploring learning-by-imitation from human-human interaction. In these studies, we developed techniques for learning typical actions and execution logic directly from people interacting naturally with each other. We propose that this is more scalable and robust than developing interaction logic by hand, and it requires much less effort.
In the first study, we asked participants to role-play a shopkeeper and a customer in a camera shop scenario, and we recorded their motion and speech in 178 interactions. By extracting typical motion and speech actions using unsupervised clustering algorithms, we created a set of robot behaviors and trained a machine learning classifier to predict which of those actions a human shopkeeper would have performed in any given situation. Using this classifier, we programmed a Robovie robot to imitate the movement and speech behavior of the shopkeeper, e.g. greeting the customer, answering questions about camera features, and introducing different cameras.
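The two steps described above (extracting typical actions by clustering, then predicting which action the shopkeeper would take) can be illustrated with a minimal sketch. It is not the method used in the study: the greedy word-overlap clustering, the nearest-neighbour predictor, the function names, and the toy dialogue data are all assumptions chosen to keep the example self-contained.

```python
def tokens(text):
    """Lower-case a sentence and return its set of words (punctuation stripped)."""
    for mark in "?.,!":
        text = text.replace(mark, "")
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_utterances(utterances, threshold=0.5):
    """Greedy clustering (an assumed stand-in for the study's algorithm):
    an utterance joins the first cluster whose first member is similar
    enough, otherwise it starts a new cluster."""
    clusters, labels = [], []
    for utt in utterances:
        t = tokens(utt)
        for idx, members in enumerate(clusters):
            if jaccard(t, tokens(members[0])) >= threshold:
                members.append(utt)
                labels.append(idx)
                break
        else:
            clusters.append([utt])
            labels.append(len(clusters) - 1)
    return clusters, labels


def predict_action(customer_utterance, training_pairs):
    """1-nearest-neighbour predictor (also an assumption): return the action
    label paired with the most similar customer utterance seen in training."""
    query = tokens(customer_utterance)
    best = max(training_pairs, key=lambda pair: jaccard(query, tokens(pair[0])))
    return best[1]


# Hypothetical (customer utterance, shopkeeper utterance) pairs.
interactions = [
    ("hello", "hi welcome to the shop"),
    ("hi there", "hello welcome to the shop"),
    ("how many megapixels does this camera have", "this camera has 20 megapixels"),
    ("what is the megapixels of this camera", "this one has 20 megapixels"),
]

# Step 1: discover typical shopkeeper actions without any labels.
clusters, labels = cluster_utterances([shop for _, shop in interactions])

# Step 2: learn a mapping from what the customer said to the action cluster.
training_pairs = [(cust, label) for (cust, _), label in zip(interactions, labels)]

action = predict_action("how many megapixels", training_pairs)
print(clusters[action][0])  # the robot would speak a representative utterance
```

In this toy version the robot's behavior set and its decision logic both come from the recorded pairs, with nothing written by hand. The study additionally used motion data and noisy speech recognition output, which this sketch leaves out.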
Experimental results showed that our techniques enabled the robot to perform correct behaviors 84.8% of the time, which was particularly interesting since speech recognition was only 76.8% accurate. This illustrates the robustness of the system to sensor noise, which is one advantage of using noisy, real-world data for training.
In the second study, we used a similar technique to train the android ERICA to imitate people’s behavior in a travel agent scenario. In this case, the challenge was to model the topic of the interaction so that ERICA could learn to answer ambiguous questions like “how much does this package cost?”.
We did this by observing that utterances in the same topic tend to occur together in interactions, so we calculated co-occurrence metrics similar to those used in product recommendation systems for online shopping sites. Using these metrics, we were able to cluster the customer and shopkeeper actions into topics, and these topics were used to improve ERICA’s predictor in order to answer ambiguous questions.
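To make the co-occurrence idea concrete, here is a small sketch under stated assumptions. The exact metric and clustering method of the study are not given here, so the sketch swaps in a Jaccard-style co-occurrence score (one common choice in item-to-item recommendation) and groups actions by linking every pair that scores above a threshold. The action names and the four toy interactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_scores(interactions):
    """For each pair of actions, return (interactions containing both) /
    (interactions containing either). This Jaccard form is an assumption."""
    seen_in = defaultdict(set)
    for idx, actions in enumerate(interactions):
        for action in actions:
            seen_in[action].add(idx)
    scores = {}
    for a, b in combinations(sorted(seen_in), 2):
        both = len(seen_in[a] & seen_in[b])
        either = len(seen_in[a] | seen_in[b])
        scores[(a, b)] = both / either
    return scores


def group_into_topics(scores, threshold=0.5):
    """Link every pair scoring at least `threshold` and return the resulting
    connected components (via union-find) as sorted lists of actions."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a

    topics = defaultdict(set)
    for action in list(parent):
        topics[find(action)].add(action)
    return sorted(sorted(members) for members in topics.values())


# Hypothetical travel-agent interactions, each a list of observed actions.
logs = [
    ["ask_hawaii", "describe_hawaii", "ask_price", "quote_hawaii_price"],
    ["ask_hawaii", "describe_hawaii", "quote_hawaii_price"],
    ["ask_paris", "describe_paris", "ask_price", "quote_paris_price"],
    ["ask_paris", "describe_paris", "quote_paris_price"],
]

scores = cooccurrence_scores(logs)
topics = group_into_topics(scores)
for topic in topics:
    print(topic)
```

In this toy data the Hawaii actions and the Paris actions each form a topic, while the ambiguous `ask_price` co-occurs weakly with both and stays on its own. That is the situation where knowing the current topic of the conversation is needed to choose the right price answer.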
In both of these studies, we illustrated a completely hands-off approach to developing robot interaction logic – the robots learned only from example data of people interacting with each other, and no designer or programmer was needed! We think scalable, data-driven techniques like these promise to be powerful tools for developing even richer, more humanlike interaction logic for robots in the future.
The extended abstract for this video can be found here.
For the full details of the Robovie study, please see our IEEE Transactions on Robotics paper.
Phoebe Liu, Dylan F. Glas, Takayuki Kanda, and Hiroshi Ishiguro, "Data-Driven HRI: Learning Social Behaviors by Example from Human-Human Interaction," IEEE Transactions on Robotics, Vol. 32, No. 4, pp. 988-1008, 2016.