A Few Days of A Robot's Life in the Human's World: Toward Incremental Individual Recognition
PhD thesis
This thesis presents an integrated framework and implementation for Mertz, an expressive robotic creature for exploring the task of face recognition through natural interaction in an incremental and unsupervised fashion. The goal of this thesis is to advance toward a framework which would allow robots to incrementally ``get to know'' a set of familiar individuals in a natural and extendable way. This thesis is motivated by the increasingly popular goal of integrating robots in the home. In order to be effective in human-centric tasks, the robots must be able to not only recognize each family member, but also to learn about the roles of various people in the household.In this thesis, we focus on two particular limitations of the current technology. Firstly, most of face recognition research concentrate on the supervised classification problem. Currently, one of the biggest problems in face recognition is how to generalize the system to be able to recognize new test data that vary from the training data. Thus, until this problem is solved completely, the existing supervised approaches may require multiple manual introduction and labelling sessions to include training data with enough variations. Secondly, there is typically a large gap between research prototypes and commercial products, largely due to lack of robustness and scalability to different environmental settings.In this thesis, we propose an unsupervised approach which wouldallow for a more adaptive system which can incrementally update thetraining set with more recent data or new individuals over time.Moreover, it gives the robots a more natural {\em socialrecognition} mechanism to learn not only to recognize each person'sappearance, but also to remember some relevant contextualinformation that the robot observed during previous interactionsessions. Therefore, this thesis focuses on integrating anunsupervised and incremental face recognition system within aphysical robot which interfaces directly with humans through naturalsocial interaction. The robot autonomously detects, tracks, andsegments face images during these interactions and automaticallygenerates a training set for its face recognition system. Moreover,in order to motivate robust solutions and address scalabilityissues, we chose to put the robot, Mertz, in unstructured publicenvironments to interact with naive passersby, instead of with onlythe researchers within the laboratory environment.While an unsupervised and incremental face recognition system is acrucial element toward our target goal, it is only a part of thestory. A face recognition system typically receives eitherpre-recorded face images or a streaming video from a static camera.As illustrated an ACLU review of a commercial face recognitioninstallation, a security application which interfaces with thelatter is already very challenging. In this case, our target goalis a robot that can recognize people in a home setting. Theinterface between robots and humans is even more dynamic. Both therobots and the humans move around.We present the robot implementation and its unsupervised incremental face recognition framework. We describe analgorithm for clustering local features extracted from a large set of automatically generated face data. We demonstrate the robot's capabilities and limitations in a series of experiments at a public lobby. In a final experiment, the robot interacted with a few hundred individuals in an eight day period and generated a training set of over a hundred thousand face images. We evaluate the clustering algorithm performance across a range of parameters on this automatically generated training data and also the Honda-UCSD video face database. Lastly, we present some recognition results using the self-labelled clusters.