Friday, 18 July 2008

A Robotic Framework for Semantic Concept Learning

This report describes the study of cognitive development using a constructive approach. The basis of our work can be summarized as follows. We believe that human intelligence, and hence language, is primarily semantic. We believe that the mind forms semantic concepts through the correlation of events and cues close together in time and/or space. We further believe that an integrated sensory-motor system is necessary to ground these concepts and allow the mind to form a semantic representation of reality—there is no such thing as a disembodied mind.

Starting with these ideas, we are developing a robotic platform in ongoing work at the University of Illinois at Urbana-Champaign, complete with basic sensory-motor and computing capabilities. The sensory-motor components are functionally equivalent to their human or animal counterparts, and include binaural hearing, stereo vision, tactile sense, and basic proprioceptive control. On top of these components, our group is implementing various processing and learning models, with the intention of creating and aiding semantic understanding and intelligent behavior. Our goal is to produce a robot that will learn to understand and carry out simple tasks in response to natural language requests.

This technical report describes work on a semantic learning model completed under a Sandia National Laboratories Excellence in Engineering Fellowship. The rest of the report is organized as follows. Section 2 gives a brief overview of our robotic framework. Section 3 introduces hidden Markov models (HMMs) and recursive maximum-likelihood estimation (RMLE), both of which we use as part of our semantic learning model, described in Section 4. We offer some further results and conclusions in Section 5.

2 A Robotic Framework for Studying Cognition

We use the cognitive cycle depicted in Fig. 1 to guide the design of our robotic system. This simple diagram shows the flow of cognition among four systems: a sensory system, an associative memory, a working memory, and a vocalization and motor system. The diagram is reminiscent of ones used by psychologists to describe the human memory system (see, e.g., [1], p. 66), with some additional emphasis on the associative nature of memory and on the embodiment and interaction of the system with the environment. These emphasized areas are key requirements for embodied learning. Below we describe two different views of this cycle: the somatic system view (the body) and the noetic system view (the mind).

2.1 Somatic System

The somatic system is the physical, “body” component of the mind-body system. It comprises the physical components necessary for cognition: the senses, the muscular (motor) system, the nervous system, and the brain.

To conduct the most human-like cognitive studies, we would like to work with a robot that is as anthropomorphic as possible. For our work, we chose Arrick Robotics’ Trilobot [2] (see Fig. 2). The robot’s anthropomorphic capabilities are rich enough to suit our purposes. In particular, the robot can move freely via wheels, can move its head, and can use its arm to manipulate common objects, allowing relatively complex behaviors. A speaker is available on-board for the production of sounds and, with additional processing, speech.

We have added cameras and microphones to the robot to give it stereo vision and hearing capabilities, and have implemented, in software, some basic audio and visual processing and feature extractors to mimic aspects of these systems [3–8]. The robot also has a number of touch and other sensors available. We have incorporated a computer on-board which collects input from the cameras, microphones, and sensors, and sends control commands to the robot. The computer can also handle limited processing of the data, but a wireless transmitter is available to transmit the data to other workstations, where most processing occurs. This distributed system of computers houses the “brain” of our robot. To facilitate the communications necessary for this system, we did extensive design and coding of a distributed communications and processing framework early in this research. See [3] for details.

2.2 Noetic System

The noetic system in Fig. 1 represents the “mind” aspect in the mind-body paradigm. The main goal of our research is to implement functional equivalents for high-level cognitive functions in this area. We can characterize this goal by looking at three different aspects of the mind: memory, learning, and behavior.

Memory is often described hierarchically, dividing first into short-term memory and long-term memory. Long-term memory is further divided into procedural, semantic, and episodic memories [1,9,10]. While they are all connected and interrelated, our interest here is in semantic memory, that is, our knowledge and understanding of the world. As indicated in the introduction, we believe that memory is primarily associative.

Learning can be described informally as a transition from one mental state to another where information is gained [11]. If memory is primarily associative, then learning must principally involve the formation of associations. According to D. Shanks, in associative learning, “the environment provides a relationship among contingent events, allowing [a] person to predict one [event] in the presence of others” [11]. Possible events include both environmental cues and the subject’s own behavior. The relationship between or among events can be causal or structural. In causal relationships, one event occurs, followed by another, perhaps after a brief time interval. For example, there is a consistent causal relationship between touching a hot burner and feeling pain. Structural relationships relate features or properties of an object or event with other features which frequently co-occur. For example, after both seeing and smelling a fire, the presence of one of these events generally indicates the presence of the other. A less obvious example of a structural relationship is the association of a word with a particular object or event, a key focus of our research. This type of association allows the formation of symbolic concepts, permitting symbolic manipulation.
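The idea of predicting one event from the presence of others can be illustrated with simple co-occurrence counts. The sketch below is not the learning model used in this report; it is only a minimal illustration, and the class name `CooccurrenceMemory`, its methods, the cue labels, and the toy episodes are all hypothetical.

```python
from collections import Counter, defaultdict


class CooccurrenceMemory:
    """Toy associative memory (hypothetical): counts cues that occur together."""

    def __init__(self):
        self.pair_counts = defaultdict(Counter)  # cue -> counts of co-occurring cues
        self.cue_counts = Counter()              # cue -> number of episodes containing it

    def observe(self, cues):
        """Record one episode: a set of cues perceived close together in time."""
        cues = set(cues)
        for cue in cues:
            self.cue_counts[cue] += 1
            for other in cues - {cue}:
                self.pair_counts[cue][other] += 1

    def predict(self, cue):
        """Return (other_cue, estimated P(other | cue)) pairs, most likely first."""
        total = self.cue_counts[cue]
        if total == 0:
            return []
        scored = [(other, n / total) for other, n in self.pair_counts[cue].items()]
        # Sort by descending probability, breaking ties alphabetically.
        return sorted(scored, key=lambda item: (-item[1], item[0]))


# Made-up episodes pairing a spoken word with visual and tactile cues.
memory = CooccurrenceMemory()
episodes = [
    {"word:apple", "sees:red_round"},
    {"word:apple", "sees:red_round", "touch:smooth"},
    {"word:ball", "sees:red_round"},
    {"word:apple", "sees:red_round"},
]
for episode in episodes:
    memory.observe(episode)

best_cue, probability = memory.predict("word:apple")[0]
```

In this toy data the word "apple" always co-occurs with the red, round visual cue, so hearing the word predicts that cue with an estimated probability of 1.0, while the visual cue predicts the word with 0.75. Counting of this kind ignores temporal order, so it captures only structural relationships, not causal ones.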

If memory contains our knowledge about the world, and learning modifies that knowledge, behavior puts that knowledge to use. While behavioral expression is an integral component of our long-term research goals, it is not as immediately important to the research described herein.

The next few sections describe our composite HMM for associative learning. It consists of a cascade of HMMs, with models lower in the cascade responsible for learning low-level sensory-motor concepts, and models higher in the cascade responsible for learning higher-level concepts. The next section gives a formal description of HMMs and briefly describes an on-line learning algorithm for training them.
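To make the cascade idea concrete, the following sketch chains two small discrete HMMs so that the state sequence decoded at the lower level becomes the observation sequence of the higher level. This is an assumption about how such a cascade could pass information, not a description of the model in this report. The parameters are set by hand rather than learned (no RMLE training is shown), decoding uses the standard Viterbi algorithm, and the names `viterbi` and `run_cascade` are hypothetical.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most likely hidden-state path for a discrete HMM (log domain)."""
    n_states = len(start)
    # score[s] = best log-probability of any path ending in state s
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    back = []
    for symbol in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor state for reaching s at this time step.
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][symbol]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def run_cascade(observations, models):
    """Decode each level in turn; level k's state path is level k+1's input."""
    paths = []
    sequence = observations
    for start, trans, emit in models:
        sequence = viterbi(sequence, start, trans, emit)
        paths.append(sequence)
    return paths


# Hand-set toy parameters (assumed): (start, transition, emission) per level.
lower = ([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [[0.9, 0.1], [0.1, 0.9]])
higher = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]])

lower_path, higher_path = run_cascade([0, 0, 1, 1, 1, 0], [lower, higher])
```

With these toy parameters the lower level follows its input closely, while the higher level, which has stickier transitions and noisier emissions, treats the final change in the lower-level sequence as insufficient evidence to leave its current state. Passing only the single best path upward is a simplification; a cascade could instead pass state probabilities between levels.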