
SEMINAR REPORT 2012

COMMUNICATION ROBOT SYSTEM BASED ON THE HANDSHAKING ACTION

DEPARTMENT OF E&I, VJEC, CHEMPERI

    CHAPTER 1

    INTRODUCTION

The development of robots has been significant in production environments such as factories. Expectations are also high for intelligent robot systems that work cooperatively with human beings in daily life, medical treatment and welfare. Human-robot interaction is essential if ordinary people are to operate robots. Anyone can operate a robot with ease by giving it commands through gestures, just as people communicate with one another using gestures. An intelligent manipulator system using gesture recognition has already been developed, and an omnidirectional image has been used in a robot control system based on hand gestures. A communication robot system based on stereo vision and voice instructions has also been developed, and a control algorithm for a service robot performing a hand-over task has been proposed. This report discusses human-robot interaction based on the handshaking action. We developed a communication robot, HAKUEN, which is composed of a multimedia robot with a stereo camera, a wheel-type mobile robot and a PC with a microphone. HAKUEN approaches the operator and holds out its hand according to a voice command. It detects the operator's face based on the pixel values of flesh tint in the color image, and it uses stereo disparity to calculate the distance between the robot and the operator. The effectiveness of our system is clarified by several experimental results.


    CHAPTER 2

LITERATURE SURVEY

UNECE issues its 2004 World Robotics survey

Worldwide investment in industrial robots was up 19% in 2003. In the first half of 2004, orders for robots rose another 18% to the highest level ever recorded. Worldwide growth in the period 2004-2007 was forecast at an average annual rate of about 7%, with over 600,000 household robots in use and several millions expected in the next few years. From this press release we can easily see that household (service) robots are becoming popular. This gives researchers more interest in working with service robots to make them more user-friendly in a social context. Speech Recognition (SR) technology gives researchers the opportunity to add Natural Language (NL) communication with robots in a natural way in the social context. So the promise of robots that behave more like humans (at least from the perception-response point of view) is starting to become a reality [28]. Brooks's research [5] is also an example of developing a humanoid robot and raised some research issues. Among these issues, one important issue is to develop machines that have human-like perception.


    CHAPTER 3

    ABOUT ROBOT

The term robot generally connotes some anthropomorphic (human-like) appearance; consider, for example, robot arms used for welding. The main goal of robotics is to build robot workers smart enough to replace humans in labour or in any kind of dangerous task that could be harmful to people. The idea of a robot made up of mechanical parts came from science fiction. Three classical films, Metropolis (1926), The Day the Earth Stood Still (1951), and Forbidden Planet (1956), cemented the connotation that robots were mechanical in origin, ignoring the biological origins in Capek's play. To work as a replacement for humans, a robot needs some intelligence in order to function autonomously. AI (Artificial Intelligence) gives us the opportunity to fulfil this intelligence requirement in robotics. Three paradigms are followed in AI robotics, depending on the problem: Hierarchical, Reactive, and Hybrid deliberative/reactive. Applying the right paradigm makes problem solving easier. Based on the three commonly accepted robotic primitives, an overview of the three paradigms is given in Figure 3.1. In our project we follow the Hybrid deliberative/reactive paradigm to solve our robotic problem, as sketched below.

Fig 3.1: Three paradigms (a) Hierarchical, (b) Reactive, (c) Hybrid deliberative/reactive
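As a rough illustration of the hybrid deliberative/reactive paradigm mentioned above, the following sketch (not from the report; class and method names are our own assumptions) separates a slow deliberative planner from a fast reactive sense-act loop:

```python
# Minimal sketch of a hybrid deliberative/reactive control loop.
# The planner runs occasionally (deliberative layer) while the
# reactive layer maps the latest sensor reading directly to an action.
import time

class HybridController:
    def __init__(self):
        self.plan = []          # high-level goals produced by the planner

    def deliberate(self, world_model):
        """Deliberative layer: slow planning over a world model."""
        # Hypothetical plan: approach the operator, then offer the hand.
        self.plan = ["approach_operator", "handshake"]

    def react(self, sensors):
        """Reactive layer: fast sense-act mapping."""
        if sensors.get("obstacle_ahead"):
            return "stop"
        return self.plan[0] if self.plan else "idle"

def control_loop(controller, read_sensors, act, cycles=100):
    for step in range(cycles):
        sensors = read_sensors()
        if step % 20 == 0:                 # re-plan only occasionally
            controller.deliberate(sensors)
        act(controller.react(sensors))
        time.sleep(0.05)                   # ~20 Hz reactive rate
```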


    CHAPTER 4

    ROBOT CONSTRUCTION

We developed a communication robot, HAKUEN, which is shown in Figure 4.1. The system is composed of a multimedia robot with a stereo camera, a wheel-type mobile robot and a PC with a microphone. HAKUEN has two arms, and each arm has six degrees of freedom of motion. The head of the multimedia robot has two degrees of freedom of motion. Several LEDs are fitted around the robot's eyes. The base of the robot is a two-wheeled mobile robot. When the operator gives a voice command to HAKUEN, the robot approaches the operator and holds out its hand. HAKUEN moves according to the operator's voice commands. We implemented four motion functions for HAKUEN. These functions are described below, and a minimal skeleton follows the list.

(1) Face tracking function

HAKUEN moves its head in order to follow the operator's face motion. We call this motion the "face tracking function". The operator's face is detected based on the pixel values of flesh tint in the color image.

(2) Handshaking function

HAKUEN holds out its right hand toward the operator so that the operator can shake the robot's hand. We call this motion the "handshaking function".

(3) Voice recognition function

HAKUEN moves according to the operator's voice commands. We call this motion the "voice recognition function". We use voice recognition software (ViaVoice, IBM), controlled through an ActiveX program, to recognize the voice commands.

(4) Approach function

We consider the suitable distance range between HAKUEN and the operator to be 0.6 m to 1.2 m. The robot approaches the operator and keeps this suitable distance. We call this motion the "approach function".
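The following skeleton is our own illustration (the report does not publish its control code, so the method names and gains are assumptions); it only shows how the four motion functions could be organized:

```python
# Skeleton of HAKUEN's four motion functions (illustrative only).
class Hakuen:
    SUITABLE_RANGE_M = (0.6, 1.2)   # suitable operator distance from the report

    def face_tracking(self, face_center, image_center):
        """Turn the head so the detected face stays at the image center."""
        dx = face_center[0] - image_center[0]
        dy = face_center[1] - image_center[1]
        return {"pan": -0.01 * dx, "tilt": -0.01 * dy}   # hypothetical gains

    def handshaking(self):
        """Hold out the right hand toward the operator."""
        return "extend_right_arm"

    def voice_recognition(self, recognized_word):
        """Map a recognized word to one of the motion functions."""
        return {"a ku shu": self.handshaking}.get(recognized_word)

    def approach(self, distance_m):
        """Drive until the distance is within the suitable range."""
        lo, hi = self.SUITABLE_RANGE_M
        if distance_m > hi:
            return "forward"
        if distance_m < lo:
            return "backward"
        return "stop"

robot = Hakuen()
print(robot.approach(distance_m=2.0))   # -> "forward"
```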


The assistive robot system is shown in Figure 4.1. This system is composed of a manipulator, a PC, a microphone and stereo vision hardware. The manipulator used here has six degrees of freedom of motion and a mechanical hand. Since the system has to recognize the position and posture of the hand in real time, we use the stereo vision hardware. In this system the operator gives a hand gesture to the manipulator conversationally. For example, when the operator points to an object with the forefinger and gives a voice instruction to the manipulator in order to indicate the target object, the manipulator picks up the object and hands it over to the operator.

Fig 4.1 The HAKUEN robot


    CHAPTER 5

FACE TRACKING FUNCTION

At first, HAKUEN has to detect the human face in the color image. The human face is detected based on the pixel values of flesh tint in the color image. The color image is digitized as 24-bit RGB (Red, Green and Blue) pixel values, so that each element of RGB has 8 bits, or 256 levels of brightness [6]. However, the RGB value is easily influenced by the lighting. Therefore, we use the HLS (hue, lightness and saturation) color specification system in order to detect the human face accurately. Each element of the HLS color specification system is calculated from the RGB pixel value (Equations (1)-(3) of the original paper). In order to detect the human face in the color image, we transform the color image into a binary image based on threshold values in the HLS color specification system. We determined the threshold values for flesh tint experimentally, as sketched below.
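As a concrete illustration of this conversion and thresholding step, the following sketch builds a binary flesh-tint mask; the actual thresholds were found experimentally in the report and are not all given, so the numeric ranges here are placeholders, not the authors' values:

```python
# Illustrative flesh-tint thresholding in the HLS color space.
import colorsys
import numpy as np

def flesh_tint_mask(rgb_image, h_range=(0.0, 0.1), l_range=(0.2, 0.8),
                    s_range=(0.2, 0.7)):
    """Convert an RGB image (H x W x 3, uint8) to a binary flesh-tint mask."""
    h, w, _ = rgb_image.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            r, g, b = rgb_image[y, x] / 255.0
            hue, light, sat = colorsys.rgb_to_hls(r, g, b)
            mask[y, x] = (h_range[0] <= hue <= h_range[1] and
                          l_range[0] <= light <= l_range[1] and
                          s_range[0] <= sat <= s_range[1])
    return mask
```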

The face detection method follows Rowley, Baluja and Kanade (Neural Network-Based Face Detection, PAMI, January 1998). The system operates in two stages: it first applies a set of neural-network-based filters to an image, and then uses an arbitrator to combine their outputs. The filters examine each location in the image at several scales, looking for locations that might contain a face. The arbitrator then merges detections from individual filters and eliminates overlapping detections.

The first component of the system is a filter that receives as input a 20x20 pixel region of the image and generates an output ranging from 1 to -1, signifying the presence or absence of a face, respectively. To detect faces anywhere in the input, the filter is applied at every location in the image. To detect faces larger than the window size, the input image is repeatedly reduced in size (by subsampling), and the filter is applied at each size. This filter must have some invariance to position and scale; the amount of invariance determines the number of scales and positions at which it must be applied. For the work presented here, the filter is applied at every pixel position in the image, and the image is scaled down by a factor of 1.2 for each step in the pyramid. First, a preprocessing step, adapted from [21], is applied to a window of the image. The window is then passed through a neural network, which decides whether the window contains a face. The preprocessing first attempts to equalize the intensity values across the window. A function that varies linearly across the window is fitted to the intensity values in an oval region inside the window. Pixels outside the oval may represent the background, so those intensity values are ignored in computing the lighting variation across the face. The linear function approximates the overall brightness of each part of the window and can be subtracted from the window to compensate for a variety of lighting conditions.


Then histogram equalization is performed, which non-linearly maps the intensity values to expand the range of intensities in the window. The histogram is computed for the pixels inside an oval region in the window. This compensates for differences in camera input gains, as well as improving contrast in some cases. The preprocessed window is then passed through a neural network. The network has retinal connections to its input layer. There are three types of hidden units: 4 which look at 10x10 pixel subregions, 16 which look at 5x5 pixel subregions, and 6 which look at overlapping 20x5 pixel horizontal stripes of pixels. Each of these types was chosen to allow the hidden units to detect local features that might be important for face detection. In particular, the horizontal stripes allow the hidden units to detect such features as mouths or pairs of eyes, while the hidden units with square receptive fields might detect features such as individual eyes, the nose, or corners of the mouth. Although a single hidden unit can be shown for each subregion of the input, these units can be replicated. For the experiments described later, networks with two and three sets of these hidden units are used. Similar input connection patterns are commonly used in speech and character recognition tasks [10, 24]. The network has a single, real-valued output, which indicates whether or not the window contains a face.
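To make the scanning procedure concrete, here is a minimal sketch of the pyramid-and-sliding-window loop described above; the 20x20 window and the 1.2 scale factor come from the text, while `classify_window` is only a placeholder for the preprocessing and the trained network:

```python
# Sliding a 20x20 window over an image pyramid (scale factor 1.2).
import numpy as np

WINDOW = 20
SCALE_STEP = 1.2

def classify_window(window):
    """Placeholder for lighting correction, histogram equalization and the
    neural network; returns a score in [-1, 1] (here: always 'no face')."""
    return -1.0

def detect_faces(gray_image, step=2):
    detections = []   # (x, y, scale) triples in original-image coordinates
    scale = 1.0
    image = gray_image.astype(float)
    while min(image.shape) >= WINDOW:
        h, w = image.shape
        for y in range(0, h - WINDOW + 1, step):
            for x in range(0, w - WINDOW + 1, step):
                if classify_window(image[y:y + WINDOW, x:x + WINDOW]) > 0:
                    detections.append((int(x * scale), int(y * scale), scale))
        # subsample the image by the pyramid scale factor
        new_h, new_w = int(h / SCALE_STEP), int(w / SCALE_STEP)
        ys = (np.arange(new_h) * SCALE_STEP).astype(int)
        xs = (np.arange(new_w) * SCALE_STEP).astype(int)
        image = image[np.ix_(ys, xs)]
        scale *= SCALE_STEP
    return detections
```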

Fig 5.1 Detection of the flesh tint, case 1: (a) constant saturation threshold, S=40-55; (b) adjusted saturation threshold, S=70-255
Fig 5.2 Detection of the flesh tint, case 2: (a) constant saturation threshold, S=40-255; (b) adjusted saturation threshold, S=20-255


5.2 STAGE TWO: MERGING OVERLAPPING DETECTIONS AND ARBITRATION

The raw output from a single network will contain a number of false detections. In this section, we present two strategies to improve the reliability of the detector: merging overlapping detections from a single network and arbitrating among multiple networks.

5.2.1 Merging Overlapping Detections

Note that most faces are detected at multiple nearby positions or scales, while false detections occur with less consistency. This observation leads to a heuristic which can eliminate many false detections. For each location and scale, the number of detections within a specified neighborhood of that location can be counted. If the number is above a threshold, then that location is classified as a face. The centroid of the nearby detections defines the location of the detection result, thereby collapsing multiple detections. In the experiments section, this heuristic is referred to as thresholding.

If a particular location is correctly identified as a face, then all other detection locations which overlap it are likely to be errors and can therefore be eliminated. Based on the above heuristic regarding nearby detections, we preserve the location with the higher number of detections within a small neighborhood and eliminate locations with fewer detections. In the discussion of the experiments, this heuristic is called overlap elimination. There are relatively few cases in which this heuristic fails; one such case occurs when one face partially occludes another.

The implementation of these two heuristics is as follows. Each detection at a particular location and scale is marked in an image pyramid, labelled the output pyramid. Then each location in the pyramid is replaced by the number of detections in a specified neighborhood of that location. This has the effect of spreading out the detections. Normally, the neighborhood extends an equal number of pixels in the dimensions of scale and position. A threshold is applied to these values, and the centroids (in both position and scale) of all above-threshold regions are computed. All detections contributing to a centroid are collapsed down to a single point. Each centroid is then examined in order, starting from the ones which had the highest number of detections within the specified neighborhood. If any other centroid locations represent a face overlapping with the current centroid, they are removed from the output pyramid. All remaining centroid locations constitute the final detection result; a minimal sketch of this merging step is given below. In the face detection work described in [3], similar observations about the nature of the outputs were made, resulting in the development of similar heuristics.
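The following is our own simplified sketch of the thresholding and centroid-collapsing heuristics; the neighborhood size and the threshold are illustrative parameters, not the values used in the cited work:

```python
# Sketch of merging nearby raw detections into face centroids.
import numpy as np

def merge_detections(detections, neighborhood=2, threshold=3):
    """detections: list of (x, y, scale) raw hits from the face filter.
    Returns centroids of clusters with at least `threshold` nearby hits."""
    pts = np.array(detections, dtype=float)
    results = []
    used = np.zeros(len(pts), dtype=bool)
    for i, p in enumerate(pts):
        if used[i]:
            continue
        # count detections within the neighborhood in position and scale
        near = np.all(np.abs(pts - p) <= neighborhood, axis=1) & ~used
        if near.sum() >= threshold:
            results.append(pts[near].mean(axis=0))   # centroid of the cluster
            used |= near                              # overlap elimination
    return results

# Example: three nearby hits collapse to one face, a lone hit is discarded.
hits = [(50, 60, 1), (51, 61, 1), (49, 60, 2), (200, 10, 1)]
print(merge_detections(hits))
```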


    5.2.2 Arbitration among Multiple Networks

To further reduce the number of false positives, multiple networks can be applied, and their outputs arbitrated to produce the final decision. Each network is trained in a similar manner, but with random initial weights, random initial non-face images, and permutations of the order of presentation of the scenery images. The detection and false-positive rates of the individual networks turn out to be quite close. However, because of different training conditions and because of self-selection of negative training examples, the networks have different biases and make different errors. Each detection at a particular position and scale is recorded in an image pyramid, as was done with the previous heuristics. One way to combine two such pyramids is by ANDing them. This strategy signals a detection only if both networks detect a face at precisely the same scale and position. Because of the different biases of the individual networks, they will rarely agree on a false detection of a face, which allows ANDing to eliminate most false detections. Unfortunately, this heuristic can decrease the detection rate, because a face detected by only one network is thrown out. However, the individual networks all detect roughly the same set of faces, so the number of faces lost due to ANDing is small.

Similar heuristics, such as ORing the outputs of two networks, or voting among three networks, were also tried. Each of these arbitration methods can be applied before or after the thresholding and overlap-elimination heuristics. If applied afterwards, the centroid locations are combined rather than the actual detection locations, and they are required to be within some neighborhood of one another rather than precisely aligned.

Arbitration strategies such as ANDing, ORing, or voting seem intuitively reasonable, but perhaps there are less obvious heuristics that could perform better. To test this hypothesis, a separate neural network is applied to arbitrate among multiple detection networks. For a location of interest, the arbitration network examines a small neighborhood surrounding that location in the output pyramid of each individual network. For each pyramid, the number of detections in a 3x3 pixel region at each of three scales around the location of interest is counted, resulting in three numbers for each detector, which are fed to the arbitration network. The arbitration network is trained to produce a positive output for a given set of inputs only if that location contains a face, and to produce a negative output for locations without a face. A sketch of the simpler ANDing strategy is given below.
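As an illustration of the ANDing strategy described above (our own sketch, not the authors' implementation), two detection pyramids can be combined as follows:

```python
# ANDing two detection pyramids: a location/scale counts as a face only if
# both networks detected a face there (boolean NumPy maps, one per scale).
import numpy as np

def and_pyramids(pyramid_a, pyramid_b):
    """Each pyramid is a list of boolean detection maps, one per scale."""
    return [a & b for a, b in zip(pyramid_a, pyramid_b)]

# Example with one scale: only the location both networks agree on survives.
net1 = [np.array([[True, False], [True, False]])]
net2 = [np.array([[True, False], [False, False]])]
print(and_pyramids(net1, net2))
```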


When HAKUEN looks down, the total saturation value in the color image increases. Therefore, the threshold values of saturation and value are adjusted automatically based on the total values of saturation and value in the color image. Examples of flesh-tint detection using this image processing are shown in Fig 5.1 and Fig 5.2. Fig 5.1(a) is the case of a constant saturation threshold when HAKUEN looks down: since the color of the floor is similar to the flesh tint, the floor area is also detected as flesh tint. In Fig 5.1(b) the flesh-tint area is detected correctly, because the threshold values of saturation and value are adjusted automatically based on the total values of saturation and value in the color image.

After the detection of the flesh-tint area, the human face is recognized by considering the maximum area and the circularity of the flesh-tint regions; a sketch of the circularity test is given below. We set the threshold value of the circularity to 0.1.
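The report does not state its circularity formula; a common definition is 4πA/P², where A is the region area and P its perimeter, and the following sketch assumes that definition together with the 0.1 threshold mentioned above:

```python
# Circularity test for a candidate flesh-tint region (the exact definition
# used in the report is not given; we assume 4*pi*Area/Perimeter^2).
import math

def circularity(area, perimeter):
    """1.0 for a perfect circle, smaller for elongated or ragged regions."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def looks_like_face(area, perimeter, threshold=0.1):
    return circularity(area, perimeter) >= threshold

# Example: a roughly circular blob passes, a thin streak does not.
print(looks_like_face(area=1200, perimeter=130))   # ~0.89 -> True
print(looks_like_face(area=300, perimeter=400))    # ~0.02 -> False
```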

5.3 EXPERIMENT OF THE FACE TRACKING FUNCTION

The face tracking function means that HAKUEN moves its head so as to keep the operator's face at the center of the image. The relative locations of the operator and HAKUEN are shown in Fig 5.3. The number of operators was five, and each operator gave the voice command thirty times. The system detected the face in all cases. The average time of face detection was 18.16 s.

Fig 5.3 Experiment of the face tracking function


    CHAPTER 6

    DISTANCE DETECTION USING STEREO IMAGE

Since HAKUEN has two cameras, we use the disparity of the stereo image in order to detect the distance between HAKUEN and the operator. As shown in Fig 6.1, the disparity is the difference between the target object's position in the right image and its position in the left image. When the positions of the two cameras are fixed, the disparity changes according to the distance L between the object and the cameras, so we can obtain the distance L between the robot and the operator from the disparity.


Fig 6.1 Disparity between the two images (the disparity differs with the distance)

As shown in the figure, the disparity decreases as the distance between the camera and the object increases. The disparity is the difference in the pixel position of the center of the human face in the right image and in the left image. The relation between the disparity and the distance to the object was obtained through experiment; the usual form of this relation is sketched below.
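For a calibrated stereo pair the usual relation is L = f·B/d, where f is the focal length in pixels, B the baseline between the cameras and d the disparity in pixels; the report determines this relation experimentally, so the constants below are purely illustrative:

```python
# Distance from stereo disparity, L = f * B / d (illustrative constants).
FOCAL_LENGTH_PX = 700.0   # assumed focal length in pixels
BASELINE_M = 0.12         # assumed distance between the two cameras [m]

def distance_from_disparity(disparity_px):
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return FOCAL_LENGTH_PX * BASELINE_M / disparity_px

# Example: a face whose centers differ by 70 pixels is about 1.2 m away.
print(round(distance_from_disparity(70.0), 2))
```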


6.1 EXPERIMENTS OF THE APPROACH FUNCTION

The approach function means that HAKUEN approaches the operator and keeps the suitable distance (0.6 m to 1.2 m). We defined five cases of the initial distance (1.5 m, 2.0 m, 2.5 m, 3.0 m, 3.5 m) between HAKUEN and the operator. Each case of the experiment was carried out thirty times. The average success rate was 88.33%. An example of the approach function is shown in Fig 6.2.

(a) initial position (b) approach function
Fig 6.2 Experiment on the approach function


    CHAPTER 7

ROBOT HANDSHAKING ACTION

    7.1 DETECTION OF THE HAND

At first, the system has to detect the hand area in the image of the work space. The hand area is detected based on the RGB pixel values of flesh tint in the colour image. Since the RGB value is easily influenced by the lighting, we use the hue of the flesh tint in order to reduce the influence of the light. The flesh-tint area is detected roughly in the colour image using the hue value, and the noise is removed using the RGB value.

Fig 7.1 Robot system

After the hand area is detected using the RGB value and the hue value of the colour image, we determine the center position of the hand, called the CP, in order to trace the hand. Since the size of a human fist is approximately equal to a sphere with a radius of 40 mm, the system searches for the center of the sphere with the maximum density of flesh-tint pixels. The center of this sphere is regarded as the CP of the hand. Once the CP is detected, the hand is traced by tracking the CP, as sketched below.
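The following brute-force sketch illustrates the CP search described above; the conversion of the 40 mm fist radius into pixels is an assumption, not a value from the report:

```python
# Finding the hand center position (CP): the pixel whose surrounding disc
# (about a 40 mm fist radius, here given in pixels) contains the most
# flesh-tint pixels.
import numpy as np

def find_cp(flesh_mask, radius_px=20):
    """flesh_mask: 2-D boolean array of flesh-tint pixels."""
    h, w = flesh_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    best_count, best_cp = -1, None
    candidates = np.argwhere(flesh_mask)           # only test flesh pixels
    for cy, cx in candidates:
        inside = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius_px ** 2
        count = np.count_nonzero(flesh_mask & inside)
        if count > best_count:
            best_count, best_cp = count, (int(cy), int(cx))
    return best_cp
```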



7.2 RECOGNITION OF THE HAND GESTURE

As shown in Fig 7.2, we define several instructions using hand configurations, and we make the manipulator move in accordance with these hand-gesture instructions. For example, when the operator opens the hand upwards (Inst. 2), the manipulator delivers the object to the operator.

Inst.1 Grasp Inst.2 Deliver the object Inst.3 Approach Inst.4 Stand by

Fig 7.2 Instructions of hand gestures

We define three characteristic dimensions (A, B and C) of the hand in order to recognize the hand gesture rapidly. As shown in Fig 7.3, the hand gestures are divided into branches based on conditions on these dimensions. The length A is the distance from the CP to the tip of the forefinger. The length B is the maximum width of the hand block. The length C is the maximum width of the finger block. For example, if the length A is less than 60 mm, we consider that the operator closes the hand, and the hand gesture means Instruction 1. If the length A is more than 60 mm, we calculate the length B. Because we do not use the whole hand configuration but only the three characteristic dimensions, the hand gesture is determined rapidly; a sketch of this decision branching follows.
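Only the first branch (A < 60 mm meaning Instruction 1, the grasp gesture) is stated explicitly in the text; the thresholds on B and C below are therefore placeholders, not the report's values:

```python
# Sketch of the gesture decision tree over the characteristic dimensions
# A, B and C (in mm). Only the 60 mm test on A comes from the text; the
# remaining branches and thresholds are assumed for illustration.
def classify_gesture(a_mm, b_mm, c_mm):
    if a_mm < 60:
        return "Inst.1 Grasp"              # closed hand
    if b_mm > 80:                          # assumed: wide open hand
        return "Inst.2 Deliver the object"
    if c_mm > 25:                          # assumed: extended forefinger
        return "Inst.3 Approach"
    return "Inst.4 Stand by"

print(classify_gesture(a_mm=45, b_mm=70, c_mm=15))   # Inst.1 Grasp
```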


Fig 7.3 General flow of the recognition of the hand gesture

7.3 EXPERIMENTS OF THE HANDSHAKING FUNCTION

The handshaking function means that HAKUEN holds out its right hand toward the operator when it keeps the suitable distance (0.6 m to 1.2 m) from the operator. We defined four cases of the distance (0.4 m, 0.8 m, 1.0 m, 1.4 m) between HAKUEN and the operator, and three cases of the angle (-20, 0, 20 degrees) between HAKUEN and the operator. Each case of the experiment was carried out thirty times. The success percentages for each case were recorded; the average success rate was 97.63%.

Fig 7.4 Experiment on the handshaking function


    CHAPTER 8

    VOICE INSTRUCTIONS

Speech Recognition (SR) technology promises to change the way we interact with machines (robots, computers, etc.) in the future. This technology is maturing day by day, and scientists are still working hard to overcome its remaining limitations. Nowadays it is being introduced in many important areas in the social context, for example in aerospace, where the training and operational demands on the crew have increased significantly with the proliferation of technology [27], and in the operating theatre as a surgeon's aid to control lights, cameras, pumps and equipment by simple voice commands [1]. Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words [8]. There are two important parts in speech recognition: i) recognizing the series of sounds, and ii) identifying the word from the sounds. The recognition technique also depends on many parameters: speaking mode, speaking style, speaker enrollment, size of the vocabulary, language model, perplexity, transducer, etc. [8]. There are two speaking modes for a speech recognition system: one word at a time (isolated-word speech) and continuous speech. Depending on the speaker enrollment, speech recognition systems can also be divided into speaker-dependent and speaker-independent systems. In speaker-dependent systems the user needs to train the system before using it; a speaker-independent system, on the other hand, can identify any speaker's speech. Vocabulary size and the language model are also important. Language models or artificial grammars are used to constrain the word combinations in a series of words or sounds, and the size of the vocabulary should be kept to a suitable number.

The system does not determine the position of the target object based on image processing when many objects lie on the table. Instead, the system recognizes the configuration and colour of the target object from the voice instruction. For example, when the operator gives the voice instruction "Take the red ball" to the manipulator, the position of the red ball is determined in the work space.

The operator gives voice commands to the system in order to move HAKUEN. The voice commands are composed of simple words, and we define six voice commands, which are shown in Table 8.1. For example, when the operator gives the voice command "a ku shu", HAKUEN approaches the operator and holds out its right hand to shake hands with the operator. We use the voice recognition software (ViaVoice, IBM) in order to recognize the voice commands. A sketch of the command-to-action mapping is given below.
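The report names only two of the six commands explicitly: "a ku shu" (handshake) and "i do o" (approach, used in the total experiment of Chapter 9). The sketch below maps recognized command words to motion functions; the remaining entry and the handler names are placeholders, not the actual command set:

```python
# Dispatch from a recognized voice command to a HAKUEN motion function.
def approach():      print("approach the operator and keep 0.6-1.2 m")
def handshake():     print("hold out the right hand toward the operator")
def stand_by():      print("stand by")

VOICE_COMMANDS = {
    "i do o": approach,      # -> approach function
    "a ku shu": handshake,   # -> handshaking function
    "ma te": stand_by,       # placeholder command
}

def on_voice_command(word):
    action = VOICE_COMMANDS.get(word)
    if action is None:
        print(f"unrecognized command: {word}")
    else:
        action()

on_voice_command("a ku shu")
```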


    Table 8.1 voice commands

8.1 EXPERIMENTS OF THE VOICE RECOGNITION FUNCTION

The voice recognition function means that HAKUEN moves according to the voice command. We define six voice commands. Each voice command was given forty times in the experiment, and the number of operators was five. The average recognition rate of the voice commands was 92.8%.


CHAPTER 9

TOTAL EXPERIMENT OF THE SYSTEM

A total experiment was carried out in order to clarify the effectiveness of our system. At first the operator gives the voice command "i do o" to HAKUEN. Then HAKUEN approaches the operator and stops at a suitable position in front of the operator. Next, the operator gives the voice command "a ku shu" to HAKUEN, and HAKUEN holds out its right hand toward the operator. We defined four cases of the distance (1.5 m, 2.0 m, 2.5 m, 3.0 m) between HAKUEN and the operator. Each case of the experiment was carried out thirty times. The average success rate was 81.67%.

Fig 9.1 Total experiment


CONCLUSION

Human-robot interaction is an important, attractive and challenging research area. The popularity of service robots gives researchers more interest in working on user interfaces for robots, to make them more user-friendly in a social context. Speech Recognition (SR) technology gives researchers the opportunity to add Natural Language (NL) communication with robots in a natural way.

In this report, we described the communication robot HAKUEN, which is based on image processing and voice recognition. The system has four motion functions (face tracking function, handshaking function, voice recognition function and approach function). The average success rate of the total experiment was 81.67%. In future work, many kinds of functions have to be defined for the practical application of the system.


REFERENCES

[1] N. Yamasaki and Y. Anzai, "Active Interface for Human-Robot Interaction", Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 3103-3109, 1995.

[2] N. Kawarazaki, N. Kashiwagi, I. Hoya and K. Nishihara, "Manipulator Work System Using Gesture Instructions", Journal of Robotics and Mechatronics, Vol. 14, No. 5, pp. 506-513.

[3] N. Kawarazaki, Y. Suzuki, Y. Takashima, K. Nishihara and T. Yoshidome, "Robot Control System Using Omnidirectional Image", Proc. of Japan-China Conference on Mechatronics 2005, pp. 97-98.

[4] N. Kawarazaki, K. Kawashima, T. Yoshidome and K. Nishihara, "Communication Robot System Based on Stereo Vision and Voice Instructions", Proc. of China-Japan Conference on Mechatronics 2007, pp. 23-25.

[5] A. Agah and K. Tanie, "Human Interaction with a Service Robot: Mobile-Manipulator Handing Over an Object to a Human", Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 575-580.

[6] John C. Russ, The Image Processing Handbook, a CRC handbook published in cooperation with IEEE Press, 1999.