0:00:15 ...another form of communication which is just as important, namely nonverbal communication. In my talk I will discuss how to enrich the precise and useful functions of computers with the human ability to show and interpret nonverbal behaviors. You see here the collaboration between a woman and a robot: they are not just collaborating, there is even a kind of close affective bond between them, and this is actually the focus of my research.
0:01:01 My talk will be structured as follows. I will first talk about the recognition of social signals in human-robot interaction; of course, the technology is also useful for any kind of application in which social signals occur, for example human-human interaction or human-virtual agent interaction. Then I will talk about the generation of social signals in human-robot interaction: the robot should not just be able to interpret the human's signals, it should also be able to respond to them appropriately. The next topic will be dialogue management: a social robot or virtual human should be able to handle dialogue phenomena such as mutual gaze and back channels. And to handle all these challenges we need, of course, a lot of data, so the last part of my talk will be on how interactive machine learning, and in particular approaches that ease the annotation effort for humans, can be used in such scenarios.
0:02:28 So let's start with the recognition of social cues in human-robot interaction. What kinds of signals are we interested in? Basically speech, facial expressions, gaze, posture, gestures, body movements and proxemics. But we are not only interested in the social cues of an individual person; we are also interested in interaction patterns, such as synchrony, or interpersonal attitudes, for example dominance between two persons or between a person and an agent, and also engagement: how engaged are the participants in an interaction?
0:03:20 If you look at the literature, most attention has been paid to facial features. I don't want to go into detail here; I will just mention the Facial Action Coding System, which is the system usually applied to recognize, but also to annotate, facial expressions. The basic idea is to define action units that characterize emotional expressions, such as raised mouth corners, which are usually an indicator of happiness.
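As a small illustration of the idea, here is a minimal sketch of mapping action-unit combinations to basic emotions. The AU patterns follow common EMFACS-style conventions and are only illustrative, not the speaker's exact rule set:

```python
# Illustrative mapping from FACS action units to basic emotions.
AU_TO_EMOTION = {
    frozenset({6, 12}): "happiness",    # cheek raiser + lip corner puller
    frozenset({1, 4, 15}): "sadness",   # inner brow raiser + brow lowerer + lip corner depressor
    frozenset({1, 2, 5, 26}): "surprise",
    frozenset({4, 5, 7, 23}): "anger",
}

def classify_expression(active_aus: set) -> str:
    """Return the first basic emotion whose AU pattern is fully present."""
    for pattern, emotion in AU_TO_EMOTION.items():
        if pattern <= active_aus:
            return emotion
    return "unknown"

print(classify_expression({6, 12, 25}))  # -> happiness
```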
0:04:02 A lot of work has also been spent on vocal emotion recognition. Just for illustration, I show you the signal of the same utterance spoken with different emotions: you can see here that the pitch contour is quite different depending on the emotion expressed.
0:04:27 And there has been some effort to find robust predictors of vocal emotion. I would like to mention the Geneva minimalistic set of acoustic features, which was recently introduced and which actually holds up quite well if you compare it with feature sets consisting of thousands of brute-force acoustic features, or with deep neural network approaches. It is quite instructive to compare, side by side, the results obtained with such large feature sets and the results obtained with the Geneva minimalistic feature set.
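To make the feature-set idea concrete, here is a minimal stand-in for utterance-level acoustic features (pitch, energy, spectral shape). The real GeMAPS set is normally extracted with openSMILE, so this librosa version is only an illustration of the kind of features involved:

```python
import numpy as np
import librosa

def utterance_features(wav_path: str) -> np.ndarray:
    """Compute a small GeMAPS-like feature vector for one utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour
    rms = librosa.feature.rms(y=y)[0]                   # energy contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral shape
    # Functionals (mean/std) turn frame-wise contours into one vector.
    return np.concatenate([
        [f0.mean(), f0.std()],
        [rms.mean(), rms.std()],
        mfcc.mean(axis=1), mfcc.std(axis=1),
    ])
```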
0:05:17 If you look at the literature, you might get the impression that you obtain very high recognition rates for emotions. It is even a little bit scary: if you apply the trained model to test data recorded in the real world, you may find out that performance sometimes even comes close to that of random guessing.
0:05:47 So why is that? Previous research has focused on the analysis of the basic emotions: emotions that are quite extreme and prototypical, such as happiness, sadness, disgust and anger. Emotional responses of real users can usually not be mapped onto these basic emotions. For example, because of her pose and smile, you might guess that the woman shown here is happy in the interaction with the robot, but it is not clear at all.
0:06:36 A couple of years ago, colleagues of mine ran a study on exactly this issue: they investigated the emotion recognition rate for acted emotions, for read emotions, and for emotions taped in the wild, which of course sound natural. The task was just to distinguish between emotional and non-emotional speech, so not a very difficult task. For acted emotions they got one hundred percent. For read emotions, which are a little bit more natural than acted emotions, they got eighty percent, which is okay but not really exciting, because chance is at fifty percent if we just need to distinguish between neutral and emotional speech. And for the most natural scenario they got just seventy percent. So obviously, systems developed under laboratory conditions perform poorly in less ordered scenarios.
0:07:58 A further challenge is adaptive real-time applications. If you look at the recognition rates people obtain, you will find that most studies are offline studies. They take a corpus, and the corpus is usually prepared: for example, expressions that cannot be clearly annotated with emotional states are simply discarded. And of course they start from the assumption that the whole corpus has been segmented in some way. In real life, however, we have noise on the recorded data, so we might miss information; and our detectors can only rely on previously seen data, so we cannot look into the future. And of course the system has to respond, at best, in real time.
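A minimal sketch of what "online" implies for the recognizer: predictions may only use past frames, never future ones. The window length, the averaging functional and the dummy classifier are illustrative assumptions:

```python
from collections import deque
import numpy as np
from sklearn.dummy import DummyClassifier

WINDOW = 50  # a prediction may rely on at most this many past frames

class OnlineRecognizer:
    """Causal frame-wise recognizer: it never looks into the future."""
    def __init__(self, clf):
        self.clf = clf
        self.history = deque(maxlen=WINDOW)  # rolling buffer of past frames

    def push(self, frame_features):
        self.history.append(frame_features)
        window = np.mean(self.history, axis=0)  # simple causal functional
        return self.clf.predict([window])[0]

clf = DummyClassifier(strategy="most_frequent").fit(np.zeros((4, 3)), [0, 0, 1, 0])
rec = OnlineRecognizer(clf)
print(rec.push(np.random.rand(3)))  # one prediction per incoming frame
```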
0:09:09 So the question is what we can do about it. One thing we might consider is the context. Look at this picture: can you imagine in which emotional state this couple is? Do we have any idea, as persons who do not know the context? Any idea what the emotional state could be? You are quite good: usually people say, okay, it is distress. You were actually very close, because it is actually jealousy; you are the first audience that got the correct emotion immediately. Nevertheless, a system, even one able to detect the facial action units in a perfect manner, would have problems figuring this out without knowing the context, or at least without additional channels.
0:10:28 There is some recent research that considers the context, and it has actually achieved some improvement. A couple of years ago we investigated gender-specific models in emotion recognition, and we were able to improve the recognition rates by training gender-specific models.
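A minimal sketch of such context-sensitive routing: one classifier per gender, selected by a front-end attribute. The dummy classifiers and toy data merely stand in for models trained on gender-separated corpora:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
X_f, y_f = rng.random((20, 5)), rng.integers(0, 2, 20)
X_m, y_m = rng.random((20, 5)), rng.integers(0, 2, 20)

# One emotion model per gender, each trained on its own sub-corpus.
models = {
    "female": DummyClassifier(strategy="most_frequent").fit(X_f, y_f),
    "male": DummyClassifier(strategy="most_frequent").fit(X_m, y_m),
}

def predict_emotion(features, gender):
    """Route the sample to the model matching the context attribute."""
    return models[gender].predict([features])[0]

print(predict_emotion(rng.random(5), "female"))
```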
0:10:57 Another approach was pursued by a colleague of mine, Christina: she considers success and failure events during an application. If, for example, a student is failing all the time and is smiling while interacting with the learning application, then probably the student is not really happy; it might be that the student does not take the system seriously. Even though this approach is quite reasonable, it has not been picked up so much.
0:11:38 We can also consider the dialogue behind an interaction, for example the dialogue of the virtual agent in a job interview training scenario. When the job interviewer asks difficult questions, for instance about the weaknesses of the candidate, then this is also a hint at the likely emotional state of the user. And there have also been attempts to learn the temporal context using bidirectional long short-term memory networks. So the context might be a good option to consider.
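A minimal sketch of the bidirectional-LSTM idea for temporal context; the layer sizes, feature dimensions and the missing training loop are placeholder assumptions:

```python
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    """Per-frame emotion tagger with bidirectional temporal context."""
    def __init__(self, n_features=40, n_classes=4, hidden=64):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):          # x: (batch, time, n_features)
        h, _ = self.blstm(x)       # hidden states carry past AND future context
        return self.out(h)         # per-frame emotion logits

model = BLSTMTagger()
frames = torch.randn(1, 100, 40)   # 100 frames of toy features
print(model(frames).shape)         # -> torch.Size([1, 100, 4])
```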
0:12:22 Another, maybe obvious, thing to consider is multimodality. Here you see a user recorded in two situations. If you just compare the two images, at least for me it is not possible to recognize any difference in the face. But if you look at the biosignals at the bottom, you can tell the two pictures apart: in the situation on the right, the person is obviously more aroused, and I guess not very happy about it. So at least the biosignals reveal something that is not demonstrated by the face.
0:13:05 On multimodal fusion, there is an interesting meta-study by D'Mello and colleagues on multimodal affect detection. It examined the many studies in which multimodal approaches outperformed unimodal ones, and it showed that the improvement correlates with the naturalness of the corpus, which is actually bad news. For acted emotions you get quite high recognition rates when you use multiple modalities: you can even get an improvement of more than ten percent. But for the more difficult task, namely spontaneous emotions, the improvement is less than five percent. This is really bad news: should we ask the user to wear additional devices just to gain less than five percent in recognition rates? The explanation is that in natural interaction, people rarely show an emotion in all channels at once: a given channel may or may not show the emotion, so the channels do not express it in a synchronized manner.
0:14:33 We first investigated whether this assumption actually holds. We took a corpus and annotated affect based just on the video, and then based just on the audio, and we noted where the annotations mismatch. Then we looked at the recognition rates, and indeed, where the annotations mismatch, the automatic recognition rates are low as well.
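A minimal sketch of such a mismatch check between audio-only and video-only annotations; the toy label lists stand in for frame-aligned annotation tracks of a real corpus:

```python
from sklearn.metrics import cohen_kappa_score

# Toy, frame-aligned annotation tracks of the same recording.
audio_labels = ["happy", "neutral", "happy", "neutral"]
video_labels = ["neutral", "neutral", "happy", "happy"]

kappa = cohen_kappa_score(audio_labels, video_labels)
mismatch = [a != v for a, v in zip(audio_labels, video_labels)]
print(f"kappa={kappa:.2f}, mismatch rate={sum(mismatch) / len(mismatch):.0%}")
```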
0:15:10 Let me show you another example; look at the woman here. In the second recording the woman shows a neutral face, but the voice is happy; a little bit later it is the other way round: the face looks happy, but the voice is neutral. So the question is what a fusion approach would do in such a situation. Here I sketch a potential solution: the modality-specific recognizers might decide when to contribute, and we then interpolate; by this interpolation we get better recognition results.
0:16:08 If you look at the literature, most fusion approaches are synchronous fusion approaches. Synchronous fusion approaches are characterized by the integration of multiple modalities within the same time frame: for example, people analyze the face and the voice over a complete sentence. Asynchronous fusion approaches, in contrast, tolerate that the modalities are not aligned in time: they do not assume that, for example, audio and video express the emotion at the same time, and therefore they are able to track the temporal dynamics of the modalities. So it is very important, if you use a fusion approach, to use an approach that is able to consider not only the contribution of each modality, but also the interdependencies between the modalities, and that is only possible if you go for a frame-wise recognition approach.
0:17:36 We did not go for this approach right from the start. We first adopted an event-based fusion approach, where we consider events as an additional layer of abstraction between the raw signals and the higher-level emotional states; events might be, for example, laughter or similar kinds of social cues. In this way we were able to track the temporal relationships between the channels and learn when each channel provides information; and in case some data are missing, the approach still delivers reasonable recognition results.
0:18:26 Let's have a look at an example; it is a simplified example. Here we have audio and we have facial expressions, and the fusion approach comes up with a combined estimate, say a degree of valence. Now let's assume that for some reason the audio is no longer available: via interpolation we still get quite reasonable results.
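A minimal sketch of event-based fusion in this spirit: each modality emits scored events, the fusion layer lets their influence decay over time, and the estimate therefore survives a channel dropping out. The decay constant and the scores are illustrative assumptions:

```python
class EventFusion:
    """Decay-weighted pooling of social-cue events from several channels."""
    def __init__(self, half_life_s=2.0):
        self.half_life = half_life_s
        self.events = []  # (timestamp_s, valence_score)

    def add_event(self, score, t):
        self.events.append((t, score))

    def estimate(self, now):
        num = den = 0.0
        for t, score in self.events:
            w = 0.5 ** ((now - t) / self.half_life)  # older events fade out
            num += w * score
            den += w
        return num / den if den else 0.0

fusion = EventFusion()
fusion.add_event(+0.8, t=0.0)    # smile event from the video channel
fusion.add_event(+0.6, t=0.5)    # laughter event from the audio channel
print(fusion.estimate(now=1.5))  # audio drops out; the estimate persists
```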
0:18:57 We compared a number of asynchronous fusion approaches, synchronous fusion approaches, and event-driven fusion. As synchronous fusion approaches we considered, for example, neural networks; we also considered networks that take into account the temporal history of the signals, among them bidirectional long short-term memory networks, which are able to look into the future and to learn the temporal history. And what you can see here, which is quite exciting, is that the asynchronous fusion approaches actually outperform the synchronous fusion approaches. So the message, I would say, is: if you fuse modalities, you should go for an approach that is able to consider not only the contribution of each modality, but also the interdependencies between the modalities.
0:20:15 [inaudible]
0:20:54 And actually, to support the development of social signal processing approaches for online recognition tasks, we developed a framework which is called SSI, for Social Signal Interpretation. This framework synchronizes the modalities, and it supports incremental machine learning, offering various kinds of machine learning approaches. We are able to integrate additional modalities and sensors: whenever a new sensor becomes available, my people write a wrapper for it. So we support motion capturing, gaze tracking with various kinds of eye trackers, stationary as well as mobile ones, and more; basically all kinds of sensors that are commercially available.
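SSI itself is a C++ framework configured through XML pipelines; the toy code below only illustrates its core idea of aligning several sensor streams to a common clock before recognizers consume them. None of the names here are actual SSI API:

```python
import numpy as np

def resample_to(stream, src_hz, dst_hz):
    """Nearest-neighbour resampling so all streams share one clock."""
    n_out = int(len(stream) * dst_hz / src_hz)
    idx = (np.arange(n_out) * src_hz / dst_hz).astype(int)
    return stream[idx]

audio_energy = np.random.rand(1600)  # e.g. a 100 Hz feature stream
gaze_x = np.random.rand(480)         # e.g. a 30 Hz eye-tracker stream

aligned = np.stack([
    resample_to(audio_energy, 100, 30),  # both now 480 frames at 30 Hz
    gaze_x,
])
print(aligned.shape)  # (channels, frames), ready for a recognizer
```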
0:22:09 So this was the part on emotion recognition; now I would like to come to the other side, namely the generation of social cues by the robot. As I said, it is not sufficient to recognize the emotions; you also need to respond appropriately, or at least to avoid inappropriate responses. I guess it is clear why we need nonverbal signals at all: they not only express emotions, but also attitudes and intentions. They also convey interpersonal relations: they signal, for example, whether you are interested in talking to somebody or not. Nonverbal behaviors can of course also help others to understand the verbal messages, and in general they make the communication more natural and plausible.
0:23:15 We started a couple of years ago with a NAO robot. Of course, the NAO robot does not have an expressive face, so we had to look for other options, and we looked at action tendencies, which are related to emotions. An action tendency is actually what you show before you start acting. It is very common in sports: you have, for example, the characteristic pose of a person about to attack; the attack has not started yet, but it is quite clear what is coming next.
0:23:56 In our scenario we simulated action tendencies such as approach, attack and submission, and it turned out that people were able to identify the intended action tendencies. Later we actually got a robot head, and there we tried to simulate facial expressions.
0:24:26 Here we again started from the Facial Action Coding System I mentioned earlier. FACS identifies around forty action units of the human face, and the question was: can we simulate these action units on the robot? The robot head allowed the simulation of just seven action units. It has a synthetic skin, and underneath the skin there are motors; the motors can move and thereby deform the skin. So we were only able to simulate seven action units, and the question is whether this is enough. I will show you a video.
0:25:24 The video is in German with English subtitles. The robot is introduced and talks about nonverbal signals; it is not necessary that you understand what is said, you can just follow the discussion about what information the machine processes. At this stage the machine did not consider the semantics of the utterances.
0:25:50 [video plays]
0:27:04 Okay, just to show you that it really does not consider the semantics, here is another example. [video plays]
0:27:44 So this was just to show that you can hold a conversation on the basis of emotional features alone; that is of course not all there is to a dialogue.
0:28:09 So what is empathy? Empathy is an emotional response, and it stems from the comprehension of the emotional state of another person. The emotional state of the other person might be similar to your own emotional state, but it does not have to be the same emotion. Empathy requires a deeper understanding of the emotional state of the other person and of its causes, and for this we need more than signal processing technology. It also requires reasoning: you somehow need to know what the other person is feeling and why, not just to observe it. And you are also required to decide how to respond to the other person's emotion.
0:29:16 For example, in a tutoring system, if the student is in a very negative emotional state, say depressed, it could be a disaster if the virtual agent would simply mirror the emotional state of the student, because it might make the student even more depressed. Mirroring alone is actually a very weak form of empathy: there is a potential reaction, and one has to decide whether or not to show it. So we realized a kind of empathic listener.
0:29:55 When we perceive an emotion, we try to understand the emotional state; once we understand the emotional state of the other person, we choose an internal reaction, and then the question is whether this reaction should be externalized, and in what way. In the example I will show you, the virtual agent, or rather the robot, simulated an appraisal model.
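A minimal sketch of this perceive-appraise-externalize pipeline; the tutoring rule (do not mirror a depressed student) follows the example given here, while the state names and function shapes are illustrative assumptions:

```python
def internal_reaction(user_state: str) -> str:
    """Naive appraisal: the default internal reaction feels with the user."""
    return user_state

def externalize(internal: str, context: str) -> str:
    """Decide separately whether and how to show the internal reaction."""
    if context == "tutoring" and internal == "depressed":
        return "show_encouragement"   # mirroring would reinforce distress
    return f"mirror_{internal}"

print(externalize(internal_reaction("depressed"), "tutoring"))
```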
0:30:29 The dialogue I will show you is actually scripted. First of all, in this kind of dialogue we blend emotions, and we also comment on the user's emotions. The story will be about a forgotten medication: the robot shows concern about the forgotten medication in order to increase the user's awareness, but it does so in a subtle manner, without reproaching the user too much and without talking down to the user. So I will play the video; what is actually kind of amusing is the disappointment the robot displays. Here it is.
0:32:07 [video plays]
0:32:39 Okay. To develop a better understanding of the emotions of users, we are currently investigating how to combine social signal processing with an affective theory of mind; this is actually a cooperation with a partner in Stuttgart.
0:33:05 Our partner has developed a model of the mind to simulate emotional behaviors. The basic idea is to run an emotion simulation and then check whether what the robot recognizes in terms of social cues actually matches the simulation. And to take it a little bit further: we do not just consider how a person shows a particular emotional state; we also consider how people actually regulate their emotions. Let me show you an example.
0:33:52 Let's take shame. If you do not regulate your emotions at all, you might, like the person here, just blush and lower your head; this is the typical emotional expression we would expect. But people usually regulate their emotions, for instance to better cope with the emotional state, and they choose quite different ways to regulate them. Avoidance is one reaction, but you may also protect yourself, for example by saying "okay, I am fine", or you may even attack the other person.
0:34:44 And what you then see is that people may show quite different signals depending on the way they regulate their emotions. If you use a typical machine learning approach to analyze these signals, you will never be able to infer the emotions, because you do not know how people go about regulating their emotional state. So, and we had this discussion already yesterday, maybe we can use machine learning approaches as black boxes to recognize certain signals, but we then need some deeper understanding to actually map those cues onto emotional states.
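A minimal sketch of that division of labor: a black-box detector reports a surface signal, and a small knowledge layer interprets it against context (recall the failing-but-smiling student from earlier) before committing to an emotion. The rules and names are illustrative:

```python
def interpret(signal: str, context: dict) -> str:
    """Map a detected surface cue onto an emotional state, using context."""
    if signal == "smile" and context.get("task_outcome") == "failure":
        return "frustration_or_irony"   # smiling right after failing the task
    if signal == "smile":
        return "happiness"
    return "unknown"

# The "smile" string stands in for the output of a black-box detector.
print(interpret("smile", {"task_outcome": "failure"}))
```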
0:35:41 And this is even more important if the system has to respond to an emotional state. Imagine you talk to somebody, and the other person does not really understand what your problem is and just behaves by the book: a system responding in such a schematic manner is not what we would call empathic behavior.
0:36:10 Toward the end of my talk I would like to look at the dialogue between humans and robots, and in particular at how a robot can reason about engagement in human-robot interaction. We looked at signs of engagement in human-robot dialogue, such as the amount of mutual gaze, directed gaze, and turn-taking. Let me show you an example: here we have a game between a robot and a user, and the user is wearing eye-tracking glasses, so that the robot knows where the user is looking.
0:37:03 In this specific scenario we simulated directed gaze, which serves a kind of functional aim: the robot is able to detect which object the user is focusing on, and this makes the interaction more efficient, because the user is no longer forced to describe the object. We also implemented a whole scenario with social gaze; the social gaze does not really have a task function, and the dialogue was completely understandable without it. We just wanted to know whether it makes any difference. Very quickly: for directed gaze, the robot has the following two options, pointing at the object or just looking at the object; and for mutual gaze, both interaction partners establish eye contact.
0:38:12 The next thing we realized was gaze-based disambiguation. Gaze-based disambiguation is interesting insofar as people look at an option, look away, and then look at it again; so we needed a different disambiguation approach than, for example, for pointing gestures: when people point, they usually point just once and that's it, they do not point at the same object a second time. With gaze it is different.
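A minimal sketch of the difference: gaze evidence accumulates over repeated fixations rather than arriving as a single pointing act, so the referent is taken to be the object with the most accumulated fixations. The threshold and object names are illustrative:

```python
from collections import Counter

# Fixation targets observed over a short time window.
fixations = ["red_wine", "white_wine", "red_wine", "red_wine"]
dwell = Counter(fixations)

target, count = dwell.most_common(1)[0]
if count >= 2:  # require repeated evidence, not a single glance
    print("resolved referent:", target)
```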
0:38:50 We also realized some of the typical gaze behaviors that you see in turn-taking. Speakers usually look away from the addressee to indicate that they are in the process of thinking about what to say next, and also to show that they do not want to be interrupted; and typically, at the end of an utterance, speakers look at the other person, because they want to know how the addressee responds, what the addressee is thinking about what has been said. So basically we realized shared gaze, where the robot follows the user's hand movements and the user's gaze; we also realized social gaze, where the robot recognizes mutual gaze; and finally, the robot tries to attract the user's attention. A sketch of these turn-taking gaze rules follows below.
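As announced above, a minimal sketch of the turn-taking gaze rules described here; the dialogue phases and gaze targets are illustrative names, not the system's actual states:

```python
def gaze_target(phase: str) -> str:
    """Choose where the robot should look in each dialogue phase."""
    rules = {
        "planning_utterance": "look_away",   # signals: keep the turn
        "speaking": "task_object",           # shared attention on the task
        "utterance_end": "addressee",        # yield turn, read the reaction
        "listening": "speaker_or_object",
    }
    return rules.get(phase, "addressee")

print(gaze_target("utterance_end"))  # -> addressee
```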
0:39:55 Now I will show you the video.
0:40:06 [video plays; among other things, the user's reference to "the red wine" is ambiguous, and the robot resolves which one is meant]
0:42:53 We did an evaluation of this, and what we found was that the object-related grounding was more effective than the social grounding: people were able to interact more efficiently with object grounding, the dialogues were much shorter, and there were fewer misconceptions. The social grounding, however, did not improve the perception of the interaction, which is of course a pity, because we spent quite some time on the mutual gaze. One assumption is that people were concentrating on the task instead of on the social interaction with the robot, and we might investigate whether, with a more social task, for example looking at family photos, the social gaze becomes more important. Another assumption, which we have not yet tried out, is that some people focus more on the task and some focus more on the social interaction, and that people could be classified like this; the socially oriented people might then appreciate the social gaze more. This still needs analysis.
0:44:14 Finally, I would like to come to recent developments. We started to record interactions in dialogues and to use the data from both sides: not only to make machines interactive, but also to teach machines, for example a robot, how they can interact with a human. In the project that was already mentioned yesterday, we collected a corpus of dialogues between humans, and the dialogues were then annotated with labels. We integrated active learning and cooperative learning into the annotation workflow. Basically, the idea is that the system decides which samples the user should label, and it also decides which samples can be labeled fully automatically; the human is only asked to label examples for which the classifier actually has a low confidence. With that approach we were able to make the annotation process significantly more efficient.
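A minimal sketch of such a confidence-gated annotation loop: confident pool samples get machine labels, uncertain ones go to the human annotator. The thresholds, model and toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.random((30, 8)), rng.integers(0, 2, 30)  # seed labels
X_pool = rng.random((200, 8))                               # unlabeled pool

clf = LogisticRegression().fit(X_lab, y_lab)
conf = clf.predict_proba(X_pool).max(axis=1)

auto_labeled = X_pool[conf >= 0.9]  # trusted machine labels (cooperative part)
ask_human = X_pool[conf < 0.6]      # worth an annotator's time (active part)
print(len(auto_labeled), "auto-labeled,", len(ask_human), "sent to human")
```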
0:45:53 This is basically integrated into the NOVA annotation tool and the SSI system which I mentioned earlier. For the interaction management, it is interesting that we additionally annotated phenomena such as interruptions in the corpora of dialogues between humans.
0:46:21 So let me come to a conclusion. I think that human-robot interaction research cannot be considered complete until the problem of appropriate social interaction between robots and humans is solved, in particular if robots are employed in people's homes. What we need is a fully integrated system consisting of perception, reasoning, learning and responding. In particular, there is at the moment a big gap between the perception and the reasoning: reasoning is kind of neglected at the moment in favor of black-box approaches. Black-box approaches are useful for detecting social cues such as laughter, but after that we need to reason about what the social signal actually means. And of course, interdisciplinary expertise is necessary in order to emulate aspects of social intelligence; that is why we cooperate a lot with psychologists. We have made a lot of our software publicly available, in particular the SSI system for social signal interpretation, the NOVA tool for annotation, and the SceneMaker, a dialogue authoring tool based on finite state automata, which has been applied to various virtual agents but also to all kinds of robots. And that is actually it from my side; thank you.
0:49:46 [audience question]
That is actually a good point, because with the eye tracking the robot is able to recognize where the user is looking at a much higher level of accuracy than any human would be. Some people also used their gaze quite explicitly, as a kind of pointing, and of course that enables quite flexible referring acts; in that particular video the discourse features are mostly there for illustration. What we observed is that some people use pointing and some people do not; but even the people who do not point usually still look at the object, so the robot can pick up that information from the gaze. And because it was a task-based study, people really concentrated on the task, and that is probably why they did not appreciate the social gaze so much. It is not that the gaze made no difference at all: where turn-taking was realized through gaze, the dialogue was more efficient, because it was clear when the robot was expecting input from the user; but in terms of the subjective evaluation, the users did not consider the robot's behavior to be more natural or more social. Again, in my case it is a really task-based scenario. I do not have time to show the video of two humans collaborating on the same task, but we have some examples of human-human interaction, and it looked not so different from the human-robot interaction: the two humans also concentrated very much on the task, and apart from the moments of turn-taking, where they briefly looked at each other, they mostly looked at the objects on the table, which we found quite interesting.
0:53:30 [audience question]
That has to do with how the robot looks: you have the NAO robot, which looks humanoid and has what look like arms and hands, and so intuitively people of course talked to it, maybe as in a child-directed register, more expressively and articulating more clearly. It was also easy for people to relate to the robot. We brought the robot to an elderly home, and at first people were reserved; one lady said, you know, why a robot at all, we would rather be treated by humans, and then added: okay, as long as the robot just talks, it is okay. As for the robot's performance, people treated it in a very social way; they would, for example, invite it to take a seat. Sometimes they were also surprised: one lady, who was around a hundred years old, touched the robot, and her reaction was very clear: "It's just plastic!" She had apparently expected the robot to feel different, which I found quite striking. Overall, a lot of people found it surprisingly easy to talk to the robot.
0:56:17 [audience question]
It probably depends on the setting, because, for example, in a Texas Hold'em game people intentionally show or hide a particular emotional state, whereas when people regulate their emotions in everyday interaction, they usually do not consciously think about it. And there are considerable individual differences, so that general expression rules are hard to come by; a machine learning system that just looks at the surface signals would need some kind of additional knowledge to be able to recognize which emotional state the person is actually in, and in what situation.
0:58:00 [audience question]
I believe that the face is quite important. I was once at a presentation by a company that was really proud of their robot; the robot did not have facial expressions, it did not have anything like a face, and somebody in the audience said: I do not understand, the robot is just a loudspeaker, what is the point? So I think the point is that the face matters after all. With the robot head we showed before, a lot was already possible with gaze and head pose.