0:00:15 | ... one channel of communication, but there is another one which is just as important, namely nonverbal communication. |
---|
0:00:25 | In my talk I will discuss how to enrich the precise and useful functioning of computers with the human ability to convey messages through nonverbal behaviors. |
---|
0:00:42 | You can also see here, in the collaboration between the human and the robot, that they are not just collaborating; there is even a kind of close affective bond between them. This is actually the focus of both strands of my research. |
---|
0:01:01 | My talk will be structured as follows. I will first talk about the recognition of social signals in human-robot interaction. Of course, the technology is also useful for other settings, since there are social signals in human-human interaction or in human-virtual agent interaction as well. |
---|
0:01:29 | Then I will talk about the generation of social cues in human-robot interaction: of course, the robot should not just be able to interpret human signals, it should also be able to respond appropriately. |
---|
0:01:47 | The next topic will be dialogue management: a social robot should not only be able to talk, it should also handle social signals such as mutual gaze and backchannels. |
---|
0:02:04 | To handle all these challenges we of course need a lot of data, so the last part of my talk will be on how interactive learning approaches can ease the annotation effort for humans. |
---|
0:02:28 | So let's start with the recognition of social cues in human-robot interaction. What kinds of signals are we interested in? Basically speech, facial expressions, gaze, posture, gestures, body movements, and proxemics. |
---|
0:02:49 | But we are not only interested in the social cues of an individual person; we are also interested in interaction patterns such as synchrony, or interpersonal attitudes, for example whether one person dominates an interaction, and also in engagement: how engaged are the participants in an interaction? |
---|
0:03:20 | If you look at the literature, the most attention has been paid to facial features. I do not want to go into detail here; I will just mention the Facial Action Coding System, which is applied to recognize, but also to generate, facial expressions. The basic idea is to define action units that characterize emotional expressions, such as raised lip corners, which are usually an indicator of happiness. |
---|
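For illustration only, here is a minimal sketch of the general idea of mapping detected facial action units onto candidate emotions. It is not the system discussed in the talk; the AU-to-emotion rules follow common FACS conventions, and the detector output format (a dict of AU activations) is an assumption.

```python
# Illustrative sketch: map detected facial action units (FACS) to candidate
# emotion labels with simple co-occurrence rules. The AU numbers follow the
# standard FACS coding; the detector output format is hypothetical.

EMOTION_RULES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "surprise": {1, 2, 5, 26},   # brow raisers + upper lid raiser + jaw drop
    "anger": {4, 5, 7, 23},      # brow lowerer + lid/lip tighteners
    "sadness": {1, 4, 15},       # inner brow raiser + brow lowerer + lip corner depressor
}

def score_emotions(active_aus, threshold=0.5):
    """Rank emotions by the fraction of their rule AUs that are active."""
    active = {au for au, strength in active_aus.items() if strength >= threshold}
    scores = {emo: len(aus & active) / len(aus) for emo, aus in EMOTION_RULES.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    detected = {6: 0.8, 12: 0.9, 4: 0.1}   # e.g. a smile with raised cheeks
    print(score_emotions(detected))         # happiness should rank first
```

In practice such hand-written rules are usually replaced or complemented by learned classifiers, but they convey how action units act as an intermediate vocabulary between pixels and emotion labels.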
0:04:02 | A lot of work has also been spent on vocal emotion recognition. Just for illustration, I show you the signal of the same utterance spoken with different emotions: you can see that the pitch contour is quite different depending on the emotion expressed. |
---|
0:04:27 | There has also been some effort to find robust predictors of vocal emotion. I would like to mention the Geneva minimalistic acoustic feature set, which was recently introduced and which achieves quite reasonable results, even if you compare it with brute-force feature sets consisting of thousands of features, or with deep neural network approaches that try to learn features directly from the speech signal. It is therefore worthwhile to compare the results obtained with such approaches against those obtained with the Geneva minimalistic feature set. |
---|
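As a hedged example of how such a minimalistic feature set can be extracted in practice, the following sketch uses the opensmile Python package (assuming it is installed and that a mono WAV file named utterance.wav exists); it computes one vector of utterance-level functionals.

```python
import opensmile

# Extract the Geneva minimalistic acoustic parameter set (GeMAPS) functionals
# for a single utterance; the result is a pandas DataFrame with one row.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("utterance.wav")
print(features.shape)  # one row of utterance-level functionals
```

The resulting functionals (pitch, loudness, spectral descriptors and the like) can then be fed into any standard classifier.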
0:05:17 | If you look at the literature you might get the impression that you obtain very high recognition rates for emotions. It is even a little bit scary: if you take such a model and test it in the real world, you may find that the recognition rate sometimes comes close to chance level. |
---|
0:05:47 | So why is that? Previous research has focused on the analysis of acted basic emotions, emotions that are rather extreme, prototypical emotions such as happiness, sadness, disgust, and anger. Emotional responses in the wild can usually not be mapped onto such basic emotions. Here, for example, I suppose the woman interacting with the robot is quite happy, but it is not clearly expressed. |
---|
0:06:36 | A couple of years ago, colleagues of mine conducted a study that is interesting in this context. They investigated the emotion recognition rate for acted emotions, for read emotions, and for emotions recorded in the wild, which of course sound natural. The task was just to distinguish between emotional and non-emotional speech, so not a very difficult task. |
---|
0:07:10 | For acted emotions they got one hundred percent. For read emotions, which are a little bit more natural than acted emotions, they got eighty percent, which is okay but not really exciting, because chance is fifty percent if you just need to distinguish between neutral and emotional speech. And finally, for the in-the-wild scenario, they only got around seventy percent. |
---|
0:07:47 | So obviously, systems developed under laboratory conditions tend to perform poorly in less controlled scenarios, and the real challenge lies in adaptive real-time applications. |
---|
0:08:05 | If you look at the recognition rates people report, you will find that most studies are offline studies. They take a corpus, and the corpus is usually prepared: for example, expressions that cannot be clearly assigned to one of the considered emotional states are simply removed. And of course they start from the assumption that the whole corpus is segmented in some way. |
---|
0:08:44 | But in real life we also have to deal with noise and corrupted data, so we might miss information. Also, our classifiers can only rely on previously seen data; we cannot look into the future. And of course the system has to respond, at least roughly, in real time. |
---|
0:09:09 | So the question is what we can do about it. One thing we might consider is the context. If you look at the picture, can you imagine which emotional state the girl is in? Those of you who do not know the context: any ideas what the emotional state might be? |
---|
0:09:44 | You are quite right. Usually people say it is distress or sadness; you actually guessed very well, because it is jealousy. You are actually the first audience that immediately got the correct emotion. Nevertheless, I would say a system, even one able to detect the facial action units in a perfect manner, would have problems figuring this out without knowing the context, or at least considering other channels. Some recent research has been done on taking the context into account, and it has actually led to some improvement. |
---|
0:10:40 | A couple of years ago we investigated gender-specific models for emotion recognition, and we were able to improve the recognition rates by training gender-specific models. Another approach, by a colleague of mine, considered success and failure events during an application: if, for example, a student is having a hard time and is smiling while interacting with a learning application, then probably the student is not really happy; it might be that the student does not take the system seriously. Even though this approach is quite reasonable, it has not been picked up much. |
---|
0:11:38 | We also considered the dialogue behaviour of the virtual agent in a job interview training scenario: when, for example, the job interviewer asks difficult questions about the weaknesses of the candidate, this also gives a hint about the likely emotional state. And there have been attempts to learn temporal context using bidirectional long short-term memory neural networks. So the context might be a good option to consider. |
---|
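To make the idea of modelling temporal context concrete, here is a minimal PyTorch sketch of a frame-wise classifier with a bidirectional LSTM; it is an illustrative stand-in, not the model used in the work just mentioned, and the feature and class dimensions are invented.

```python
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    """Frame-wise emotion classifier that uses past and future context."""
    def __init__(self, n_features, n_classes, hidden=64):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)  # 2x because of both directions

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.blstm(x)             # (batch, time, 2 * hidden)
        return self.head(out)              # per-frame class logits

model = BLSTMTagger(n_features=23, n_classes=4)
frames = torch.randn(8, 100, 23)           # 8 sequences of 100 feature frames
logits = model(frames)                     # shape: (8, 100, 4)
```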
0:12:22 | Another, perhaps obvious, thing to consider is multimodality. Here you can see two pictures. If I only look at the faces, it is not possible for me to recognize any difference. But if you look at the body, you can tell the two pictures apart: the person on the right is obviously in quite a different state, even though hardly any of this is conveyed by the face. |
---|
0:13:05 | So does multimodal fusion help? There is an interesting meta-study by D'Mello and colleagues on multimodal affect detection. It examined many studies that compared multimodal with unimodal approaches, and it showed that the improvement correlates with the naturalness of the corpus. For acted emotions you get quite high recognition rates, and if you use multiple modalities you can even get an improvement of more than ten percent. |
---|
0:13:52 | But for the more difficult case, namely spontaneous emotions, the improvement is less than five percent, which is really bad news: should we burden the user with additional sensing devices just to gain less than five percent in recognition rate? The explanation is that in natural interaction, people show emotions in a subtle manner, and a single channel may or may not show the emotion, so not all channels are expressive in the same manner. |
---|
0:14:33 | To investigate this assumption, we looked at a corpus: we annotated affect just from the video and then just from the audio, and noted where the annotations mismatch. We then looked at the recognition rates, and indeed, where the annotations mismatch, the recognition rates are rather low. |
---|
0:15:10 | Let me show you another example; look at the woman here. In the second segment the woman shows a neutral face while the voice is happy, and a little bit later it is the other way round: the face looks happy, but the voice is neutral. So the question is what a fusion approach should do in such a situation, and here I sketch a potential solution: each modality-specific recognizer might decide when to contribute, and the results are then interpolated; via this interpolation we get better recognition results. |
---|
0:16:08 | If you look at the literature, most fusion approaches are synchronous fusion approaches. Synchronous fusion approaches are characterized by an integration of multiple modalities within the same time frame: for example, you wait until a sentence is complete and then analyze the face and the voice over the complete sentence. Asynchronous fusion approaches, in contrast, can integrate modalities that do not contribute at the same time: they do not assume that, for example, audio and video express an emotion at the same moment. |
---|
0:17:01 | Therefore they are able to account for the asynchronous nature of the modalities. So if you use a fusion approach, it is very important to use one that is able to consider temporal dependencies: the dependencies within modalities, but also the interdependencies between modalities. And that is only possible if you go for a frame-wise recognition approach. |
---|
0:17:36 | We developed such an approach a few years ago. We adopted an event-based fusion approach, where we consider events as an additional layer of abstraction between raw sensor signals and higher-level emotional states; the individual channels are allowed to contribute no cue at all, or similar kinds of social cues. In this way we were able to track the temporal relationships between channels and learn when each channel provides information, and even in the case where some data are missing the approach still delivers reasonable recognition results. |
---|
0:18:26 | Let's have a look at a simplified example. Here we have audio and we have facial expressions, and the fusion approach combines them, weighted by some degree of confidence. Now let's assume that for some reason the audio is no longer available: via interpolation we still get quite reasonable results. |
---|
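The following sketch illustrates, under simplified assumptions, the flavour of such event-based fusion: each modality asynchronously emits events with a confidence, the estimates are combined with recency weighting, and a channel that drops out simply stops contributing. The event structure and the exponential decay are assumptions made for illustration, not the actual fusion scheme described in the talk.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A cue detected in one modality (e.g. 'smile' in video, 'raised voice' in audio)."""
    time: float        # seconds
    valence: float     # estimated valence in [-1, 1]
    confidence: float  # detector confidence in [0, 1]

def fuse(events, t, half_life=3.0):
    """Confidence- and recency-weighted valence estimate at time t.
    Modalities need not fire at the same moment; a missing channel simply
    contributes no events, and older events decay with the given half-life."""
    num = den = 0.0
    for ev in events:
        if ev.time > t:
            continue                       # online setting: no look-ahead
        decay = 0.5 ** ((t - ev.time) / half_life)
        w = ev.confidence * decay
        num += w * ev.valence
        den += w
    return num / den if den else 0.0       # neutral if nothing has been observed

audio = [Event(1.0, +0.8, 0.9)]            # happy voice early on
video = [Event(4.0, +0.2, 0.6)]            # later, a mildly positive face
print(fuse(audio + video, t=5.0))          # audio dropping out later is no problem
```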
0:18:57 | We compared a number of synchronous fusion approaches, asynchronous fusion approaches, and event-driven fusion. Among them we considered, for example, recurrent neural networks, which take into account the temporal history of the signals, and bidirectional long short-term memory networks, which in addition are able to look into the future. What you can see here, which is quite exciting, is that the asynchronous fusion approaches actually outperform the synchronous fusion approaches. |
---|
0:19:56 | So the message is: if you fuse modalities, you should go for an approach that is able to consider the dependencies within modalities, but also the interdependencies between modalities. |
---|
0:20:15 | [brief exchange with the audience, not reliably transcribed] |
---|
0:20:54 | To support the development of social signal processing approaches for online recognition tasks, we developed a framework called SSI, for Social Signal Interpretation. This framework synchronizes the modalities, and it supports online machine learning by offering various kinds of machine learning approaches. We are able to integrate virtually all modalities and sensors, and whenever a new device becomes available, my people write a wrapper for it. |
---|
0:21:45 | We consider motion capturing, gaze tracking with various kinds of eye trackers, stationary as well as mobile ones, and also text, so basically all kinds of sensors that are commercially available. |
---|
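SSI itself is a full framework; purely for illustration, the following sketch shows one common way to bring two sensor streams with different sampling rates onto a shared frame grid before frame-wise classification (sample-and-hold resampling; the stream names and rates are invented, this is not the SSI API).

```python
import numpy as np

def resample_to_frames(timestamps, values, frame_rate=25.0, duration=None):
    """Align an irregularly sampled sensor stream to a fixed frame grid by
    holding the last observed value (a common strategy in live pipelines)."""
    duration = duration if duration is not None else timestamps[-1]
    grid = np.arange(0.0, duration, 1.0 / frame_rate)
    idx = np.searchsorted(timestamps, grid, side="right") - 1
    idx = np.clip(idx, 0, len(values) - 1)
    return grid, np.asarray(values)[idx]

# two streams with different native rates end up on the same 25 Hz grid
t_gaze, v_gaze = np.array([0.0, 0.03, 0.07, 0.50]), np.array([0.1, 0.2, 0.2, 0.9])
t_skin, v_skin = np.array([0.0, 0.25, 0.50]), np.array([3.1, 3.0, 3.4])
grid, gaze = resample_to_frames(t_gaze, v_gaze, duration=0.5)
_, skin = resample_to_frames(t_skin, v_skin, duration=0.5)
frames = np.stack([gaze, skin], axis=1)    # one fused feature vector per frame
```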
0:22:09 | This was the part on emotion recognition. Now I would like to come to the other side, namely the generation of social cues by the robot. As I said, it is not sufficient to recognize emotions; the robot also needs to respond appropriately. |
---|
0:22:33 | and |
---|
0:22:36 | i guess it's a clear so why would nonverbal human signals a where we all |
---|
0:22:43 | and update not only express emotions but also edit you would |
---|
0:22:48 | intention |
---|
0:22:49 | also called only high interpersonal relations with the plate sample |
---|
0:22:55 | you are interested in talking to have a |
---|
0:22:59 | or not |
---|
0:23:00 | and nonverbal the three minutes kind of course also be you with |
---|
0:23:05 | other to understand be worth messages |
---|
0:23:10 | and in general will make the communication |
---|
0:23:12 | more natural implausible |
---|
0:23:15 | We started a couple of years ago with a robot that does not have very expressive facial features, so we had to look for other options, and we looked at action tendencies, which are related to emotions. An action tendency is what you show before you start an action; it is very common in sports: you see a posture, the person has not yet acted, but it is quite clear what is coming next. |
---|
0:23:56 | In this scenario we simulated action tendencies such as approach, panic, attack, and submission, and it turned out that people were able to identify these action tendencies. |
---|
0:24:13 | Later we got a robot with a more expressive head, and here we tried to simulate facial expressions, again starting from the Facial Action Coding System I mentioned before. |
---|
0:24:38 | FACS identifies around forty action units for the human face, so the question was whether we can simulate all of these action units on the robot. Our robot only allowed the simulation of seven action units. |
---|
0:25:00 | The robot has a synthetic skin, and under the skin there are motors; the motors can move and thereby deform the skin. So we were only able to simulate these seven action units, and the question is whether this is enough. I will show you a video. |
---|
0:25:24 | The video is in German with English subtitles. Since my talk is about nonverbal signals, it is not strictly necessary that you understand what is being said; you can just watch the discussion. At this stage, the machine did not consider the semantics of the utterances at all. |
---|
0:25:50 | [video: a German-language dialogue between the robot and a user; the speech is not reliably transcribed] |
---|
0:27:04 | Okay, just to show you that it really does not consider the semantics, here is another example. |
---|
0:27:12 | [second video example; the dialogue is not reliably transcribed] |
---|
0:27:44 | This was just to show that you can have a conversation driven by emotional features alone, which is of course not perfect. If we want to go further, we of course need something different, and so maybe we should have a look at empathy. |
---|
0:28:09 | So what is empathy? Empathy is an emotional response that stems from the comprehension of the emotional state of another person. The emotional state of the other person might be similar to your own emotional state, but it does not have to be the same emotion. Empathy requires, first, the perception of the emotional state of the other person, and this is what we can cover with social signal processing technology. But it also requires, as we discussed before, that you somehow understand what the other person is feeling and why, and it requires deciding how to respond to the other person's emotion. |
---|
0:29:16 | For example, in a tutoring system, if the student is in a very negative emotional state and depressed, it could be a disaster if the virtual agent mirrored the emotional state of the student, because it might make the student even more depressed. So the agent actually has to decide which reaction is appropriate, which potential response to show and which not to show. |
---|
0:29:49 | We realized a kind of empathy mechanism as follows: we perceive the emotion and try to understand the emotional state, and based on this understanding of the other person's state we choose an internal reaction; the question is then whether and how this reaction should be externalized by the virtual agent or robot. In the example I will show, the emotions are simulated with an appraisal model. |
---|
0:30:29 | The dialogue I will show you is of course scripted. What does the robot do in this kind of dialogue? It labels emotions, and it also comments on the user's emotions. The story is that the user has forgotten to buy medication. |
---|
0:30:56 | The idea of the scenario is that the robot shows concern about the forgotten medication in order to increase awareness, but it does so in a subtle manner, without blaming the user too much, and the robot also tries to calm the user down. |
---|
0:31:23 | So I will play the video now; what happens is actually kind of amazing. |
---|
0:32:07 | [video plays] |
---|
0:32:39 | Okay. To develop a better understanding of the emotions of users, we are currently investigating how to combine social signal processing with an affective theory of mind; this is joint work with partners from another group. Our partners have developed a computational model of theory of mind to simulate emotional behaviours, and the basic idea is to run an emotion simulation and then check how well what we recognize in terms of social cues actually matches the simulation. |
---|
0:33:31 | To go even a little bit further: we do not just consider how a person arrives at an emotional state, we also consider how people regulate their emotions. Let me show you an example. Consider shame: if the emotion is not regulated at all, the person might just blush and hang their head, and this is the typical emotional expression we would expect. But people usually regulate their emotions, because they would like to better control their emotional state, and there are different ways to regulate an emotion. |
---|
0:34:32 | Avoidance is one reaction, but you may also protect yourself, for example by saying "okay, it was not my fault", or even by attacking the other person. What you can see is that we get quite different signals depending on how people regulate their emotions, and if you use a typical machine learning approach to analyze the social signals, you will never be able to identify the emotion, because you do not know how people regulate their emotional state. So here, and we had this discussion already yesterday, maybe we can combine machine learning approaches, which as black boxes recognize certain signals, with some reasoning and understanding to actually map the observed cues onto emotional states. |
---|
0:35:41 | This is even more important if the system has to respond to the emotional state. Imagine you talk to somebody who does not really understand what your problem is and just responds in a schematic manner; that would not come across as believable behaviour. |
---|
0:36:10 | Towards the end of my talk I would also like to say a little bit about dialogue between humans and robots, in particular about work in which we considered engagement in human-robot interaction. We looked at signs of engagement in human-robot dialogue, such as the amount of mutual gaze, directed gaze, and turn-taking. Let me show you an example: here you see a game between a robot and a user, and the user is wearing eye-tracking glasses so that the robot knows where the user is looking. |
---|
0:37:03 | In this specific scenario we simulated directed gaze, which is a kind of functional gaze: the robot is able to detect which object the user is focusing on, and this makes the interaction more efficient because the user is no longer forced to describe the object in detail. We also implemented social gaze in this scenario; the social gaze does not have a real function, as the dialogue was completely understandable without it, but we wanted to know whether it makes any difference. |
---|
0:37:51 | Very quickly: for directed gaze, the robot has two options, pointing at the object or just looking at the object; for mutual gaze, both interactants establish eye contact. The next thing we realized was gaze-based disambiguation, which is interesting insofar as people gaze at an object, then look away, and then maybe gaze at it again, so we need a different disambiguation approach than for pointing gestures: when people point, they usually point only once, and so gaze is different in this respect. |
---|
0:38:50 | and we also a real is |
---|
0:38:53 | so that some typical gaze behaviour is that you in a turn taking |
---|
0:38:58 | so speakers a new way usually from the addressee to indicate |
---|
0:39:05 | that they are for it to process of thinking about what to say next |
---|
0:39:11 | and also to show that they don't one and it should be a drop that |
---|
0:39:15 | and are typically at the end of an utterance the speakers |
---|
0:39:20 | low would you have a person |
---|
0:39:23 | because they want to know how we are suppose |
---|
0:39:26 | what the as opposed |
---|
0:39:28 | thinking about what has been set |
---|
0:39:31 | so basically |
---|
0:39:33 | we realize a shared folder of what follows the user's hand movements and drop what |
---|
0:39:40 | follows to users he's |
---|
0:39:42 | we will i social around eight |
---|
0:39:45 | so here to what i see and recognise this mutual gaze |
---|
0:39:49 | and finally to an eye dropper to make a nice is going to use that |
---|
0:39:54 | you tell |
---|
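As an illustrative sketch of how such gaze behaviours can be expressed as simple rules, the following function selects a gaze target from the robot's dialogue state and the object the user is currently looking at; the state and target names are assumptions for the example, not the actual controller used in this work.

```python
# Illustrative rule for the turn-taking and shared gaze described above:
# the robot keeps its gaze on the object of interest (or averts it) while
# planning or speaking, and looks back at the user when its utterance ends,
# signalling that the turn is offered.

def select_gaze_target(robot_state, focused_object):
    """robot_state: 'planning', 'speaking', or 'utterance_end';
    focused_object: object the user currently looks at, or None."""
    if robot_state in ("planning", "speaking"):
        return focused_object or "away"    # shared gaze on the object, else look away
    if robot_state == "utterance_end":
        return "user"                      # seek mutual gaze to hand over the turn
    return "user" if focused_object is None else focused_object

print(select_gaze_target("speaking", "red_wine_glass"))   # -> red_wine_glass
print(select_gaze_target("utterance_end", None))          # -> user
```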
0:39:55 | I will show you a video. |
---|
0:40:06 | [video plays] |
---|
0:41:06 | "The red wine" is of course ambiguous, there is more than one, so the robot asks which one is meant. |
---|
0:42:53 | We did an evaluation of this work, and what we found was that the object-directed gaze was more effective than the social gaze: people were able to interact more efficiently, the dialogues were much shorter, and there were fewer misconceptions. The social gaze, however, did not improve the perception of the interaction, which is of course a pity, because we spent quite some time on the mutual gaze. |
---|
0:43:32 | One assumption is that people were concentrating on the task instead of on the social interaction with the robot, and we might investigate whether, with a more social task, for example looking at family photos together, the social gaze would become more important. Another assumption, which we have not yet tested, is that some people focus more on the task and some focus more on the social interaction; users could be classified in this way, and specific groups of people might appreciate the social gaze more than others. |
---|
0:44:14 | Finally, I would like to come to recent developments. We are interested in interaction and dialogue data from both sides: how humans interact with a machine, but also how a machine, in our case a robot, can interact with a human. |
---|
0:44:39 | In the project that was already mentioned yesterday, we collected a corpus of dialogues between humans, and the dialogues then had to be labelled. We integrated active learning and cooperative learning into the annotation workflow: basically, the idea is that the system decides which samples it can label itself with the predicted labels, and which samples should be annotated by a human, where the human is asked to label the examples for which the classifier has low confidence. With this approach we were able to make the annotation process significantly more efficient. |
---|
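A minimal sketch of the underlying idea, assuming a classifier that outputs class posteriors: samples predicted with high confidence keep their machine label, while low-confidence samples are routed to the human annotator, least confident first. The threshold and array layout are illustrative, not the tool's actual implementation.

```python
import numpy as np

def split_for_annotation(probabilities, confidence_threshold=0.85):
    """Cooperative annotation step (illustrative): confident samples keep
    their predicted label, uncertain ones go to a human annotator.
    `probabilities` is an (n_samples, n_classes) array of class posteriors."""
    confidence = probabilities.max(axis=1)
    auto_labelled = np.where(confidence >= confidence_threshold)[0]
    ask_human = np.where(confidence < confidence_threshold)[0]
    # annotate the least confident samples first
    ask_human = ask_human[np.argsort(confidence[ask_human])]
    return auto_labelled, ask_human

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]])
auto, manual = split_for_annotation(probs)
print(auto, manual)   # -> [0] [1 2]
```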
0:45:53 | This is basically integrated with the SSI system which I mentioned earlier. For the interaction data, we additionally annotated interruptions in the dialogues between humans. |
---|
0:46:21 | To come to a conclusion: I think that human-robot interaction research cannot be considered complete until we have addressed the problem of appropriate social interaction between robots and humans, in particular if a robot is employed in people's homes. What we need, of course, is a fully integrated system consisting of perception, reasoning, learning, and responding. |
---|
0:46:58 | In particular, there is at the moment a big gap between perception and reasoning. Reasoning is currently rather neglected in favour of black-box approaches, which are useful for detecting social cues such as laughter; but after that, we need to reason about what the social signal actually means. And of course interdisciplinary expertise is necessary in order to emulate aspects of social intelligence, which is why we collaborate a lot with psychologists. |
---|
0:47:46 | We make a lot of our software publicly available, in particular the SSI system for social signal interpretation, as well as our dialogue authoring tool, which is based on finite state automata and can be connected to various virtual agents but also to all kinds of robots. |
---|
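For illustration, here is a toy finite-state dialogue in the spirit of such a tool (not the released software): states are dialogue steps, transitions are triggered by recognized user acts, and the drink-ordering prompts are invented for the example.

```python
# Minimal finite-state dialogue sketch (illustrative only).
TRANSITIONS = {
    ("greet", "user_greets"): "ask_order",
    ("ask_order", "order_unambiguous"): "confirm",
    ("ask_order", "order_ambiguous"): "disambiguate",
    ("disambiguate", "object_identified"): "confirm",
    ("confirm", "user_confirms"): "done",
}

PROMPTS = {
    "greet": "Hello! What can I get you?",
    "ask_order": "Which drink would you like?",
    "disambiguate": "There is more than one - which one do you mean?",
    "confirm": "Okay, I will get it.",
    "done": "Enjoy!",
}

state = "greet"
print(PROMPTS[state])
for user_act in ["user_greets", "order_ambiguous", "object_identified", "user_confirms"]:
    state = TRANSITIONS.get((state, user_act), state)   # stay put on unknown input
    print(PROMPTS[state])
```

The appeal of this representation is that nonverbal acts (a detected gaze fixation, a pointing gesture) can trigger transitions just like spoken input.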
0:48:20 | [audience question, not transcribed] |
---|
0:49:46 | That is actually a good point. Thanks to the eye tracking, the robot is able to recognize where the user is looking with much higher accuracy than any human would. Some people also pointed explicitly, and of course the robot then does not have to rely on gaze as the only kind of referring act; in that particular video, the gaze features are mainly there for illustration. |
---|
0:50:31 | The video was recorded with somebody from our lab to demonstrate the behaviour quickly, but we also observed this kind of behaviour with people who had no prior contact with the robot: some people use pointing and some do not, but even those who do not point still look at the objects, so the robot can work with the gaze information. |
---|
0:51:13 | And because people in such a study usually believe they have to perform well, they really concentrate on the task, and that is probably why they did not appreciate the social gaze so much. It is not that people did not notice it at all: when turn-taking gaze was realized, the dialogue was more efficient, because it was clear when the robot was expecting user input; but in terms of the subjective evaluation, users did not judge the robot's behaviour as more natural or more social. In my case it is really a task-based scenario. |
---|
0:52:00 | I did not have time to show the video of two humans collaborating on the same task, but we also have examples of human-human interaction, not just human-robot interaction. Interestingly, in cases where the two humans knew each other very well, the turn-taking was very quick and they did not have to look at each other; their gaze stayed on the objects on the table, which we found quite interesting. |
---|
0:53:30 | That is because we actually worked with two robots: one which looks as if it could point, and one which does not look like it can point. So intuitively people of course adapted to this, perhaps justifiably behaving in a more expressive way or speaking more clearly. |
---|
0:54:03 | It was also interesting how people related to the robot. We brought one robot to a care facility, and at first people were rather reluctant and said, why should we accept that, we would rather be treated by a human; but then they said, okay, as long as the robot just talks it is fine, as long as it does not come too close. People realized quite exactly what the robot could and could not do, and they did not, for example, want the robot to actually take hold of them. |
---|
0:54:55 | Sometimes they were also surprising. One lady, she was about a hundred years old, was really clear about it: she said it cannot feel anything, it is just plastic. At first she found the robot a bit strange, but she was fine with using it. I also found that lots of people find it easier to talk to a robot and to say things they would not otherwise express. |
---|
0:55:32 | Thank you. [audience question] |
---|
0:56:17 | It is probably intentional in one case and not in the other: in a poker game, for example, people intentionally show, or hide, a particular emotional state, whereas when people regulate their emotions they usually do not really think about it. And there are quite some differences between these cases, so just looking at the signals with machine learning will not get you far; you also need some kind of appraisal to recognize the emotional state, what has actually happened and in what situation. |
---|
0:58:00 | I believe that the face is quite important. I was at a presentation by a company that was really proud of their robot, but it did not have facial expressions, it did not have anything like that, and somebody in the audience said: I do not understand, it is just a loudspeaker, what is the point? So I think the body matters, as I said before, but the face is important as well. Okay, and we had this point before: part of that would also be possible with gaze, or with head pose, actually. |
---|