0:00:15 | Hi. This is joint work with my PhD advisers, |
0:00:22 | and I want to talk about user adaptation |
0:00:25 | in dialogue systems. |
0:00:28 | Most of the state-of-the-art dialogue systems, and most of the production dialogue systems, |
0:00:36 | adopt a generic strategy: we have the same behaviour for every user. |
0:00:47 | What we want to do instead is to learn one strategy for each of these users. |
0:00:55 | The problem with learning a strategy from scratch is that one has to do some exploration, |
0:01:04 | and exploration leads to very bad performance in the first interactions. |
0:01:13 | So we want to design a framework which is very good during the cold-start phase, |
0:01:24 | and which must also be good during the, let's say, asymptotic phase. |
0:01:31 | So we propose a process for user adaptation, which is composed of several phases, |
0:01:41 | and it goes this way. |
0:01:44 | Let's say we have a bunch of robots, each representing a dialogue system. |
0:01:49 | Each of these robots has learned a strategy versus a specific user, |
0:01:57 | and it also keeps all the dialogues done with this user, |
0:02:04 | so all the knowledge of a robot is represented by its dialogues. |
0:02:11 | We want to elect some representatives of the whole database; |
0:02:18 | for example, here we elect the blue one and the red one. |
0:02:22 | Now we have a target user, and we don't have a system |
0:02:27 | to dialogue with this target user, so we would have to design a system from scratch. |
0:02:33 | What we are going to do instead is to transfer the knowledge of one of the representatives to this system. |
0:02:39 | So first we want to select the best representative to dialogue with our target user: |
0:02:47 | we try each representative one by one, |
0:02:52 | and at the end we select the best dialogue system, which is the blue one here. |
0:02:58 | Now we transfer all its knowledge to the new system. |
0:03:03 | So let's say we have a scratch system, and we are going to learn its strategy thanks to the knowledge transfer, |
0:03:15 | and also to all the dialogues done during the source selection phase. |
0:03:19 | Then we are going to use this new dialogue system to dialogue with this user, and we collect more dialogues. |
0:03:28 | Then we can learn a new system, more and more specialised to this target user, |
0:03:34 | and we repeat this process until we reach a system which is very specialised to the target user. |
0:03:46 | So in the end, we can add this new target system to the set of sources. |
0:03:53 | I will now detail each of these phases. |
0:03:56 | The sources are dialogue managers. The dialogue manager is a component of a dialogue system, |
0:04:04 | and this manager takes as input a representation of the user utterance, |
0:04:09 | for example "I would like to book a flight to London", |
0:04:13 | and the dialogue manager returns an action in response, for example asking for more information or confirming the request. |
0:04:21 | The usual way to design a dialogue manager is to cast the task as a reinforcement learning problem. |
0:04:31 | So first, let me define the reinforcement learning problem: we have an agent in interaction with an environment. |
0:04:40 | For example, our agent is the dialogue manager, and the environment will be the target user. |
0:04:48 | The agent can take an action, and the environment will react, and we can observe this reaction: |
0:05:01 | o' is an observation, and we also observe the reward we obtain, r'. |
0:05:09 | Given this observation, and also the action taken, the agent can update its internal state, |
0:05:19 | so here we go from the state s to the state s'. |
0:05:24 | So we can see that all the knowledge of the environment is contained |
0:05:31 | in the tuple (s, a, r', s'). |
0:05:39 | This is the batch reinforcement learning setting: we have knowledge of the environment |
0:05:47 | taking the form of samples, |
0:05:49 | and we want to design a good strategy for the dialogue manager. |
0:05:56 | The usual term for this is a policy: a function mapping states to actions, |
0:06:04 | and we want to find the optimal policy, |
0:06:08 | that is, the policy which maximises the cumulative reward |
0:06:12 | during the interaction between the dialogue manager and the target user. |
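To make the reinforcement-learning framing above concrete, here is a minimal Python sketch (not the authors' code): a transition tuple (s, a, r', s'), a policy as a mapping from states to actions, and the discounted cumulative reward that the optimal policy maximises. The state features, action names, and discount factor are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the batch-RL view of the dialogue manager.
# A transition stores everything the agent observes about the environment: (s, a, r', s').
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, ...]          # e.g. (asr_score, turn_count) -- illustrative features
Action = str                       # e.g. "propose", "repeat", "accept", "end"

@dataclass
class Transition:
    s: State       # state before the system acts
    a: Action      # action taken by the dialogue manager
    r: float       # reward observed after the user's reaction
    s_next: State  # updated internal state

Policy = Callable[[State], Action]  # a policy maps states to actions

def discounted_return(rewards: List[float], gamma: float = 0.95) -> float:
    """Cumulative discounted reward of one dialogue; the optimal policy maximises its expectation."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

The batch mentioned above is then simply a collection of such transitions gathered from past dialogues.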
0:06:19 | So note that there is an equivalence between the dialogue manager I talked about, the robots, and a policy. |
0:06:28 | We want to find the best policies to represent the whole database: |
0:06:36 | this is the representative selection phase. |
0:06:39 | Here we introduce the main contribution of the paper: |
0:06:43 | the policy-driven distance. |
0:06:47 | This is a metric which computes the behavioural differences between policies. |
0:06:54 | We sample some states and we look at which action is taken in each of these states. |
0:07:03 | For example, one can see that the third policy is very close to the purple one, |
0:07:10 | and the yellow one is very different from the two others. |
0:07:15 | One can view this policy-driven representation as a binary vector, |
0:07:22 | where the ones represent the action taken in a given state. |
0:07:29 | So for example, a policy will take these actions, and its binary vector will look like this. |
0:07:37 | If we concatenate all these binary vectors together, |
0:07:43 | we obtain a unique vector for each policy, on which we can compute a standard distance. |
0:07:49 | This allows us to use a clustering algorithm called K-means. |
0:07:56 | K-means will group all of our dialogue managers into clusters, |
0:08:04 | and since we want to represent them all, |
0:08:07 | we will have to learn one policy per cluster: |
0:08:12 | we gather the knowledge of each cluster and we learn a policy with that. |
0:08:18 | But we can also use another algorithm called K-medoids, |
0:08:22 | and in that case, thanks to the policy-driven distance, |
0:08:26 | we fetch the representatives directly. |
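As a rough illustration of the policy-driven distance and the election of representatives described above, here is a hedged sketch. The inputs `policies`, `sampled_states`, and `actions` are hypothetical; the Euclidean norm over concatenated one-hot action vectors and the choice of the cluster member closest to each K-means centroid (a K-medoids-style shortcut) are assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch of a behaviour-based distance between policies and of representative election.
import numpy as np
from sklearn.cluster import KMeans

def behaviour_vector(policy, sampled_states, actions):
    """Concatenate one-hot encodings of the action the policy picks in each sampled state."""
    chunks = []
    for s in sampled_states:
        one_hot = np.zeros(len(actions))
        one_hot[actions.index(policy(s))] = 1.0
        chunks.append(one_hot)
    return np.concatenate(chunks)

def pd_distance(pi1, pi2, sampled_states, actions):
    """Behavioural difference between two policies: distance between their binary vectors."""
    v1 = behaviour_vector(pi1, sampled_states, actions)
    v2 = behaviour_vector(pi2, sampled_states, actions)
    return np.linalg.norm(v1 - v2)

def elect_representatives(policies, sampled_states, actions, n_clusters=2, seed=0):
    """Cluster policies on their behaviour vectors; return one representative index per cluster."""
    X = np.stack([behaviour_vector(pi, sampled_states, actions) for pi in policies])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[np.argmin(dists)]))   # policy closest to the centroid
    return reps
```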
0:08:31 | Okay, so now we want to select the best policy to dialogue with the target user: |
0:08:39 | this is the source selection phase. |
0:08:41 | For that we can use a bandit algorithm called UCB1. |
0:08:45 | Usually one will first test each of the representatives one by one, |
0:08:51 | so we dialogue with each of them once and record its score. |
0:08:58 | Then the next system that the user will dialogue with |
0:09:05 | is the system which maximises the UCB value, so |
0:09:09 | now we will dialogue with the blue one, |
0:09:12 | and it is still the best, |
0:09:15 | so we keep dialoguing with the blue one until it gets a very bad score. |
0:09:20 | At this point, the red system has the better UCB value, so we switch robots, |
0:09:27 | and we repeat this process until we reach a maximum time limit, |
0:09:31 | for example one hundred time steps. |
0:09:36 | In the end, we select the system maximising the mean score. |
0:09:42 | The point of using UCB1 is that its exploration bonus takes |
0:09:46 | into account the high variability of the dialogues. |
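The following is a small sketch of this source-selection loop with the standard UCB1 index. The callback `dialogue_with` is hypothetical: it is assumed to run one dialogue between a candidate system and the target user and return its score; the budget of one hundred dialogues follows the example in the talk.

```python
# Hedged sketch of UCB1 for source selection (standard bandit formula; details may differ from the paper).
import math
import random

def ucb1_select(representatives, dialogue_with, budget=100):
    """Pick which representative system dialogues next with the target user; return the best one."""
    n = [0] * len(representatives)        # number of dialogues per system
    mean = [0.0] * len(representatives)   # running mean score per system
    for t in range(1, budget + 1):
        if t <= len(representatives):
            arm = t - 1                   # first, try each representative once
        else:
            arm = max(range(len(representatives)),
                      key=lambda i: mean[i] + math.sqrt(2 * math.log(t) / n[i]))
        score = dialogue_with(representatives[arm])   # one dialogue with the target user
        n[arm] += 1
        mean[arm] += (score - mean[arm]) / n[arm]
    return max(range(len(representatives)), key=lambda i: mean[i])

# Illustrative usage with a fake scorer (in practice the score comes from a real dialogue):
best = ucb1_select(["blue", "red"], lambda sys: random.gauss(1.0 if sys == "blue" else 0.5, 0.3))
```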
0:09:53 | Okay, so now we transfer the knowledge of this source to the new system: |
0:09:59 | this is the knowledge transfer phase. |
0:10:01 | Let's say we have two batches of samples, the source batch and the target batch, |
0:10:07 | and we want to remove every sample from the source batch |
0:10:11 | that is already represented in the target batch. |
0:10:14 | For that we use a filtering algorithm. |
0:10:20 | It will consider each sample of the source batch. |
0:10:24 | Let's say we start with this one: |
0:10:26 | it will look at the target samples with the same action, |
0:10:30 | so these two, |
0:10:32 | and since this source state is very different from the states of these two samples, |
0:10:38 | we can add the source sample to the final batch. |
0:10:43 | Now we consider the next sample, |
0:10:46 | and we can see that the light red state is very close to the red state, |
0:10:52 | so we do not add this sample to the batch. |
0:10:55 | We continue this for each sample of the source batch, |
0:11:01 | and in the end we have an augmented target batch, |
0:11:05 | and we will use it for learning a new policy. |
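A minimal sketch of this filtering step; the Euclidean distance between states and the threshold value are assumptions, not necessarily the paper's exact criterion.

```python
# Sketch of the sample-filtering step described above (distance and threshold are assumptions).
import numpy as np

def filter_source_batch(source_batch, target_batch, threshold=0.5):
    """Each sample is an (s, a, r, s_next) tuple. Keep a source sample only if no target
    sample with the same action has a state closer than the threshold."""
    kept = []
    for s, a, r, s_next in source_batch:
        same_action = [t for t in target_batch if t[1] == a]
        close = any(np.linalg.norm(np.array(s) - np.array(t[0])) < threshold for t in same_action)
        if not close:                              # the source sample brings new information
            kept.append((s, a, r, s_next))
    return kept + list(target_batch)               # augmented batch used to learn the new policy
```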
0:11:11 | The learning itself is done thanks to FQI. |
0:11:17 | Fitted-Q Iteration is a reinforcement learning algorithm which takes as input a bunch of samples, |
0:11:25 | and it computes the optimal policy for these samples. |
0:11:31 | Fitted-Q Iteration is an algorithm coming from fitted value iteration, and this family of algorithms comes from value iteration, |
0:11:42 | and value iteration is a very famous algorithm to solve Markov decision processes. |
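For reference, here is a compact sketch of standard Fitted-Q Iteration over such a batch, with an extra-trees regressor as function approximator. The regressor choice, the discount factor, the number of iterations, and the omission of terminal-state handling are simplifying assumptions; the paper's exact variant may differ.

```python
# Minimal Fitted-Q Iteration sketch (standard algorithm; not the paper's exact implementation).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fqi(batch, actions, n_iterations=20, gamma=0.95):
    """batch: list of (s, a, r, s_next) tuples with s a feature vector and a an action label.
    Returns a greedy policy. (Terminal-state handling omitted for brevity.)"""
    X = np.array([list(s) + [actions.index(a)] for s, a, r, s_next in batch])
    rewards = np.array([r for s, a, r, s_next in batch])
    next_states = np.array([list(s_next) for s, a, r, s_next in batch])
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards                      # first iteration: Q is the immediate reward
        else:
            next_q = np.column_stack([
                q.predict(np.column_stack([next_states, np.full(len(batch), a_idx)]))
                for a_idx in range(len(actions))])
            targets = rewards + gamma * next_q.max(axis=1)   # Bellman backup on the batch
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)

    def policy(state):
        scores = [q.predict(np.array([list(state) + [a_idx]]))[0] for a_idx in range(len(actions))]
        return actions[int(np.argmax(scores))]     # greedy action w.r.t. the learned Q
    return policy
```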
0:11:51 | So if we combine the filtering and the learning, |
0:11:54 | one can see that we learn a system |
0:11:59 | which is a mix between the selected source and the real user. |
0:12:04 | We are going to use this new system |
0:12:09 | to dialogue, now, with the target user. |
0:12:13 | So we add the new dialogues to the target batch, |
0:12:16 | and you can see that some samples of the source batch are very similar |
0:12:20 | to the samples of this new target batch, |
0:12:23 | so in the end, only the samples from the target batch remain. |
0:12:30 | So when we run the learning again on the new batch, |
0:12:34 | we learn a system which is very specialised to this target user. |
0:12:41 | So this is the overall adaptation process for users. |
0:12:48 | Now we want to test our framework in some experiments, |
0:12:54 | so we are going to use the negotiation dialogue game. |
0:12:57 | We focused on negotiation because |
0:13:01 | we have two actors having different behaviours, |
0:13:07 | and we want to adapt to this. |
0:13:09 | In the negotiation dialogue game there are two players, |
0:13:12 | and they are given some time slots |
0:13:17 | and preferences for each time slot. |
0:13:20 | At each round, each agent will propose a slot. |
0:13:28 | For example, this agent proposes its best slot, |
0:13:32 | and the other robot refuses and proposes its own slot. |
0:13:37 | Since the negotiation game is an abstraction of a real dialogue, we introduce noise |
0:13:45 | in the communication channel, |
0:13:47 | in the form of switching some time slots: for example, we replace the proposed time slot with the yellow one, |
0:13:56 | and the agent will receive this noisy information |
0:14:01 | in the form of an automatic speech recognition score. |
0:14:06 | Given this information, it can continue the dialogue, |
0:14:10 | it can ask the other agent to repeat the proposition, |
0:14:14 | or it can end the dialogue. |
0:14:16 | So for example, it asks to repeat, |
0:14:21 | and the other robot repeats, |
0:14:24 | and at some point the agent can accept the proposition, |
0:14:29 | or it can also refuse and end the dialogue. |
0:14:34 | At the end of the dialogue, the users are rewarded: |
0:14:39 | we have a score, |
0:14:41 | and this score is a function |
0:14:46 | of the value of the time slot that was agreed upon. |
0:14:50 | I forgot to say that the point of the game |
0:14:53 | is to find an agreement between the two players. |
0:14:58 | So here, the agent will agree with the last proposition, |
0:15:04 | since its cost is smaller. |
0:15:07 | So now we want to test on this game. |
0:15:10 | We need users interacting with the system, so |
0:15:15 | we designed simulated users |
0:15:17 | with very different profiles. |
0:15:21 | For example, we have the deterministic user, |
0:15:26 | which will propose its slots in decreasing order of preference; |
0:15:30 | we also have this one, taking random actions; |
0:15:37 | this one always proposes its best slot; |
0:15:42 | this one accepts as soon as possible; and finally, |
0:15:46 | this one ends the dialogue as soon as possible. So these are very different |
0:15:52 | behaviours, and we want to adapt to these behaviours. |
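As an illustration of how such handcrafted profiles can be encoded, here is a hedged sketch; the class names, the `act` interface, and the action labels are assumptions and not the simulator actually used in the experiments.

```python
# Illustrative sketch of simulated user profiles (names and API are assumptions, not the paper's code).
import random

class DeterministicUser:
    """Proposes its own slots in decreasing order of preference."""
    def __init__(self, preferred_slots):
        self.queue = list(preferred_slots)        # already sorted, best first
    def act(self, system_proposal):
        return ("propose", self.queue.pop(0)) if self.queue else ("accept", system_proposal)

class RandomUser:
    """Takes random actions among the available ones."""
    def __init__(self, slots):
        self.slots = slots
    def act(self, system_proposal):
        choice = random.choice(["propose", "repeat", "accept", "end"])
        return (choice, random.choice(self.slots)) if choice == "propose" else (choice, system_proposal)

class StubbornUser:
    """Always proposes its best slot."""
    def __init__(self, best_slot):
        self.best_slot = best_slot
    def act(self, system_proposal):
        return ("propose", self.best_slot)

class CompliantUser:
    """Accepts the system's proposal as soon as possible."""
    def act(self, system_proposal):
        return ("accept", system_proposal)

class QuitterUser:
    """Ends the dialogue as soon as possible."""
    def act(self, system_proposal):
        return ("end", None)
```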
0:15:55 | We also designed human model users. |
0:15:59 | Each human model is a model of a human, |
0:16:01 | built from the dialogues recorded with that human, for four humans, |
0:16:13 | and we modelled these behaviours |
0:16:17 | with a k-nearest-neighbour algorithm. |
0:16:23 | You can see in the table |
0:16:25 | the distribution of actions for each of the real humans. |
0:16:31 | You can notice that two of them are very similar, |
0:16:36 | and the two others behave quite differently. |
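Here is a minimal sketch of a k-nearest-neighbour user model of the kind described above; the feature encoding of the dialogue context, the value of k, and the function names are assumptions.

```python
# Hedged sketch of a k-nearest-neighbour user model fitted on one human's recorded dialogues.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_human_model(recorded_turns, k=5):
    """recorded_turns: list of (dialogue_context_features, action_taken) pairs from one human."""
    X = np.array([features for features, _ in recorded_turns])
    y = np.array([action for _, action in recorded_turns])
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    def simulated_human(context_features):
        return model.predict(np.array([context_features]))[0]   # act like the nearest recorded turns
    return simulated_human
```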
0:16:41 | Now we want to design the system |
0:16:43 | which will dialogue directly with these users. |
0:16:48 | It has the same set of actions as the users, to simplify the design, |
0:16:54 | although its action set is slightly restricted. |
0:16:58 | We learn this system, as we saw previously, with FQI, |
0:17:04 | and moreover it is epsilon-greedy, so that it does some exploration. |
0:17:10 | The internal state of the dialogue system, the dialogue manager, |
0:17:17 | is actually a combination of the automatic speech recognition score |
0:17:23 | and also the number of turns elapsed during the dialogue. |
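To make these last two points concrete, here is a tiny sketch of the dialogue manager's internal state features and of epsilon-greedy exploration on top of a learned policy; the exact features and the value of epsilon are assumptions.

```python
# Sketch of the state features and epsilon-greedy exploration described above (values are assumptions).
import random

def dm_state(asr_score: float, n_turns: int):
    """Internal state: the ASR confidence of the last user proposal and the number of turns so far."""
    return (asr_score, float(n_turns))

def epsilon_greedy(policy, actions, epsilon=0.1):
    """Follow the learned policy most of the time, but explore a random action with probability epsilon."""
    def act(state):
        return random.choice(actions) if random.random() < epsilon else policy(state)
    return act
```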
0:17:29 | Before testing the main framework, we want to show that learning one system per user is a good thing. |
0:17:40 | Here we have a bunch of systems, vs-u1, vs-u2, vs-u3 |
0:17:45 | and so on, and each of these systems learned its strategy |
0:17:49 | with one of the users; so for example vs-u1 learned |
0:17:53 | its strategy against the user u1. |
0:17:56 | You can notice that the bold values |
0:18:00 | actually indicate that |
0:18:02 | the best system to dialogue with a given user is |
0:18:07 | the system which learned its strategy with this user. |
0:18:10 | So there is a real need for adaptation. |
0:18:16 | We can see the same thing with the human model users, |
0:18:19 | but the differences are smaller. |
0:18:23 | Actually, for some users, for example the user Alex, |
0:18:33 | the two values, 1.74 and 1.73, |
0:18:38 | are very close, and you can say the same thing for the line below. |
0:18:45 | So now we can test the main framework for adaptation. |
0:18:50 | For that we introduce two new methods for comparison. |
0:18:55 | One is called scratch: it just learns |
0:19:01 | the system from scratch, without transferring any knowledge. |
0:19:05 | The other one is called generic: this |
0:19:09 | generic method learns one policy with all the knowledge of the database. |
0:19:14 | We generate two source-system databases, one for the simulated users and one for |
0:19:20 | the human model users. |
0:19:22 | Each source system is learned |
0:19:25 | with one thousand two hundred dialogues, |
0:19:29 | and each method is then evaluated |
0:19:33 | with two hundred dialogues. |
0:19:37 | For simulated users, |
0:19:40 | our adaptation methods show significantly better results than generic |
0:19:45 | and scratch for the two metrics, |
0:19:48 | the score and the task completion. |
0:19:51 | But on the other hand, for the human model results, |
0:19:54 | our methods are better, |
0:19:56 | but not by much, and |
0:19:59 | the reason for that is that the negotiation dialogue game is too simple for humans: |
0:20:04 | actually, most of the humans have the same behaviour on this game, |
0:20:10 | so there is no point in learning an adapted strategy, |
0:20:15 | since all the people have the same behaviour. |
0:20:21 | So, to conclude: we provide a framework for user adaptation, |
0:20:26 | and we introduce the policy-driven distance, which is a way to |
0:20:32 | compute the behavioural differences between policies. |
0:20:35 | We validate the framework on both |
0:20:40 | the simulated user and the human model user setups, |
0:20:43 | and finally we show that the overall |
0:20:47 | dialogue quality is enhanced, |
0:20:50 | based on two metrics, the task completion and the score. |
0:20:55 | So, thank you. |
0:21:23 | I wasn't sure what you scored for your cross comparison. |
0:21:28 | Could we look at this slide, |
0:21:33 | the table: what are these numbers, and what is good? |
0:21:39 | Well, |
0:21:42 | each cell represents the score |
0:21:44 | of a system, given the user it dialogues with. |
0:21:48 | So these are the systems, |
0:21:50 | and they dialogue with each user. |
0:21:54 | So for example, this system has a score of 0.44 |
0:22:01 | with this user. |
0:22:03 | What is that score? |
0:22:05 | The score is |
0:22:07 | the mean reward of the dialogues: |
0:22:10 | at the end of the dialogue there is a reward, and |
0:22:13 | we average it |
0:22:15 | over the dialogues. |
0:22:22 | And is the maximum value the maximum score? |
0:22:24 | Yes, actually it's... |
0:22:28 | sorry, the higher the better. |
0:22:48 | Okay. |
0:22:49 | The question is: could you give |
0:22:51 | more details about the reinforcement learning...? |
0:23:13 | Sorry, |
0:23:15 | could you ask the question once again? |