0:00:19 | i would like to start the third and final invited talk. |
---|
0:00:31 | milica received her degree in computer science and mathematics from the University of Belgrade, |
---|
0:00:36 | and since then she has been at Cambridge University, where she received her MPhil and |
---|
0:00:41 | her PhD in statistical dialogue systems, was a research associate, and most recently has become a lecturer |
---|
0:00:48 | in open-domain dialogue systems in the department of engineering. |
---|
0:00:52 | she is also a fellow of one of the colleges |
---|
0:00:55 | at Cambridge University. |
---|
0:00:59 | she is extremely well known, i'm sure, to everyone in this community, because she is very well published, |
---|
0:01:03 | including a number |
---|
0:01:04 | of award-winning papers, and she is coauthor of one of the |
---|
0:01:11 | nominated papers at this SIGDIAL. |
---|
0:01:15 | after her talk, if you still want to, you can dig even deeper into her and |
---|
0:01:20 | her colleagues' research: they have two posters at the poster session this afternoon. |
---|
0:01:26 | please welcome Milica Gašić. |
---|
0:01:36 | thank you, and hello everybody. |
---|
0:01:40 | thank you for the lovely introduction. |
---|
0:01:46 | as |
---|
0:01:48 | you know, SIGDIAL really is one big family, and if a family member asks you |
---|
0:01:53 | to do something, you kind of cannot say no. |
---|
0:01:55 | so — |
---|
0:01:57 | thank you very much. |
---|
0:01:58 | i will be talking about |
---|
0:02:02 | what is still needed — |
---|
0:02:04 | what we are missing — on the way to building natural conversation, |
---|
0:02:08 | how deep learning can help us along the way, and some of the efforts that |
---|
0:02:13 | we've made in the dialogue systems group in Cambridge to achieve that. |
---|
0:02:20 | i'm sure that we all agree that |
---|
0:02:23 | spoken conversation, and in particular dialogue, is one of the most natural ways of exchanging |
---|
0:02:30 | information between humans. |
---|
0:02:32 | we can read a book and then be able to talk about what we just read. |
---|
0:02:39 | machines, on the other hand, are very good at storing huge amounts of information, but not |
---|
0:02:45 | so good at sharing this information with us in a natural, human-like way. |
---|
0:02:50 | i'm sure you know that lots of companies have virtual personal assistants — |
---|
0:02:56 | practically everyone — and they handle billions of calls, |
---|
0:03:01 | but the current models are very unnatural, |
---|
0:03:05 | narrow in domain, and frustrating to users. |
---|
0:03:09 | so the research question that i want to address is: |
---|
0:03:12 | how do we build a continuously learning |
---|
0:03:16 | dialogue system capable of natural conversation? |
---|
0:03:23 | machine learning is very attractive for solving this task. |
---|
0:03:28 | if i had to summarize machine learning in |
---|
0:03:33 | just three words, these would be: data, |
---|
0:03:35 | model, and prediction. |
---|
0:03:38 | so what are these in our case? |
---|
0:03:41 | data, in our case, is simply dialogues, |
---|
0:03:43 | or some parts of dialogues, like |
---|
0:03:46 | transcribed speech, annotated user intents, or provided user feedback. |
---|
0:03:55 | the model is the underlying statistical model that lets us explain the data we have, |
---|
0:04:01 | or the phenomenon we would like to model. |
---|
0:04:05 | once we train the model, |
---|
0:04:07 | we can make predictions: |
---|
0:04:09 | what the user said, |
---|
0:04:11 | or what to say back |
---|
0:04:13 | to the user. |
---|
0:04:17 | traditionally, |
---|
0:04:18 | building statistical dialogue systems assumes |
---|
0:04:25 | the following structure. |
---|
0:04:28 | a typical spoken dialogue system consists of a speech understanding unit, |
---|
0:04:32 | a dialogue management unit, |
---|
0:04:34 | and a speech generation unit. |
---|
0:04:37 | when the user speaks, their speech is recognized by a speech recognizer, |
---|
0:04:43 | and then a semantic decoder and a state tracker produce |
---|
0:04:47 | the dialogue state — an estimate of |
---|
0:04:50 | where we currently are in the dialogue. |
---|
0:04:53 | based on this, a policy makes a decision what to say back to the user, |
---|
0:04:59 | and very often there is also some kind of evaluator, which evaluates |
---|
0:05:06 | how good this decision was. |
---|
0:05:09 | the action then goes to a natural language generator, |
---|
0:05:12 | which produces the textual output that is then presented to the user via a text-to-speech synthesizer. |
---|
0:05:20 | and behind all of these modules |
---|
0:05:24 | is the ontology — the structured representation of the database that the dialogue system can talk about. |
---|
0:05:31 | this is |
---|
0:05:32 | the structured knowledge we use |
---|
0:05:35 | in goal-oriented dialogue systems. |
---|
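To make the modular structure concrete, here is a minimal sketch of one dialogue turn through the pipeline just described. All component names and method signatures are hypothetical placeholders, not any particular toolkit's API.

```python
# one turn through the modular pipeline; every interface here is illustrative
def dialogue_turn(audio, asr, decoder, tracker, policy, nlg, tts, state):
    hyps = asr.recognize(audio)                # speech recognition hypotheses
    user_acts = decoder.decode(hyps)           # semantic decoding into dialogue acts
    state = tracker.update(state, user_acts)   # dialogue-state (or belief) estimate
    sys_act = policy.select_action(state)      # decide what to say back to the user
    text = nlg.generate(sys_act)               # natural language generation
    return tts.synthesize(text), state         # spoken response plus updated state
```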
0:05:38 | what i am not going to talk about in this lecture |
---|
0:05:42 | is automatic speech recognition — speech recognizers have improved immensely. |
---|
0:05:46 | some researchers go as far as saying that they have reached the performance of |
---|
0:05:52 | human transcribers |
---|
0:05:54 | in speech recognition. |
---|
0:05:56 | i wouldn't say that, but i would just like to point out that there is much |
---|
0:06:01 | more to be done |
---|
0:06:03 | in the rest of the structure |
---|
0:06:05 | of a spoken dialogue system. |
---|
0:06:08 | now, with this modular structure, |
---|
0:06:12 | there is inevitably a loss of information between the modules, |
---|
0:06:17 | and then the question is what we can do to alleviate this loss of information. |
---|
0:06:23 | what we can do is |
---|
0:06:26 | actually pass |
---|
0:06:27 | probability distributions between these modules |
---|
0:06:31 | to help alleviate the problem of loss of information. |
---|
0:06:35 | and when you do that, |
---|
0:06:38 | instead of a MAP decoder you keep multiple hypotheses, and the state tracker becomes the belief tracker module: |
---|
0:06:46 | it produces a distribution over possible |
---|
0:06:49 | dialogue states, as i will describe. |
---|
0:06:54 | but before i go further into explaining how these systems work, let's just take a |
---|
0:07:01 | step back and see how these systems are currently perceived by the users out there. |
---|
0:07:12 | [video] i have a personal assistant on my phone, and i use it for pretty much everything. i ask it |
---|
0:07:17 | at what time Tim's wedding is tomorrow, and it says: "you would like to know when |
---|
0:07:22 | your waiting is?" — there it is, a PA showing the importance of getting the words right — and it |
---|
0:07:27 | says... |
---|
0:07:28 | "not the waiting, PA — what time is Tim's wedding tomorrow?" "your waiting |
---|
0:07:33 | is at five." "no, not my waiting — Tim's wedding. when is the wedding, |
---|
0:07:37 | PA?" "okay, never mind — i have set an alarm for five |
---|
0:07:42 | a.m." |
---|
0:07:43 | it takes care of pretty much everything that i could probably do myself, |
---|
0:07:47 | but it's kind of comforting having someone performing it. it did not, however, prevent |
---|
0:07:51 | me from spilling coffee all over my trousers just before the ceremony. "PA, i've got a problem." |
---|
0:07:58 | "your waiting begins in one hour and twenty-five minutes." "it's Tim's wedding, and i've got coffee all |
---|
0:08:04 | over my trousers." "there is a 1993 episode—" "stop. okay. are |
---|
0:08:08 | there any department stores nearby?" "your favourite department store is open." "no — |
---|
0:08:13 | i don't think they do trousers. PA, find me a shop." "i am not |
---|
0:08:18 | sure i understand." |
---|
0:08:19 | "okay — |
---|
0:08:21 | do any stores sell trousers?" "that's an interesting question." "okay... call me a taxi to |
---|
0:08:26 | the centre." "i'm sorry, i can't help with taxis." "okay — |
---|
0:08:30 | what do you mean you can't call a taxi?" "i'm getting—" "no." "i'm sorry, i |
---|
0:08:33 | cannot help you with that." "so how am i supposed to get to the |
---|
0:08:36 | wedding?" "your waiting begins in one hour and twenty—" "for the last time: Tim's wedding!" "your |
---|
0:08:41 | waiting begins in one hour and twenty-four minutes." |
---|
0:08:44 | "well, thanks, PA — you're really helping me out." "it is i who should be thanking |
---|
0:08:49 | you." |
---|
0:08:52 | so, as much as this is a joke, it resembles what actual users of virtual personal assistants are really |
---|
0:08:57 | going through. |
---|
0:08:59 | okay, so |
---|
0:09:01 | let's look at the problems that we need to address. |
---|
0:09:05 | the most obvious one from this video is emotion awareness: this |
---|
0:09:11 | personal assistant was completely unaware of the user's emotion and their state. but there are |
---|
0:09:18 | some things we need to address before that. |
---|
0:09:21 | the first of these problems is closed domains: |
---|
0:09:25 | current systems still do not scale, |
---|
0:09:27 | and it often takes a very long time for a dialogue system to master a new context. |
---|
0:09:37 | the second problem is that at each turn the choice |
---|
0:09:41 | of system responses is not very good, |
---|
0:09:45 | and the reason for that is that the learner chooses between a very small set |
---|
0:09:51 | of actions. |
---|
0:09:52 | if we want to build natural conversation, we need to allow our systems to |
---|
0:09:58 | choose between a wide variety of actions. |
---|
0:10:04 | and finally, systems do not adapt to different user needs. |
---|
0:10:08 | this can be interpreted in many different ways, but it is clear that we |
---|
0:10:13 | need to model the user better |
---|
0:10:15 | if we want to achieve a better dialogue system. |
---|
0:10:21 | so let me first spend a bit of time explaining why we need to track beliefs — |
---|
0:10:29 | why a single dialogue state is not fine. |
---|
0:10:32 | this is a dialogue with a dialogue system; |
---|
0:10:35 | it can talk about restaurants. |
---|
0:10:38 | the user says: "i am looking for a Thai restaurant." |
---|
0:10:42 | "Thai" and "high" are very acoustically similar, so there is very likely to be a misrecognition, |
---|
0:10:50 | and we may get "high restaurant" ranked above "Thai restaurant". |
---|
0:10:54 | now we extract the dialogue state. we do that based on our ontology — the ontology for our |
---|
0:11:01 | domain, which says whether we are talking about a restaurant or something else — and we get a slot-value pair. |
---|
0:11:07 | here |
---|
0:11:08 | you can see that the system is sure about the choice of domain, but not so sorry |
---|
0:11:12 | — not so sure — about this slot-value pair, so the system asks a request question: "what kind |
---|
0:11:20 | of food would you like?" |
---|
0:11:23 | the user answers "Thai", |
---|
0:11:24 | which again gets misrecognized, |
---|
0:11:27 | and "high" is ranked higher. |
---|
0:11:30 | if we don't do any belief tracking at this point — and this is mainly what happened |
---|
0:11:36 | before — |
---|
0:11:38 | then |
---|
0:11:40 | the probability of "Thai" stays very small, |
---|
0:11:44 | and the system has no option but to ask the same question again: |
---|
0:11:48 | "what kind of food would you like?" |
---|
0:11:51 | and this is what is particularly annoying to users — asking the same question again. |
---|
0:11:58 | now, what happens if we do belief tracking? |
---|
0:12:01 | we don't look at this turn in isolation: |
---|
0:12:05 | we remember that the input was annotated with "Thai" in the previous turn, so although the |
---|
0:12:11 | probability of "Thai" based on this turn is very low, the overall probability of "Thai" is actually higher — |
---|
0:12:19 | almost the same as the other option, which is "high". |
---|
0:12:23 | so the system now has the option of confirming: "did you say Thai or high?", |
---|
0:12:29 | which is a much better action. |
---|
0:12:34 | it is impossible to build completely uncertainty-free systems, so the question is |
---|
0:12:38 | how we manage that uncertainty. |
---|
0:12:43 | now, you might think this is actually a very simple problem: |
---|
0:12:47 | all you're doing is matching the concepts that you have in the ontology against |
---|
0:12:54 | the input — against what the user said, what the user asked for. |
---|
0:13:02 | the problem is not simple, because, as we all know, |
---|
0:13:05 | there are so many ways you can refer to a particular concept in natural language. |
---|
0:13:13 | and then what you have to do is build a belief tracker for each of |
---|
0:13:18 | these concepts separately, |
---|
0:13:21 | and that is something which doesn't scale |
---|
0:13:23 | if you want to build natural conversation. |
---|
0:13:28 | so let me talk about scaling belief tracking. |
---|
0:13:34 | the solution to this problem is to reuse the knowledge you have for one |
---|
0:13:40 | concept |
---|
0:13:41 | for other, related concepts, |
---|
0:13:44 | because we cannot hope to have labeled data for every kind of concept we want |
---|
0:13:50 | our dialogue system to talk about. |
---|
0:13:52 | real humans adapt very readily to new situations, and they reuse what they know |
---|
0:13:59 | to do that. |
---|
0:14:02 | so the essential ingredients for large-scale belief tracking are semantically constrained word vectors |
---|
0:14:10 | and adequately shared parameters. |
---|
0:14:15 | let me first explain what we mean by semantically constrained word vectors. |
---|
0:14:21 | in a goal-oriented system you have some closed sets |
---|
0:14:25 | of values: for the domain, like restaurants; for the slots, |
---|
0:14:31 | like price range; and for the values, like cheap. |
---|
0:14:36 | and the intuition is that words which are used |
---|
0:14:40 | in a similar way |
---|
0:14:42 | in text |
---|
0:14:44 | are semantically similar. |
---|
0:14:47 | that is true to some extent, but you also need to take into account what kind of |
---|
0:14:54 | application you have. |
---|
0:14:56 | so, for instance, in general text, "queen" and "king" — both heads |
---|
0:15:01 | of state — are semantically similar. but if |
---|
0:15:07 | you have a dialogue system, it matters whether the user said they are looking for something in |
---|
0:15:12 | the north or looking for something in the south: |
---|
0:15:15 | "north" and "south" in this context are really antonyms, and we want to model this distinction. |
---|
0:15:21 | so what Nikola Mrkšić, a former PhD student from our group, did was use semantic |
---|
0:15:29 | constraints — antonyms and |
---|
0:15:33 | synonyms — to specialize the vector space. |
---|
0:15:38 | so while initially "cheap" and "expensive" can be very close in the vector space, |
---|
0:15:44 | after specialization "cheap" and "inexpensive" will be close, while |
---|
0:15:49 | "cheap" and "expensive" move far apart. |
---|
0:15:50 | and |
---|
0:15:51 | words like |
---|
0:15:54 | "north", |
---|
0:15:55 | "cheap" and "expensive" here are concepts from the ontology, and what we match against them |
---|
0:16:02 | is whatever words the user may use to refer to these concepts. |
---|
0:16:07 | so we use this to scale up belief tracking. |
---|
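To illustrate the specialization idea — pulling synonyms together and pushing antonyms apart — here is a toy sketch in the spirit of the counter-fitting procedure. The vectors, margins, and learning rate are made up; the published method uses a different objective with an extra term preserving the original vector-space topology.

```python
import numpy as np

def counter_fit(vectors, synonyms, antonyms, lr=0.1, ant_margin=1.0, epochs=10):
    """toy counter-fitting: pull synonym pairs together, push antonym pairs apart;
    vectors is a dict word -> np.ndarray"""
    for _ in range(epochs):
        for a, b in synonyms:
            diff = vectors[a] - vectors[b]
            vectors[a] -= lr * diff            # move synonyms towards each other
            vectors[b] += lr * diff
        for a, b in antonyms:
            diff = vectors[a] - vectors[b]
            if np.linalg.norm(diff) < ant_margin:
                vectors[a] += lr * diff        # move antonyms apart
                vectors[b] -= lr * diff
        for w in vectors:                      # keep vectors on the unit sphere
            vectors[w] = vectors[w] / np.linalg.norm(vectors[w])
    return vectors

vecs = {w: np.random.randn(5) for w in ("cheap", "inexpensive", "expensive")}
vecs = counter_fit(vecs, synonyms=[("cheap", "inexpensive")],
                   antonyms=[("cheap", "expensive")])
```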
0:16:13 | when you do belief tracking, you typically have to answer three questions. |
---|
0:16:18 | the first question is: |
---|
0:16:21 | is what the system is saying |
---|
0:16:24 | referring to what we have in the ontology? |
---|
0:16:28 | the second question is: is what the user is saying referring to |
---|
0:16:34 | what we have in the ontology? |
---|
0:16:37 | and the third question is: what is the answer, |
---|
0:16:41 | given the context of the whole conversation so far? |
---|
0:16:46 | so let's go through the first question. |
---|
0:16:49 | the system utterance — it could be "how may i help you?" or anything else — can |
---|
0:16:54 | be represented via word vector embeddings, |
---|
0:16:57 | which are fed to a feature extractor. |
---|
0:17:00 | the idea here is to make these feature extractors generic. |
---|
0:17:08 | in our case we used convolutional networks, but this could be any |
---|
0:17:14 | kind of feature extractor, like a bidirectional LSTM. |
---|
0:17:18 | and there is a generic one for the domain, a generic one |
---|
0:17:22 | for the slot, and a generic one for the value. |
---|
0:17:26 | then we take what we have in the ontology: |
---|
0:17:29 | we have an embedding for, say, the restaurant domain, the price-range slot, and the value cheap, |
---|
0:17:34 | and then |
---|
0:17:36 | we calculate the similarity between what our feature extractor for the domain produced |
---|
0:17:43 | and what we have in the ontology. |
---|
0:17:49 | then we apply the same process to the input that we got from the user. |
---|
0:17:56 | finally, this needs to be fed, along with the context, into an LSTM or an RNN or a |
---|
0:18:03 | GRU — anything which has a recurrence that can help you keep track of |
---|
0:18:10 | the context. |
---|
0:18:12 | and then what you get is a |
---|
0:18:14 | probability for the domain; you apply the same procedure to get a probability for the slot and value; |
---|
0:18:22 | and when you combine the two, you get a probability for the domain and a |
---|
0:18:27 | particular slot taking a particular value. you do this for everything |
---|
0:18:33 | you have in your ontology, |
---|
0:18:37 | and you get the belief state |
---|
0:18:40 | at the current turn. |
---|
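As a rough illustration of this similarity-based tracking, here is a toy sketch. The real model uses learned convolutional feature extractors and a trained recurrent network; the averaging, cosine scoring, and fixed interpolation below are simplifying assumptions.

```python
import numpy as np

def extract(tokens, vectors):
    """toy 'feature extractor': average of the (specialized) word vectors"""
    return np.mean([vectors[t] for t in tokens if t in vectors], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def turn_scores(user_tokens, candidates, vectors):
    """similarity between the utterance features and each ontology candidate
    (e.g. the values of the 'food' slot)"""
    feats = extract(user_tokens, vectors)
    return {c: cosine(feats, vectors[c]) for c in candidates}

def update_belief(prev_belief, scores, alpha=0.5):
    """toy recurrent update: interpolate the previous belief with the softmaxed
    evidence from this turn (the real tracker learns this with an RNN/GRU)"""
    values = sorted(scores)
    ev = np.exp([scores[v] for v in values])
    ev = ev / ev.sum()
    return {v: alpha * prev_belief[v] + (1 - alpha) * e
            for v, e in zip(values, ev)}
```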
0:18:45 | so what we |
---|
0:18:47 | did is evaluate this |
---|
0:18:51 | tracker. but how can you evaluate belief tracking? you need annotated dialogues — you need these labels. |
---|
0:18:58 | so in Cambridge we have done a number of works to create labeled datasets, and what we |
---|
0:19:07 | use is the Wizard-of-Oz setup: |
---|
0:19:09 | you have two Amazon Mechanical Turkers, |
---|
0:19:13 | one |
---|
0:19:14 | of them representing the system — so they have access to the database — and another |
---|
0:19:20 | one representing the user, who has access to the |
---|
0:19:26 | task that was provided, to complete the user goal. |
---|
0:19:30 | so the two talk to each other — or rather type, as this happens in text — |
---|
0:19:38 | and |
---|
0:19:40 | they also annotate the states on the part of the system and the user — i.e. what the |
---|
0:19:45 | user is saying — so we get the labels directly. |
---|
0:19:51 | the first dataset we collected this way is very small: it has |
---|
0:19:56 | one thousand two hundred dialogues in only one domain, with a small number of |
---|
0:20:01 | slots, each with a small number of values. |
---|
0:20:07 | recently we collected a much larger dataset, |
---|
0:20:12 | which has almost ten thousand dialogues across several domains. |
---|
0:20:17 | and the great thing here is that |
---|
0:20:20 | the change of domain happens not only on the dialogue level but also |
---|
0:20:25 | on the turn level. |
---|
0:20:27 | it has much longer dialogues, and many more slots and values. |
---|
0:20:33 | so how does this work? |
---|
0:20:35 | well, we compared this model to the neural belief tracker, |
---|
0:20:43 | which was again developed in our group, |
---|
0:20:46 | but which doesn't do this knowledge sharing between different domains and different slots. |
---|
0:20:55 | first, on the smaller dataset: |
---|
0:21:00 | there, our model outperformed |
---|
0:21:03 | the neural belief tracker on every slot. |
---|
0:21:09 | now, what happens on the larger-scale dataset? |
---|
0:21:14 | here the problem is a bit more complex, because you are also tracking domains, and the |
---|
0:21:20 | neural belief tracker was not able to track domains, so we compared to it |
---|
0:21:26 | on just a single domain, |
---|
0:21:28 | and here, too, ours outperforms it. |
---|
0:21:32 | also, looking at the numbers: they are generally lower, which shows that this |
---|
0:21:38 | dataset is much richer and more |
---|
0:21:45 | difficult |
---|
0:21:49 | to track. |
---|
0:21:52 | to complete the picture, we also produced another, simpler baseline, just |
---|
0:22:00 | to show you how difficult this task is: |
---|
0:22:02 | it gets only a fraction of the accuracy, whereas the model with knowledge sharing gets |
---|
0:22:11 | 93.2. |
---|
0:22:14 | this is work with my student Osman Ramadan, |
---|
0:22:18 | and |
---|
0:22:20 | if you're here next week for ACL, |
---|
0:22:23 | Osman will talk about |
---|
0:22:25 | this in more detail. |
---|
0:22:28 | i am now going to move |
---|
0:22:30 | to the second part: |
---|
0:22:33 | dialogue policy. |
---|
0:22:34 | what is the difference between belief tracking and policy optimisation? |
---|
0:22:42 | so, |
---|
0:22:43 | say we are here, |
---|
0:22:45 | at this point in a dialogue. |
---|
0:22:49 | belief tracking accumulates everything that happened so far in the dialogue, which is important for choosing |
---|
0:22:55 | the next action: |
---|
0:22:56 | belief tracking summarizes the past. |
---|
0:23:00 | but what does the policy do? |
---|
0:23:02 | well, the policy asks, at this point: |
---|
0:23:06 | if i take this action — in our case, this dialogue act — |
---|
0:23:12 | will it be the best one? will the user be satisfied at the end of |
---|
0:23:17 | this dialogue? |
---|
0:23:19 | so the policy has to look into the future; |
---|
0:23:24 | it is the one that plans. |
---|
0:23:27 | and what is the machine learning framework which allows us to perform planning? |
---|
0:23:33 | well, that is reinforcement learning. |
---|
0:23:37 | in reinforcement learning we have our dialogue system, and it is interacting with our user. |
---|
0:23:43 | the system is taking actions, |
---|
0:23:46 | and the user is responding with observations. |
---|
0:23:50 | based on these observations we create the belief state. |
---|
0:23:55 | the user is occasionally giving us a reward. |
---|
0:23:59 | now, here, when i say user, it may be a real user, or it may be a simulated user; it doesn't |
---|
0:24:06 | really need to be a user that actually exists. |
---|
0:24:11 | now, the policy is a mapping |
---|
0:24:14 | from these states to actions, |
---|
0:24:19 | and |
---|
0:24:20 | we want to find the policy which gives the largest overall user satisfaction. |
---|
0:24:28 | so let me |
---|
0:24:30 | just |
---|
0:24:32 | remind you of some of the concepts in reinforcement learning that we are |
---|
0:24:37 | going to need. |
---|
0:24:39 | probably the most important one is the concept of the return. |
---|
0:24:45 | say we are at this point in time in the dialogue. |
---|
0:24:49 | at this point, the return is the random variable which says what the overall |
---|
0:24:55 | reward from this point onwards is. |
---|
0:25:00 | now, because it's a random variable, |
---|
0:25:03 | we can only estimate its expectation. |
---|
0:25:08 | the expectation of the return starting from a particular belief state is the value function, |
---|
0:25:17 | and if we take the expectation starting from a particular belief state and taking a particular |
---|
0:25:22 | action, that is the Q function. |
---|
0:25:25 | estimating the value function, the Q function, or the policy is equivalent: if we find the optimal |
---|
0:25:32 | Q function, we will also be able to find |
---|
0:25:36 | the optimal policy. |
---|
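In standard notation (with belief state b, action a, per-turn reward r and discount factor γ), the concepts just listed are usually written as follows:

```latex
R_t = \sum_{k \ge 0} \gamma^{k} r_{t+k}
\qquad
V^{\pi}(b) = \mathbb{E}_{\pi}\left[ R_t \mid b_t = b \right]
\qquad
Q^{\pi}(b,a) = \mathbb{E}_{\pi}\left[ R_t \mid b_t = b,\, a_t = a \right]
\qquad
\pi^{*}(b) = \arg\max_{a} Q^{*}(b,a)
```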
0:25:39 | in deep reinforcement learning, the value function, the Q function, or the policy is approximated with a neural |
---|
0:25:45 | network. |
---|
0:25:47 | this is good because neural networks give us non-linear approximation, |
---|
0:25:53 | which is preferred for reinforcement learning, because most of these functions are non-linear, |
---|
0:25:59 | but the downside is that the optimization can end up in a local optimum. |
---|
0:26:05 | now, probably the most famous deep reinforcement learning algorithm is the deep Q-network, DQN. |
---|
0:26:12 | what does a deep Q-network do? |
---|
0:26:14 | it |
---|
0:26:15 | approximates the Q function with a neural network parameterized with parameters ω, |
---|
0:26:21 | and we have a gradient of the loss, |
---|
0:26:25 | which depends on the difference between what our parameterized function is estimating and |
---|
0:26:33 | the target: the reward plus the discounted Q value at the next step. |
---|
0:26:36 | now, the problem with this is that it gives biased estimates, |
---|
0:26:48 | the data points are correlated, and the targets are non-stationary, |
---|
0:26:52 | which is the reason why DQN is a very unstable algorithm: it can often |
---|
0:26:59 | happen that one run gives you good results, |
---|
0:27:02 | but then sometimes it doesn't work at all. |
---|
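For reference, the DQN loss being described is the squared difference between the parameterized estimate and the bootstrapped target, where ω⁻ denotes the (periodically frozen) target-network parameters:

```latex
\mathcal{L}(\omega) = \mathbb{E}\left[ \Big( \underbrace{r + \gamma \max_{a'} Q(b', a'; \omega^{-})}_{\text{target}} - \, Q(b, a; \omega) \Big)^{2} \right]
```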
0:27:06 | the second thing you can do is optimize the policy directly using a neural network. |
---|
0:27:13 | you assume a parametrization of the policy with parameters θ, |
---|
0:27:17 | and what is great here is that the gradient of the objective we want |
---|
0:27:23 | to maximize — the value of the initial state — is given by |
---|
0:27:28 | a simple formula. |
---|
0:27:29 | this is what |
---|
0:27:31 | the policy gradient theorem says. |
---|
0:27:34 | i don't have time here to prove it, |
---|
0:27:38 | but let me just say it is used directly in the REINFORCE algorithm, which is also not complicated. |
---|
0:27:46 | the estimate it gives, unlike the one from |
---|
0:27:51 | DQN, |
---|
0:27:52 | is unbiased, but it has a very high variance, which again is not something that |
---|
0:27:57 | we are striving for. |
---|
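The formula referred to here is the policy gradient theorem; REINFORCE uses it directly, with the sampled return R_t standing in for the expectation:

```latex
\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid b)\; R_t \right]
```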
0:28:00 | so |
---|
0:28:01 | the solution is to combine the two in the actor-critic framework. i'm going to give |
---|
0:28:06 | you a diagram of what an actor-critic framework looks like. |
---|
0:28:12 | so this is our user, and this is our policy optimizer. |
---|
0:28:19 | the model has two parts: one part is the actor — this is actually the one that is |
---|
0:28:23 | taking actions — |
---|
0:28:25 | and the other is the critic, which criticises this actor. |
---|
0:28:29 | so the actor makes an action, the user responds, we get a reward and a new belief state, and then the critic |
---|
0:28:37 | tells the actor how good |
---|
0:28:40 | the action was. |
---|
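A minimal sketch of such an actor and critic over the belief state (in PyTorch, with a one-step advantage as the critic's "criticism"); layer sizes and the exact update are illustrative, not the configuration used in the talk:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """a shared trunk over the belief state, a policy head (the actor)
    and a value head (the critic)"""
    def __init__(self, belief_dim, n_actions, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(belief_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # logits over dialogue acts
        self.critic = nn.Linear(hidden, 1)          # state value V(b)

    def forward(self, belief):
        h = self.shared(belief)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

def a2c_loss(model, belief, action, reward, next_belief, gamma=0.99):
    """one-step advantage actor-critic loss"""
    dist, value = model(belief)
    with torch.no_grad():
        _, next_value = model(next_belief)
        target = reward + gamma * next_value         # one-step TD target
    advantage = target - value                       # the critic's criticism
    actor_loss = -dist.log_prob(action) * advantage.detach()
    critic_loss = advantage.pow(2)
    return (actor_loss + 0.5 * critic_loss).mean()
```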
0:28:45 | now, when we apply these methods to dialogue systems — |
---|
0:28:54 | to the policy optimisation — we often find that it takes too many iterations to |
---|
0:29:01 | train, |
---|
0:29:02 | so we resort to using a summary space. |
---|
0:29:07 | what does this mean? |
---|
0:29:09 | we compress our belief state, |
---|
0:29:12 | and the policy |
---|
0:29:14 | chooses only between |
---|
0:29:16 | a handful of actions, |
---|
0:29:18 | and then we have some heuristics which tell us what this summary action |
---|
0:29:25 | actually is — the heuristics say that this action belongs to |
---|
0:29:30 | a much larger master action space that has typically two orders of magnitude more |
---|
0:29:36 | actions than the summary space. |
---|
0:29:40 | but this is obviously not good enough: |
---|
0:29:44 | if we really want to build natural conversation, we don't want to have any kind |
---|
0:29:49 | of |
---|
0:29:50 | heuristics; we want to explicitly learn, and choose between |
---|
0:29:55 | much richer actions. |
---|
0:29:59 | so the problem is that too many interactions are needed, |
---|
0:30:03 | and the solution in this case is to use experience replay. |
---|
0:30:09 | i will explain |
---|
0:30:13 | how this |
---|
0:30:17 | allows us to learn in a much larger action space. |
---|
0:30:22 | the algorithm, which is called ACER — actor-critic with experience replay — uses |
---|
0:30:28 | experience replay, |
---|
0:30:29 | estimates the Q function off-policy, |
---|
0:30:32 | uses retrace to compute the targets, and also uses trust region policy optimisation. |
---|
0:30:39 | let me just briefly go through these points. |
---|
0:30:43 | first, more on experience replay. |
---|
0:30:48 | as you interact with your dialogue system, you are generating something that is called experience. |
---|
0:30:54 | now, in order to maximize the value of your data, |
---|
0:30:58 | you can go through that data many times and replay the experience. |
---|
0:31:04 | the thing is that, at a later point, the system has learned something, and its |
---|
0:31:10 | current actions are not the recorded ones anymore, so we should not expect exactly the same rewards. |
---|
0:31:19 | therefore we apply importance sampling ratios to this experience: |
---|
0:31:26 | we reweight |
---|
0:31:27 | the data that was generated with the previous policy by how probable it is under the one we have right |
---|
0:31:33 | now, |
---|
0:31:34 | and that is how we weight |
---|
0:31:36 | our gradient. |
---|
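In code, the key point is simply to store, with each transition, the probability the behaviour policy assigned to the chosen action, so that old experience can be reweighted later. A small sketch (names illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """stores (belief, action, reward, next_belief, behaviour_prob),
    where behaviour_prob is pi_old(a|b) at collection time"""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, min(batch_size, len(self.buf)))

def importance_ratio(current_prob, behaviour_prob):
    """rho = pi(a|b) / mu(a|b): reweights old experience under the current policy"""
    return current_prob / max(behaviour_prob, 1e-8)
```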
0:31:40 | now, there is an issue with this. |
---|
0:31:44 | if you do it for the Q function, |
---|
0:31:47 | you will inevitably have to go through |
---|
0:31:50 | the whole trajectory, and that means multiplying importance sampling ratios along it: |
---|
0:31:56 | multiply many small numbers and |
---|
0:32:00 | the product vanishes, |
---|
0:32:03 | multiply many large numbers and |
---|
0:32:06 | it explodes. |
---|
0:32:08 | so what you do is truncate the importance weights, |
---|
0:32:13 | and also add a bias correction term, just to acknowledge that by truncating you are making |
---|
0:32:21 | an error. this is what the |
---|
0:32:23 | retrace algorithm allows us to do. |
---|
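One common way of writing the retrace target (here in the form used by ACER) is the recursion below, where μ is the behaviour policy that generated the stored experience and the importance weight is truncated at a constant c:

```latex
Q^{\mathrm{ret}}_t = r_t + \gamma\, \bar{\rho}_{t+1} \left[ Q^{\mathrm{ret}}_{t+1} - Q(b_{t+1}, a_{t+1}) \right] + \gamma\, V(b_{t+1}),
\qquad
\bar{\rho}_t = \min\left( c,\ \frac{\pi(a_t \mid b_t)}{\mu(a_t \mid b_t)} \right)
```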
0:32:26 | let me remind you: |
---|
0:32:28 | we want to use the actor-critic framework, so we want to estimate both the policy and the |
---|
0:32:33 | Q function, |
---|
0:32:34 | and we said that DQN-style bootstrapping provides biased estimates for the Q function. |
---|
0:32:39 | so what we have here, |
---|
0:32:43 | in our loss for Q, |
---|
0:32:46 | is the target given by the retrace algorithm. |
---|
0:32:51 | i will not |
---|
0:32:53 | go through the proof of why this provides a sound estimate — that is in the original work — |
---|
0:32:59 | but let me just give you the intuition why you don't have |
---|
0:33:04 | vanishing and exploding products. |
---|
0:33:06 | the thing is that, instead of merely multiplying our |
---|
0:33:10 | importance sampling ratios, |
---|
0:33:12 | we truncate them, |
---|
0:33:14 | so that the products |
---|
0:33:17 | don't vanish or explode, |
---|
0:33:19 | and the errors that this truncation introduces, |
---|
0:33:23 | relative to what we had in our original estimate, |
---|
0:33:29 | are compensated |
---|
0:33:31 | by the bias correction term, |
---|
0:33:35 | so the estimate |
---|
0:33:36 | is not |
---|
0:33:38 | biased. |
---|
0:33:41 | and the final ingredient is trust region policy optimisation. |
---|
0:33:47 | now, the problem when we optimize the policy directly in a |
---|
0:33:53 | reinforcement learning framework is that small changes in parameter space |
---|
0:33:59 | can result in very large and unexpected changes in the policy. |
---|
0:34:04 | the solution is to use the natural gradient, but it is expensive to compute; the natural |
---|
0:34:11 | gradient gives you the direction of steepest descent, |
---|
0:34:14 | but |
---|
0:34:15 | it can be approximated via the KL divergence between the policies of |
---|
0:34:24 | subsequent parameters. |
---|
0:34:26 | trust region policy optimisation approximates that KL divergence |
---|
0:34:33 | with a first-order Taylor expansion, so that the difference between subsequent policies stays |
---|
0:34:42 | small and you don't have large, unexpected jumps |
---|
0:34:44 | in your policy. this is particularly important if you want to deploy |
---|
0:34:50 | the system in interaction with real users: you really cannot |
---|
0:34:56 | afford for it to behave unexpectedly. |
---|
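The constraint being described can be written as the usual trust-region objective: maximize the return subject to the average KL divergence between the old and the new policy staying below a small bound δ:

```latex
\max_{\theta}\; J(\theta)
\quad \text{subject to} \quad
\mathbb{E}_{b}\left[ D_{\mathrm{KL}}\left( \pi_{\theta_{\mathrm{old}}}(\cdot \mid b)\ \big\|\ \pi_{\theta}(\cdot \mid b) \right) \right] \le \delta
```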
0:35:01 | so now, if we want to apply this to a dialogue system that learns |
---|
0:35:06 | directly in the master action space, |
---|
0:35:10 | we have to have an adequate architecture |
---|
0:35:14 | for the neural network. |
---|
0:35:18 | now, |
---|
0:35:20 | ACER is an actor-critic method, as we said, so we are estimating the policy and the Q function |
---|
0:35:24 | at the same time, |
---|
0:35:26 | and in order to make the most of it, they share a feature extractor on top |
---|
0:35:32 | of our belief state. |
---|
0:35:35 | and given that we want to learn in the master action space, we have to choose between very |
---|
0:35:40 | many actions, |
---|
0:35:41 | so both the policy and the Q function |
---|
0:35:44 | have a part that is choosing the summary action — if you like, the dialogue act — |
---|
0:35:50 | and a part |
---|
0:35:51 | choosing which slot should complement this dialogue act. |
---|
0:35:57 | then we have a gradient for the policy, which is given by trust region policy |
---|
0:36:03 | optimisation, and a gradient for |
---|
0:36:06 | the Q function, which is given by ACER. |
---|
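A sketch of the factored architecture described here: a shared trunk over the belief state, one head scoring the summary action (the dialogue act) and one head scoring the slot, combined additively into logits over the full master action space. Sizes and the additive combination are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MasterSpacePolicy(nn.Module):
    """policy over the master action space, factored into act and slot heads;
    the critic's Q-head can be factored in the same way"""
    def __init__(self, belief_dim, n_acts, n_slots, hidden=256):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(belief_dim, hidden), nn.ReLU())
        self.act_head = nn.Linear(hidden, n_acts)    # e.g. request, confirm, inform
        self.slot_head = nn.Linear(hidden, n_slots)  # e.g. food, area, pricerange

    def forward(self, belief):
        h = self.shared(belief)
        # one logit per (act, slot) pair = the full master action space
        logits = self.act_head(h).unsqueeze(-1) + self.slot_head(h).unsqueeze(-2)
        return torch.distributions.Categorical(logits=logits.flatten(-2))
```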
0:36:11 | okay, so how does this work? we applied this in the |
---|
0:36:16 | Cambridge restaurant domain. |
---|
0:36:19 | we have a relatively large belief state, |
---|
0:36:22 | and consequently a very large number of master actions, and everything here is simply |
---|
0:36:29 | running on my laptop. |
---|
0:36:34 | and here are the results. |
---|
0:36:36 | the x-axis is showing training dialogues, |
---|
0:36:40 | the y-axis is showing success rate — whether |
---|
0:36:43 | the dialogue was |
---|
0:36:45 | successfully completed or not. |
---|
0:36:47 | one model is learning in the master action space directly, |
---|
0:36:54 | and the other is learning with summary actions. |
---|
0:36:58 | as you would expect, the policy that is learning in the summary space |
---|
0:37:04 | is faster, because it is choosing between a much smaller number of actions, |
---|
0:37:08 | but not by that much: |
---|
0:37:11 | the master action |
---|
0:37:12 | space has two orders of magnitude more actions, yet it is actually only about twice as slow. |
---|
0:37:19 | so this is good news. |
---|
0:37:21 | we also put these policies in interaction with real users on Amazon Mechanical Turk, |
---|
0:37:28 | and we |
---|
0:37:31 | can see that the performances in terms of success rate |
---|
0:37:35 | are almost the same, |
---|
0:37:37 | with the master action space policy performing at least as well. |
---|
0:37:43 | if you are interested in the details, we have a journal article |
---|
0:37:49 | about this, and it has just been accepted to the IEEE/ACM transactions on audio, speech and language |
---|
0:37:55 | processing. |
---|
0:37:59 | okay, so one more thing — |
---|
0:38:02 | which you have probably heard a lot about, |
---|
0:38:08 | and tomorrow my student Florian is going to talk about it here — |
---|
0:38:13 | is how we addressed |
---|
0:38:15 | the problem of having good user models. |
---|
0:38:20 | when we optimize dialogue management against a simulated user, we often |
---|
0:38:27 | find that the simulated users we use are hand-coded |
---|
0:38:33 | and not very realistic — not like the ones we would have in real |
---|
0:38:38 | interactions with real users. |
---|
0:38:41 | the solution here is to train the |
---|
0:38:44 | user model in an end-to-end fashion, |
---|
0:38:47 | and the hoped-for outcome is a potentially more natural |
---|
0:38:50 | conversation with this simulated user. |
---|
0:38:55 | so |
---|
0:38:56 | what does this neural user simulator look like? it consists of three parts. |
---|
0:39:03 | the first part is the goal generator, and you can think of this as a |
---|
0:39:07 | random generator that generates |
---|
0:39:09 | the goal for the dialogue |
---|
0:39:11 | that a real user might have. |
---|
0:39:15 | the second part |
---|
0:39:16 | is the feature extractor; the feature extractor |
---|
0:39:20 | extracts features from what is in the dialogue state |
---|
0:39:24 | that relates to what the user's goal is. |
---|
0:39:28 | and then you have a sequence-to-sequence model, which decodes these features and the |
---|
0:39:33 | history into the user utterance. |
---|
0:39:37 | so here is how it works. |
---|
0:39:42 | we extract some features — for instance, if the system said "i'm sorry, there is no |
---|
0:39:47 | such restaurant", the feature says that the user should look for something else, |
---|
0:39:53 | in line with what the user's goal is — and this user goal can then |
---|
0:39:58 | potentially change. |
---|
0:40:00 | we have an RNN |
---|
0:40:03 | which takes the history of |
---|
0:40:05 | these features |
---|
0:40:07 | into a hidden layer, |
---|
0:40:10 | and then a sequence-to-sequence model |
---|
0:40:15 | which starts with the start-of-sentence symbol |
---|
0:40:19 | and ends with the words that the simulated user is saying. |
---|
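A compact sketch of that shape of model: a turn-level RNN over the goal-related features, whose final state conditions a word-level decoder that emits the user utterance. Dimensions and cell choices are illustrative, not those of the published simulator:

```python
import torch
import torch.nn as nn

class NeuralUserSimulator(nn.Module):
    """turn features -> dialogue-history RNN -> conditioned utterance decoder"""
    def __init__(self, feat_dim, vocab_size, hidden=128, emb=64):
        super().__init__()
        self.turn_rnn = nn.GRU(feat_dim, hidden, batch_first=True)  # history of turns
        self.embed = nn.Embedding(vocab_size, emb)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)        # word by word
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, turn_feats, word_ids):
        # turn_feats: (batch, turns, feat_dim); word_ids starts with <sos>
        _, h = self.turn_rnn(turn_feats)        # summary of the dialogue so far
        dec_out, _ = self.decoder(self.embed(word_ids), h)
        return self.out(dec_out)                # logits over the next words
```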
0:40:27 | we trained this simulator on DSTC2, because there |
---|
0:40:35 | real users talked to dialogue systems, and we want to model how |
---|
0:40:42 | the real users actually behave. |
---|
0:40:47 | and we evaluated the simulated user in a fairly unorthodox way: |
---|
0:40:55 | we were not only interested in how well the |
---|
0:41:00 | user simulator generates sentences, but also in how it can help us in training |
---|
0:41:07 | a dialogue system. |
---|
0:41:09 | so what we did was, for each user simulator, |
---|
0:41:16 | we |
---|
0:41:17 | built a setup |
---|
0:41:21 | in which we train policies: |
---|
0:41:23 | one set of |
---|
0:41:27 | policies was trained with the neural user simulator, which is completely statistical, and another one with |
---|
0:41:33 | the agenda-based user simulator, which is based on rules. |
---|
0:41:40 | and for each user simulator we took the policy performing best on the |
---|
0:41:47 | other user simulator — that will become |
---|
0:41:51 | a little clearer on the next slide — |
---|
0:41:55 | and then we deployed these policies to interact with real users on Mechanical Turk. |
---|
0:42:04 | so: |
---|
0:42:06 | each user simulator |
---|
0:42:09 | is used for policy training — one set on the neural user simulator, another one |
---|
0:42:15 | on |
---|
0:42:16 | the agenda-based one. |
---|
0:42:18 | then we measure how well each policy performs on the neural user simulator, |
---|
0:42:25 | and we take the best-performing policy trained on the neural user simulator |
---|
0:42:31 | that performs well on the agenda-based one, |
---|
0:42:36 | and similarly for the agenda-based. |
---|
0:42:40 | now, what the results show is that if you train your policy on the agenda-based user simulator, |
---|
0:42:46 | and it performs really well on the agenda-based user simulator, it's not going to perform particularly well on |
---|
0:42:54 | real users. |
---|
0:42:56 | so a rule-based |
---|
0:42:59 | approach to building a user simulator is not particularly predictive |
---|
0:43:04 | of real-user performance. |
---|
0:43:10 | what worked best with real users |
---|
0:43:15 | was the policy trained on the neural user simulator that was also best performing on |
---|
0:43:22 | the agenda-based user simulator, |
---|
0:43:25 | which suggests that end-to-end learning is promising for modeling users, |
---|
0:43:32 | but that we should combine it with cross-model evaluation if we want the |
---|
0:43:40 | best |
---|
0:43:41 | performance. |
---|
0:43:44 | okay, so in the last five minutes i would like to talk about something that is |
---|
0:43:50 | probably close to us all |
---|
0:43:53 | in our community: |
---|
0:43:56 | how do we effectively |
---|
0:43:59 | evaluate dialogue models, |
---|
0:44:01 | how do we compare them, and |
---|
0:44:04 | how can we produce good benchmarks? |
---|
0:44:08 | similarly to what was said here earlier, only a handful of groups around the world have access to |
---|
0:44:15 | the necessary resources, |
---|
0:44:17 | and this is something that we wanted to change in Cambridge, |
---|
0:44:23 | because we want to kick-start research and also allow people to easily compare to |
---|
0:44:29 | each other. |
---|
0:44:30 | so we made PyDial, our toolkit for building statistical dialogue systems, open |
---|
0:44:37 | source. |
---|
0:44:38 | along with it |
---|
0:44:40 | we released a number of simulated environments |
---|
0:44:46 | and algorithms |
---|
0:44:47 | that can be compared to, so if you want to test a new |
---|
0:44:52 | policy, |
---|
0:44:54 | you can very easily compare it to the state of the art. |
---|
0:44:59 | we also collected the large corpus that i've just described — |
---|
0:45:05 | it's called MultiWOZ — and we are making this open access. |
---|
0:45:09 | this work was funded by a faculty award. |
---|
0:45:15 | so just a few words about PyDial itself. |
---|
0:45:19 | it offers implementations of statistical approaches to dialogue systems, |
---|
0:45:26 | and it's modular, so you can very easily exchange your module for |
---|
0:45:32 | the functionality currently available in the toolkit. |
---|
0:45:36 | it can very easily be extended with another domain, and if you |
---|
0:45:43 | have your own database you can use it in your dialogue system. |
---|
0:45:49 | it offers multi-domain conversational functionality, |
---|
0:45:55 | and if you are interested you can also subscribe to our mailing list. |
---|
0:46:02 | it received contributions not just from the current members but also |
---|
0:46:06 | from the previous members of the dialogue systems group, |
---|
0:46:12 | and it is constantly expanding. |
---|
0:46:17 | so, in terms of benchmarking, |
---|
0:46:19 | we wanted to have a way of comparing algorithms in a fair way. |
---|
0:46:25 | so, for several domains, we defined different user settings and also different noise |
---|
0:46:32 | levels in the user input, |
---|
0:46:36 | which together define a set of benchmark environments, |
---|
0:46:38 | and |
---|
0:46:39 | we tested a number of state-of-the-art policy optimization algorithms, including the ACER algorithm i've just talked |
---|
0:46:46 | about. |
---|
0:46:48 | this initiative was led by Iñigo Casanueva and Paweł Budzianowski, and |
---|
0:46:54 | it was presented at a NIPS symposium last year. |
---|
0:47:00 | so this is basically the end of my talk. |
---|
0:47:07 | the summary is: machine learning |
---|
0:47:10 | allows us to solve many of the problems that we are facing in building natural conversation. |
---|
0:47:19 | in particular, it allows us to share concepts in belief tracking, so |
---|
0:47:26 | that we can |
---|
0:47:29 | scale |
---|
0:47:30 | to larger and richer domains. |
---|
0:47:33 | in the same vein, it allows us to build more or less heuristic-free |
---|
0:47:38 | policy optimization modules that choose between a wide variety of actions, |
---|
0:47:45 | and it also allows us to build more realistic models of users, |
---|
0:47:50 | so that we can train more accurate |
---|
0:47:53 | policies. |
---|
0:47:54 | but there is a lot more to be done to actually achieve the goal of |
---|
0:48:01 | natural conversation, |
---|
0:48:02 | and this is just the tip of the iceberg. |
---|
0:48:05 | so, some of the open issues are: how do we model |
---|
0:48:11 | the structured knowledge that we need — the knowledge base; |
---|
0:48:16 | if we want systems for very long conversations, we need more accurate |
---|
0:48:24 | and more sophisticated reinforcement learning models; |
---|
0:48:29 | and finally, we need to model sentiment, and have a more nuanced |
---|
0:48:34 | reward function |
---|
0:48:37 | to take into account when we are building |
---|
0:48:40 | dialogue systems. |
---|
0:48:43 | all of that can bring us closer to the long-term vision, which is |
---|
0:48:47 | to have natural, goal-directed conversation with machines. |
---|
0:48:51 | thank you very much. |
---|
0:49:29 | [q&a] so, compared to that: there is a statistical version of the agenda-based |
---|
0:49:34 | simulator, |
---|
0:49:36 | but it |
---|
0:49:38 | relies on having the structure |
---|
0:49:42 | of the conversation fixed, so that you first |
---|
0:49:48 | ask — some parts of it are hand-coded, and then it has |
---|
0:49:53 | pockets which are trained. so this is fine, but for |
---|
0:49:57 | solving the overall problem of natural conversation it would not be applicable, because |
---|
0:50:04 | we still have |
---|
0:50:05 | that structure which is fixed. so we could compare to that. but actually, this neural |
---|
0:50:11 | simulator was trained on a very small amount of data, so — |
---|
0:50:15 | i don't know if i have exact numbers; DSTC2 is, i think, only about one |
---|
0:50:19 | thousand dialogues, |
---|
0:50:21 | so it's not a lot. |
---|
0:50:24 | and because of that, the numbers of parameters were kept really small. so, for instance, |
---|
0:50:30 | if i go back: |
---|
0:50:34 | we don't actually take as input the words of the user or the system; |
---|
0:50:39 | here we take as input the semantic form of the user's sentences, |
---|
0:50:44 | so this feature extractor is in fact very easy to build — |
---|
0:50:51 | otherwise you would need a CNN or something more sophisticated here, and that |
---|
0:50:58 | would expand the number of parameters. |
---|
0:51:01 | also, how large |
---|
0:51:04 | we take these vectors to be implies how many parameters you have |
---|
0:51:10 | in the model. |
---|
0:51:11 | so in this model everything was kept very small, |
---|
0:51:15 | just to account for the fact that we have a very small amount of data. |
---|
0:51:27 | [next question] so — |
---|
0:51:43 | do you mean if you want to start from scratch, or if you want |
---|
0:51:47 | to reuse some of the models? |
---|
0:51:51 | if you want to start from scratch, |
---|
0:51:53 | then basically everything — |
---|
0:51:56 | everything is domain-independent in that sense. |
---|
0:52:00 | in particular, belief tracking: |
---|
0:52:11 | belief tracking takes as input the utterances |
---|
0:52:15 | and the ontology. |
---|
0:52:16 | the ontology is just an additional input to the belief tracker, |
---|
0:52:21 | and what you embed are |
---|
0:52:23 | the word vectors that you have in your ontology to begin with. |
---|
0:52:29 | so, traditionally, you would look at a word and whether it appears in the user's sentence; |
---|
0:52:35 | here we take the word vector of that word |
---|
0:52:39 | and compare the similarity of that word vector with our feature extraction. |
---|
0:52:44 | and we have |
---|
0:52:45 | three generic feature extractors — for the domain, the slot, and the value — |
---|
0:52:49 | so |
---|
0:52:50 | there is no problem in transferring this, as it is, to a different domain. |
---|
0:53:09 | right — |
---|
0:53:10 | so an extension like that is a more difficult problem in that sense; |
---|
0:53:16 | you would need to redefine what the system does |
---|
0:53:21 | in order for it to work, |
---|
0:53:27 | and then |
---|
0:53:29 | it would depend on |
---|
0:53:31 | how your knowledge base looks — whether |
---|
0:53:35 | you embed the words, or it is already embedded with particular constraints. |
---|
0:53:40 | so |
---|
0:53:41 | i would be interested in that too. |
---|
0:54:27 | [next question] so, end-to-end networks have the huge advantage of not requiring |
---|
0:54:32 | labeled data at the intermediate steps, |
---|
0:54:35 | and that is a huge advantage, because if you're generating millions of calls every |
---|
0:54:41 | week, |
---|
0:54:42 | you don't have the capacity to annotate. |
---|
0:54:45 | so it is certainly worth investigating, and that is exactly why this research is happening. |
---|
0:54:50 | but the downside |
---|
0:54:54 | is |
---|
0:54:55 | that it doesn't work that well yet, and the reason for it is that we are still not able to figure out |
---|
0:55:00 | how to train these networks so that they do not require additional supervision. |
---|
0:55:06 | so a lot of the research |
---|
0:55:09 | along those lines is about |
---|
0:55:11 | having an end-to-end differentiable neural network that you can propagate gradients through, but you |
---|
0:55:18 | still need, at some stage, the supervision to allow you to actually have |
---|
0:55:24 | a meaningful output. |
---|
0:55:29 | and another problem is the evaluation of these systems. |
---|
0:55:33 | the research in this area has exploded, and many people who are not originally from |
---|
0:55:41 | dialogue are doing research in this area, and they take this as a translation problem: you |
---|
0:55:46 | have |
---|
0:55:47 | the system input and the user output — and this is really not the case, you cannot |
---|
0:55:51 | treat it like that. and they evaluate with the BLEU score, which doesn't say anything about |
---|
0:55:56 | the quality of these dialogues, and doesn't take into account the fact that you |
---|
0:56:00 | are having a long-term conversation. |
---|
0:56:08 | [next question — largely inaudible] |
---|
0:56:44 | so — |
---|
0:56:49 | people in speech recognition have looked at this problem of having to iterate |
---|
0:56:54 | over a huge number of outputs, for instance in language modeling, and there |
---|
0:57:00 | are some tricks, like using noise contrastive estimation, so you do not have to do |
---|
0:57:06 | a softmax but rather have an unnormalized |
---|
0:57:09 | output. |
---|
0:57:10 | the thing is that for this work we need to have some similarity metric between, |
---|
0:57:16 | some confusability between, the |
---|
0:57:20 | different elements of the ontology, |
---|
0:57:23 | so i don't know whether i have a quick answer to how to actually |
---|
0:57:26 | do that. |
---|
0:57:28 | one avenue |
---|
0:57:29 | would be to embed the whole ontology and then |
---|
0:57:35 | sample |
---|
0:57:36 | from the embedding, |
---|
0:57:40 | because |
---|
0:57:43 | all you actually want to have is a good |
---|
0:57:47 | word-space representation — |
---|
0:57:51 | you could almost |
---|
0:57:53 | sample from it and then decode what the particular |
---|
0:57:57 | word would be. but that's very difficult to do, |
---|
0:58:01 | so i think it's an interesting problem. |
---|
0:58:20 | [next question] it is sometimes really difficult, and we haven't actually addressed this in this work, |
---|
0:58:25 | so |
---|
0:58:28 | it |
---|
0:58:29 | doesn't |
---|
0:58:30 | distinguish between good and bad. |
---|
0:59:36 | [next question] the heuristics here are used |
---|
0:59:40 | in the sense that, you know, this summary action consists of — |
---|
0:59:45 | a summary action is something to do with a slot, |
---|
0:59:50 | but it doesn't know |
---|
0:59:51 | how many slots you will talk about; |
---|
0:59:55 | you let the system learn to do that. |
---|
0:59:59 | so, you know, especially if you don't have enough training data, you can always incorporate |
---|
1:00:04 | heuristics into the system. but what we were interested in here was mostly to see whether |
---|
1:00:11 | we can avoid them, because |
---|
1:00:13 | if you look at the reinforcement learning tricks — where people really use reinforcement |
---|
1:00:21 | learning for problems which can be easily simulated, and it's often a discrete space, it's |
---|
1:00:29 | the setting of a joystick, where the action space is to choose between |
---|
1:00:33 | a very small number of actions — |
---|
1:00:36 | if you want to apply those techniques to dialogue seriously, you will inevitably |
---|
1:00:41 | have to learn in larger |
---|
1:00:44 | state and action spaces. this is really what we were interested in here. but obviously you |
---|
1:00:50 | can always incorporate |
---|
1:00:53 | what you just |
---|
1:00:55 | described. |
---|