0:00:19 I'd like to start the third and final invited talk. Our speaker received her undergraduate degree in computer science and mathematics, and since then she has been at Cambridge University, where she received her MPhil and then her PhD in statistical dialogue systems, became a research associate, and most recently a lecturer in open dialogue systems in the Department of Engineering. She is also a fellow of one of the colleges of Cambridge University. She is extremely well known, I'm sure, to everyone in this community, because she is very well published, including a number of award-winning papers, and she is co-author of one of the nominated papers at this SIGdial. After her talk, if you still want to, you can dig even more into her and her colleagues' research: they have two posters at the poster session this afternoon. Please welcome our speaker.
0:01:36 Thank you. It's really an honour to be here. SIGdial really is one big family, and if a family member asks you to do something, you cannot say no. Thank you very much. I will be talking about where we are heading as we build machines that can converse, how deep learning can help us along the way, and some of the efforts that we've made in the Dialogue Systems Group in Cambridge to achieve that.
0:02:20 I'm sure we all agree that spoken conversation, and in particular dialogue, is one of the most natural ways of exchanging information between humans. We can read a book and then be able to talk about what we just read. Machines, on the other hand, are very good at storing huge amounts of information, but not so good at sharing this information with us in a natural, human-like way. I'm sure you know that lots of companies have virtual personal assistants; they are widely deployed and handle billions of calls. But the current models are very unnatural, narrow in domain, and frustrating to users. So the research question that we want to address is: how do we build a continuously learning dialogue system capable of natural conversation?
0:03:23 Machine learning is very attractive for solving this task. If I had to summarize machine learning in just three words, these would be: data, model, and prediction. So what are they in our case? The data is simply dialogues, or some parts of dialogues, like transcribed speech, annotated user intents, or provided user feedback. The model is the underlying statistical model that lets us explain phenomena that we observe but cannot directly model. Once we train the model, we can make predictions: what the user said, and what to say back to the user.
0:04:17 Traditionally, building statistical dialogue systems has assumed the following structure. A spoken dialogue system consists of a speech understanding unit, a dialogue management unit, and a speech generation unit. When the user speaks, their speech is recognized by a speech recognizer, and a semantic decoder and state tracker produce the dialogue state, a representation of where we currently are in the dialogue. Then a policy makes a decision about what to say back to the user, and very often there is some kind of evaluator which estimates how good that decision was. Finally, a natural language generator produces the textual output that is then presented to the user via a text-to-speech synthesizer. Behind all of these modules is the ontology, a structured representation of the database that the dialogue system can talk about. This is the structured knowledge used in goal-oriented dialogue systems.
0:05:42 In the last decade, automatic speech recognizers have improved dramatically. Some researchers go as far as saying that we have reached the performance of humans in speech recognition. I wouldn't say that, but I would just like to point out that there is much more to be done in the rest of the structure of a spoken dialogue system. Now, with this modular structure, there is inevitably loss of information between the modules, and the question is what we can do to alleviate this loss. What we can do is pass probability distributions between these modules. When you do that, the semantic decoder becomes a MAP decoder, and the state tracker becomes the belief tracker module: it produces a distribution over possible dialogue states, the belief state.
0:06:54 But before I go further into explaining how these systems work, let's just take a step back and see how these systems are currently perceived by users. [A video plays.] "I have a personal assistant on my phone, and I use it for pretty much everything. I ask it what time Tim's wedding is tomorrow, and it tells me. It takes care of pretty much everything that I could probably do myself, but it's just kind of comforting having someone perform it. It did not, however, prevent me from spilling coffee all over my trousers just before the ceremony. PA, I've got a problem." "Tim's wedding is in one hour and twenty-five minutes." "Tim's wedding! I've got coffee all over my trousers. Are there any department stores nearby?" "Your favourite department store is..." "No, I don't think so. PA, find me a shop then." "Not sure about that." "Do any stores..." "That's an interesting question." "Okay, call me a taxi to the centre." "I'm sorry, I can't help with taxis." "What do you mean you can't call a taxi?" "I'm sorry, I cannot help you with that." "So how am I supposed to get to the wedding?" "Tim's wedding is in one hour and twenty-four minutes." "For the last time: Tim's wedding!" "Your wedding is in one hour and twenty-four minutes." "Well, thanks, PA, you're really helping me out." "It is I who should be thanking you."
0:08:59 So that sketch resembles what actual virtual personal assistants and their users are really going through. Let's look at the problems that we want to address. The most obvious one from this video is emotional awareness: this personal assistant was completely unaware of the user's emotion and their state. But there are some things we need to address before that. The first problem is that these systems still do not scale: it often takes too long to adapt a dialogue system to a new domain or context. The second problem is that each possible action, each response, is not very rich. The reason is that the learned policy chooses between a very small set of actions, and if we want to build natural conversation, we need to allow our systems to choose between a wide variety of actions. And finally, systems are not adaptive to different user needs. This can be interpreted in many different ways, but it is clear that we need to model the user better if we want to achieve a better dialogue system.
0:09:21 So let me first explain in a bit more detail why we need to track beliefs, and what belief tracking actually is. This is the beginning of a dialogue with a system that can talk about restaurants. The user said "I am looking for a Thai restaurant". "Thai" and "high" are acoustically very similar, so there is very likely to be a misrecognition, and we get "high restaurant" at the top of the list. Now we extract the dialogue state, based on the ontology for our domain: which domain it was, restaurant or something else, and the slot-value pairs. You can see here that the system is fairly sure about the choice of domain, but not so sure about the food slot value, so the system asks a request question: "What kind of food would you like?" The user answers "Thai", which again gets misrecognized as "high". If we don't do any belief tracking at this point, which is mainly what happened before, the probability of "Thai" stays very small, and the system has no option but to ask the same question again: "What kind of food would you like?" And this is what is particularly annoying to users: being asked the same question again. Now look at what happens if you do track beliefs. Because "Thai" appeared in the previous turn, even though the probability of "Thai" within this turn is very low, the overall probability becomes higher, and it is no longer tied with the other options. The system now has the option of using a higher-confidence confirmation, "You are looking for Thai food, is that right?", which is a much better action. It would be nice to build completely uncertainty-free systems, but since we cannot, the question is: how do we manage uncertainty?
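The effect described above can be illustrated with a toy belief update (the update rule here is a simple discounted accumulation I made up for illustration, not the actual model from the talk): "thai" is weakly recognized in both turns, but consistent evidence across turns lifts it above the one-off misrecognitions.

```python
# Toy illustration of why belief tracking helps: per-turn evidence for
# 'thai' is weak, but accumulating it across turns makes it the top
# hypothesis, unlike a system that restarts from scratch every turn.

def accumulate(belief, turn_probs, decay=0.7):
    """Discount the old belief and add this turn's evidence, then renormalize."""
    scores = {v: decay * belief.get(v, 0.0) + turn_probs.get(v, 0.0)
              for v in set(belief) | set(turn_probs)}
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

belief = {}
turn1 = {"thai": 0.4, "high": 0.6}   # first misrecognition
turn2 = {"thai": 0.4, "i": 0.6}      # a different misrecognition
for turn in (turn1, turn2):
    belief = accumulate(belief, turn)

# 'thai' was never the top per-turn hypothesis, but it is now the top belief.
assert max(belief, key=belief.get) == "thai"
```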
0:12:43 If you think about it, this is actually a very simple problem: all you are doing is matching the concepts that you have in the ontology against the input you got from the user: what the user said, which slots and values they mentioned. But the problem is not simple, because, as we all know, there are many ways you can refer to a particular concept in natural language. And what you traditionally had to do was build a belief tracker for each of these concepts separately, and that is something which doesn't scale if you want to build a natural dialogue system. So let me now talk about scaling belief tracking. The solution to this problem is to reuse the knowledge you have for one concept for another concept, because we cannot hope to have labeled data for every kind of concept we want a dialogue system to talk about. Real humans adapt very readily to new situations, and they reuse what they know in order to do that.
0:14:02 So the key ingredients for large-scale belief tracking are semantically constrained word vectors and shared parameters. Let me first explain what we mean by semantically constrained word vectors. In a dialogue system you have some closed sets of concepts from the ontology: domains like restaurants, slots like price range, and values like cheap or expensive. What you would like is that words which are semantically similar have vectors that are close in the vector space. Standard distributional word vectors achieve this to some extent, but you also have to take into account what kind of application you are building. For instance, in general text you can say that "queen" and "king" are semantically similar as heads of state. But if you have a dialogue system and the user says they are looking for something in the north, versus looking for something in the south, well, "north" and "south" are antonyms; in this context you really do not want to confuse these two. So a former PhD student from our group used semantic constraints, synonyms and antonyms, to specialize the vector space. Here "cheap" and "expensive" end up very far apart, while "cheap" and its synonyms, like "inexpensive" or "affordable", end up close together. The anchors, "cheap" and "expensive", are concepts from the ontology, and around them are the words with which the user may refer to these concepts.
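The intuition can be sketched with a toy specialization procedure (my own simplified update rule, in the spirit of the counter-fitting idea just described, not the published method): synonym pairs are pulled together and antonym pairs pushed apart.

```python
import numpy as np

# Toy semantic specialization of word vectors: attract synonyms, repel
# antonyms. Vectors, learning rates, and pairs are all illustrative.

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ["cheap", "inexpensive", "expensive"]}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for _ in range(50):
    # attract the synonym pair (cheap, inexpensive)
    d = vecs["inexpensive"] - vecs["cheap"]
    vecs["cheap"] += 0.1 * d
    vecs["inexpensive"] -= 0.1 * d
    # repel the antonym pair (cheap, expensive)
    d = vecs["expensive"] - vecs["cheap"]
    vecs["cheap"] -= 0.01 * d
    vecs["expensive"] += 0.01 * d

# After specialization, 'cheap' is closer to its synonym than to its antonym.
assert cos(vecs["cheap"], vecs["inexpensive"]) > cos(vecs["cheap"], vecs["expensive"])
```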
0:16:07 So we use this for scalable tracking. To do belief tracking, you typically need to answer three questions. The first question is: does what the system is saying refer to something we have in the ontology? The second question is: does what the user is saying refer to something we have in the ontology? And the third question is: what is the answer given the context of the conversation so far? To answer the first question, you take the system utterance, "How may I help you?" or anything else, embed it with word vector embeddings, and pass it through a feature extractor. The idea here is to make these feature extractors reusable. In our case we used convolutional neural networks, but this could be any kind of feature extractor, for example a bidirectional LSTM. There is one generic extractor for domains, a generic one for slots, and a generic one for values. Then we take what we have in the ontology, so we have an embedding for restaurant, for name, for price range and so on, and we calculate the similarity between what our feature extractor produced, say for the domain, and what the ontology gives us. The same procedure is applied to the input that we got from the real user. Finally, you need to take the context into account, so these similarities are passed to an RNN, a GRU, or anything which has some form of memory, so that you can keep track of the dialogue history. What you then get is a probability for the domain; by the same procedure, a probability for each slot and value; and multiplying these, you arrive at the probability of a particular domain, slot and value. You do this for everything in your ontology, and that gives you the belief state at the current turn.
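The scoring at the heart of this procedure can be sketched as follows (a hypothetical, heavily simplified version: the feature vectors and ontology embeddings stand in for the CNN outputs and specialized word vectors described above, and the previous belief stands in for the recurrent context):

```python
import math

# Similarity-based tracking sketch: compare an utterance feature vector to
# each ontology value's embedding, normalize, and mix with the previous
# turn's belief as a stand-in for recurrent context. Illustrative only.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def track_turn(utterance_features, ontology_embeddings, prev_belief, mix=0.5):
    scores = {value: math.exp(cosine(utterance_features, emb))
              for value, emb in ontology_embeddings.items()}
    z = sum(scores.values())
    return {value: mix * prev_belief.get(value, 0.0) + (1 - mix) * s / z
            for value, s in scores.items()}

ontology = {"thai": [1.0, 0.1], "italian": [0.1, 1.0]}
belief = track_turn([0.9, 0.2], ontology, {"thai": 0.5, "italian": 0.5})
assert belief["thai"] > belief["italian"]
```

Because the same `track_turn` machinery is applied to every domain, slot, and value, nothing in it is tied to one concept: that is the parameter sharing that makes the approach scale.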
0:18:45 So how did we evaluate this tracker? To evaluate belief tracking, you need datasets annotated with these labels. In Cambridge we have used a framework to create labeled datasets that is based on the Wizard-of-Oz setting. You have two annotators: one representing the system, who has access to the database, and another representing the user, who is given a task to complete, the user goal. The two talk to each other through a text channel, and they also annotate the states: on the system side what was provided, and on the user side what the user is asking for, so we get the labels directly. The first dataset we used is very small: one thousand two hundred dialogues, with only one domain, a small number of slots and a small number of values. Recently we collected a much larger dataset, which has almost ten thousand dialogues across several domains. The great thing here is that the change of domain does not only happen at the dialogue level but also at the turn level, and it has much longer dialogues, many more slots and many more values.
0:20:33 So how does it perform? We compared this model to a previous model, the neural belief tracker, which was also developed in our group, but which doesn't do this knowledge sharing between different concepts. First, on the smaller dataset, our model outperformed the neural belief tracker on every slot as well as on the joint goal. Now, what happens on the larger dataset? There the problem is a bit more complex, because you are also tracking domains, and the neural belief tracker was not able to track domains, so we compared to a variant of our model trained on just a single domain, and the model with knowledge sharing outperforms it here too. Also, if you look at the numbers, they are generally lower than on the original dataset, which shows that this dataset is much richer and more difficult to track. Finally, just to show you how difficult this task is, we also ran a simple baseline: it achieves much lower accuracy, whereas with knowledge sharing we reach around ninety-three percent. My student, who is also around, can tell you more about this, and if you're here next week, one of us will talk about this work in more detail.
0:22:28 I am now going to move to the second part: dialogue policy optimization. What is the difference between belief tracking and policy optimization? Well, dialogues happen in time. At a given point in the dialogue, belief tracking accumulates everything that happened so far, which is important for choosing an action: belief tracking summarizes the past. What does the policy do? At the same point it asks: which action, in the sense of a dialogue act, will be the best so that the user is satisfied at the end of the dialogue? So the policy has to look into the future: it plans. And the machine learning framework which allows us to perform planning is reinforcement learning. In reinforcement learning, we have our dialogue system interacting with a user. The system is taking actions, and the user is responding with observations. Based on these observations we create the belief state. The user is occasionally giving us a reward. Here, when I say user, it may be a real user or it may be a simulated user, but ultimately we need real users if we want a real dialogue system. The policy is a mapping from belief states to actions, and we want to find the policy that gives the largest overall return, the largest user satisfaction.
0:24:28 Before going further, let me remind you of some of the concepts from reinforcement learning that we need here. The most important is the concept of the return. At a given point in the dialogue, the return is the random variable which says what the overall discounted reward is from this point onwards. Because it is a random variable, we can only estimate its expectation. The expected return starting from a particular belief state is the value function, and if we take the expectation starting from a particular belief state and taking a particular action, we get the Q-function. Estimating the value function, the Q-function, or the policy are closely related problems: if we find the optimal Q-function, we will also be able to find the optimal policy. In deep reinforcement learning, the value function, Q-function or policy is approximated with a neural network. This is good, because neural networks give us non-linear approximation, which is what we need, since none of these functions are linear. The downside is that the optimization is no longer guaranteed to find more than a local optimum.
0:26:05 Probably the most famous deep reinforcement learning algorithm is deep Q-networks, DQN. What does a deep Q-network do? It approximates the Q-function as a neural network parameterized by parameters ω, and we have the gradient of a loss which is the difference between what our parameterized function is estimating and what the data gives us as the target, for a given belief state and action. The problem with DQN is that it uses biased estimates of the Q-function, the samples are correlated, and the targets are non-stationary, which is the reason why DQN is a very unstable algorithm: it can often happen that it gives you good results, but sometimes it simply doesn't work at all.
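The DQN loss just described can be written in a few lines. To keep the example self-contained, the "network" below is a toy linear function of my own invention; in practice it is a deep network, and the target term is treated as a constant during the gradient step, which is exactly why the targets are non-stationary:

```python
import numpy as np

# Sketch of the DQN objective: regress Q(b, a; ω) toward the bootstrapped
# target r + γ max_a' Q(b', a'; ω). Illustrative linear "network".

def q_values(weights, belief):
    return weights @ belief                      # one Q value per action

def dqn_loss(weights, belief, action, reward, next_belief, gamma=0.99):
    target = reward + gamma * np.max(q_values(weights, next_belief))
    prediction = q_values(weights, belief)[action]
    return (target - prediction) ** 2

w = np.zeros((3, 4))                             # 3 actions, 4-dim belief
b, b_next = np.ones(4), np.ones(4)
assert dqn_loss(w, b, action=0, reward=1.0, next_belief=b_next) == 1.0
```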
0:27:06 Alternatively, you can optimize the policy directly using a neural network. You assume a parameterization of the policy with parameters θ, and then the gradient of the objective that you want to maximize, the value of the initial state, is given by the policy gradient theorem. I don't have time here to prove it, but let me just say that it is used directly in the REINFORCE algorithm. This estimate, unlike the one from DQN, is unbiased, but it has a very high variance, which again is not something that we want.
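REINFORCE in miniature: for a softmax policy, the update direction is ∇log π(a|b) scaled by the sampled return. The toy policy and numbers below are my own; the point is that moving along this gradient makes the rewarded action more likely, while the sampled return is what injects the variance:

```python
import numpy as np

# Minimal REINFORCE gradient for a softmax policy π(a|b) = softmax(θ b).

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_grad(theta, belief, action, ret):
    """Gradient of log π(action|belief) times the return, θ of shape (A, D)."""
    probs = softmax(theta @ belief)
    grad_log = -np.outer(probs, belief)   # ∂ log π / ∂θ, all rows
    grad_log[action] += belief            # extra term for the taken action
    return grad_log * ret

theta = np.zeros((2, 3))
g = reinforce_grad(theta, belief=np.ones(3), action=0, ret=5.0)
theta += 0.1 * g                          # one ascent step
probs = softmax(theta @ np.ones(3))
assert probs[0] > probs[1]                # the rewarded action got more likely
```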
0:28:00 A framework that tries to combine the best of both is the actor-critic framework. Let me give you a diagram of what an actor-critic framework looks like. This is our user, and this is our policy optimization model, which has two parts. One part is the actor: this is the policy itself, the part that takes actions. The other part is the critic, which criticizes the actor: the actor takes some action, the user responds with a reward and a new belief state, and the critic then estimates how good the actor's action was.
0:28:45 Now, when we apply these methods to dialogue systems, for example DQN for optimizing the policy of a dialogue system, we often find that it takes too many iterations to train. So traditionally we resort to using a summary space. What does that mean? We compress our belief state, and we only choose between a small set of summary actions, and then we have hand-crafted heuristics which map a summary action back to a full action. Each full action belongs to a much larger master action space, which typically has two orders of magnitude more actions than the summary space. But this is obviously not good enough: if we really want to build natural conversation, we don't want any hand-crafted heuristics; we want a policy that explicitly learns to choose between much richer actions. So the problem is that too many interactions are needed, and the solution in this case is to use experience replay.
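A replay buffer is the standard way to implement experience replay; here is a minimal sketch (illustrative code, not from any toolkit): transitions are stored during interaction and sampled in random minibatches later, which decorrelates updates and reuses scarce dialogue data.

```python
import random
from collections import deque

# Minimal experience replay buffer for (belief, action, reward, next, done)
# transitions. The deque's maxlen evicts the oldest transitions.

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, belief, action, reward, next_belief, done):
        self.buffer.append((belief, action, reward, next_belief, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(150):                      # overflow: oldest entries evicted
    buf.push([t], 0, -1.0, [t + 1], False)
assert len(buf.buffer) == 100
batch = buf.sample(16)
assert len(batch) == 16
```

Note that replayed transitions were generated by an older policy, which is exactly why the importance-sampling corrections discussed next are needed.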
0:30:22 The algorithm that we use is called ACER, the actor-critic with experience replay algorithm. It uses experience replay; it estimates the Q-function off-policy, using the retrace algorithm to compute the targets; and it uses trust region policy optimization. Let me briefly go through these points. First, experience replay: as you interact with your dialogue system, you generate data, and in order to make the most of that data, you can go through it many times and replay the experience. However, at a later point the system has learned something, and its old actions are no longer the ones the current policy would take, so we should not treat the old rewards as if they came from the current policy. Therefore you use importance sampling ratios: the ratio between the policy that generated the data and the policy we have right now, and we weight our gradient with this importance ratio. Now, if you do this for the Q-function, you inevitably have to go through the whole trajectory and keep multiplying these importance sampling ratios. If you multiply many small numbers, the product vanishes; if you multiply many large numbers, it explodes. So what you do is truncate the importance weights, and also add a bias correction term, just to acknowledge that by truncating you are making an error. This is what the retrace algorithm allows us to do.
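The truncation idea can be sketched as follows. This is a simplified, illustrative version of a retrace-style target computation (variable names and numbers are mine): the ratios ρ are truncated at 1, so products along the trajectory can no longer explode, and the recursion mixes the correction with the critic's own value estimates, where the bias introduced by truncation is compensated.

```python
# Retrace-style targets over a stored trajectory (lists ordered oldest-first).
# q_values[t] and v_values[t] stand in for a critic's estimates; rhos[t] is
# the importance ratio π(a_t|b_t) / μ(a_t|b_t) between current and behaviour
# policies. All values here are toy numbers.

def retrace_targets(rewards, q_values, v_values, rhos, gamma=0.99):
    q_ret = v_values[-1]                 # bootstrap from the final state value
    targets = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        q_ret = rewards[t] + gamma * q_ret
        targets[t] = q_ret
        c = min(1.0, rhos[t])            # truncated importance weight
        q_ret = c * (q_ret - q_values[t]) + v_values[t]
    return targets

targets = retrace_targets(
    rewards=[-1.0, -1.0, 20.0],
    q_values=[0.0, 0.0, 0.0],
    v_values=[0.0, 0.0, 0.0, 0.0],
    rhos=[5.0, 0.5, 2.0],                # raw products would explode
)
assert targets[-1] == 20.0
```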
0:32:26 Remember, we want to use the actor-critic framework, so we want to estimate both the policy and the Q-function. Resorting to DQN would give us biased estimates for the Q-function, so instead, for the loss for Q, we use targets given by the retrace algorithm. I won't go through the proof of why this provides low-variance, safe estimates, but let me just give you the intuition for why it doesn't vanish or explode: we are only ever multiplying importance sampling ratios after truncation, so the products stay bounded, and the error that the truncation introduces is compensated by the bias correction term.
0:33:41 The final element is trust region policy optimization. The problem is that if you optimize the policy directly in a reinforcement learning framework, small changes in parameter space can result in very large and unexpected changes in the policy. The solution is to use the natural gradient, which gives you the direction of steepest descent in policy space, but it is expensive to compute. The natural gradient can be approximated via the KL divergence between the policies of subsequent parameter updates, and trust region policy optimization approximates that KL divergence with a first-order Taylor expansion, so that the difference between subsequent policies stays small and you don't have sudden jumps in how the policy behaves. This is particularly important if you are training in interaction with real users: a system that is deployed really cannot afford to suddenly act unexpectedly.
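The trust-region idea can be shown in miniature: measure how far an update moved the policy with the KL divergence between the old and new action distributions, and reject or scale down updates that exceed a threshold. TRPO does this with a Taylor approximation of the KL; for a small categorical policy we can just compute it exactly (toy distributions and threshold):

```python
import math

# KL divergence between two categorical action distributions, used as a
# trust-region check on a policy update.

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

old_policy = [0.5, 0.3, 0.2]
small_step = [0.48, 0.31, 0.21]   # a cautious parameter update
large_step = [0.05, 0.05, 0.90]   # a drastic change in behaviour

MAX_KL = 0.01
assert kl(old_policy, small_step) < MAX_KL   # inside the trust region
assert kl(old_policy, large_step) > MAX_KL   # would be rejected / scaled down
```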
0:35:01 So now, if we want to apply this to a dialogue system that learns directly in the master action space, we have to have an adequate architecture for the neural network. Remember, in the actor-critic setting we are estimating the policy and the Q-function at the same time, and in order to make the most of it, we share a feature extractor on top of our belief state. And because we want to learn in the master action space, where we have to choose between very many actions, both the policy part and the Q-function part have one head choosing the summary action, the type of dialogue act, and another head choosing which slots should complement that dialogue act. Then we have a gradient for the policy, which is given by trust region policy optimization, and a gradient for the Q-function, which is given by ACER. So how does this work? We applied it in the Cambridge restaurant domain. We have a reasonably large belief state and a very large number of master actions, and the training is light enough that it simply runs on my laptop.
0:35:34 Here are the results. The x-axis is showing training dialogues, and the y-axis is showing success rate: whether the dialogue was successfully completed or not. One model is learning in the master action space, and the other is learning with summary actions. You can see that the policy learning in the summary space is faster at the beginning, because it chooses between a much smaller number of actions. But the master action space has two orders of magnitude more actions, and it learns only about twice as slowly, so this is good news. We also evaluated these policies in interaction with real users on Amazon Mechanical Turk, and we see that the performances in terms of success rate are almost the same, but the master-action policy needs no hand-crafted heuristics. If you want to read more, we have a journal version of this work, and it has just been accepted at the IEEE Transactions on Audio, Speech and Language Processing.
0:37:59 Okay, so one more thing, which you probably heard about already: my student gave a talk here about this work, in which we addressed the problem of hand-crafted user models. When we optimize dialogue management, we need a simulated user, and we often find that hand-coded simulated users are not very realistic compared to the real users we see in interaction. The solution here is to train the user model in an end-to-end fashion, and the hoped-for outcome is a potentially more natural conversation with the simulated user.
0:38:55 So what does our neural user simulator look like? The simulated user consists of three parts. The first part is the goal generator; you can think of this as a random generator that generates the goals for the dialogue, the goals that a real user could have. The second part is the feature extractor: it extracts the features from the dialogue state that relate to the user goal. And then you have a sequence-to-sequence model, because we need to decode these features and the history into the user utterance. Here is how it works. We extract some features; for instance, if the system said "I'm sorry, there is no such restaurant", the features capture what was offered and what was not available, which matters because the user goal can then potentially change. We have an RNN that summarizes the history of these features in a hidden layer, and then the sequence-to-sequence decoder: we start with the start-of-sentence symbol and end with the last word that the simulated user is saying. We trained this simulator on the DSTC2 dataset, because there real users were talking to real systems, so we can model how real users actually behave.
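The first component, the goal generator, is easy to sketch (a toy version with a hypothetical ontology and slot names of my own; the real generator is tied to the domain's ontology):

```python
import random

# Toy goal generator: sample a user goal as constraints over a random
# subset of ontology slots. Ontology contents are illustrative.

ONTOLOGY = {
    "food": ["thai", "italian", "chinese"],
    "price_range": ["cheap", "moderate", "expensive"],
    "area": ["north", "south", "centre"],
}

def generate_goal(rng=random):
    """Pick a random subset of slots and a random value for each."""
    slots = rng.sample(sorted(ONTOLOGY), k=rng.randint(1, len(ONTOLOGY)))
    return {slot: rng.choice(ONTOLOGY[slot]) for slot in slots}

random.seed(0)
goal = generate_goal()
assert all(goal[s] in ONTOLOGY[s] for s in goal)
```

The feature extractor and sequence-to-sequence decoder then condition on such a goal, so the simulated user pursues it consistently across turns, and can revise it when the system reports a constraint cannot be met.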
0:40:47 We evaluated our simulated user in a slightly unorthodox way. We were not so much interested in how well the user simulator generates sentences, but in how well it can help us in training a dialogue system. So for each user simulator we built an environment in which we trained policies: one set of policies was trained with the neural user simulator, which is completely statistical, and another with the agenda-based user simulator, which is based on rules. Then, for each user simulator, we took the best performing policy as judged on the other user simulator; that will become clearer in a moment. And then we let these policies interact with real users on Amazon Mechanical Turk. So: each user simulator is used for policy training, one the neural user simulator, the other the agenda-based one. We then check how well each policy performs on the other simulator: we take the policy trained on the neural user simulator that performs best on the agenda-based one, and similarly for the agenda-based. Now, what the results show is that if you train your policy on the agenda-based user simulator and select it for performing well on the agenda-based user simulator, it is not going to perform particularly well with real users. So a purely rule-based approach to building a user simulator is not particularly good. If you are evaluating with real users, the best choice is to train on the neural user simulator and select the policy that performs best on the agenda-based one. This suggests that end-to-end learning is promising for modeling users, but we see that combining it with the agenda-based simulator is still needed if we want the best performance.
0:43:44okay so in the last five minutes i would like to talk about something that is probably close to all of us in our community
0:43:56how do we effectively evaluate dialogue models how do we compare them and how can we reproduce good results
0:44:08similarly to what was the case in speech recognition only a handful of groups around the world had access to data at the time
0:44:17and this is something that we wanted to change in cambridge
0:44:23because we want this to be open and we also want to allow people to easily compare to each other
0:44:30so we made pydial our toolkit for building statistical dialogue systems open source
0:44:40it comes with simulated environments and algorithms that you can compare against so if you want to test a new policy you can easily compare it to the state of the art
0:44:59we also collected the large corpus that i have just described and we are making it open access
0:45:09and this work was funded by my faculty
0:45:15so just a few words about pydial
0:45:19it contains implementations of statistical approaches to dialogue systems
0:45:26and it is modular so you can very easily exchange your own module for the functionality currently available in the toolkit
0:45:36it can very easily be extended to other much larger domains and if you have a particular domain you can use it to build your dialogue system
0:45:49it offers multi-domain conversational functionality
0:45:55and if you are interested you can also subscribe to our list
0:46:02and this was really hard work from not just the current members but also the previous members of the dialogue systems group
0:46:12and it is constantly expanding
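To make the modularity concrete, here is a hedged sketch of the idea. The class names and interfaces below are invented for this illustration and are not PyDial's actual API: each component implements a small interface, so a different belief tracker or policy can be swapped in without touching the rest of the pipeline.

```python
# Generic modular dialogue pipeline sketch (names are illustrative only).

class SemanticDecoder:
    """Maps an utterance to a (toy) semantic representation."""
    def decode(self, utterance):
        return {"food": "thai"} if "thai" in utterance else {}

class BeliefTracker:
    """Accumulates the dialogue belief state over turns."""
    def __init__(self):
        self.belief = {}
    def update(self, semantics):
        self.belief.update(semantics)
        return self.belief

class Policy:
    """Chooses a system act given the current belief state."""
    def act(self, belief):
        return "confirm(food=thai)" if belief.get("food") == "thai" else "request(food)"

class DialogueSystem:
    """Wires the modules together; any of them can be replaced independently."""
    def __init__(self, decoder, tracker, policy):
        self.decoder, self.tracker, self.policy = decoder, tracker, policy
    def turn(self, utterance):
        belief = self.tracker.update(self.decoder.decode(utterance))
        return self.policy.act(belief)

system = DialogueSystem(SemanticDecoder(), BeliefTracker(), Policy())
print(system.turn("i want thai food"))  # confirm(food=thai)
```

Because each module only sees the interface of its neighbours, a researcher testing a new policy can keep the decoder and tracker fixed, which is what makes fair comparison possible.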
0:46:17so in terms of benchmarking we want to have a way of comparing algorithms in a fair way
0:46:25so for pydial we defined different domains different user settings and also different noise levels in the user input to simulate speech recognition errors
0:46:39and a number of state-of-the-art policy optimisation algorithms including the ones i briefly discussed
0:46:48this initiative was led by inigo casanueva and pawel budzianowski and was presented at the nips symposium last year
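A benchmark grid like the one described can be defined as the cross-product of domains, user settings and noise levels. The sketch below is illustrative only; the domain names resemble PyDial's benchmark domains but the numbers and config format are invented here.

```python
# Illustrative benchmark grid: every combination of domain, simulated-user
# setting and input noise level forms one evaluation environment.
from itertools import product

domains = ["CamRestaurants", "SFRestaurants", "Laptops"]
user_settings = ["standard", "unfriendly"]
noise_levels = [0.0, 0.15, 0.30]  # simulated semantic error rate in user input

environments = [
    {"domain": d, "user": u, "noise": n}
    for d, u, n in product(domains, user_settings, noise_levels)
]

# 3 domains x 2 user settings x 3 noise levels = 18 evaluation environments
print(len(environments))
```

Evaluating every algorithm on the same fixed grid is what makes the comparison fair: no method gets to pick the environments that favour it.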
0:47:00so that is basically the end of my talk
0:47:07the message is that machine learning allows us to solve many of the problems that we are facing in building natural conversational systems
0:47:19in particular it allows us to share concepts in belief tracking so that we can have operational multi-domain systems
0:47:33at the same time it allows us to build policy optimisation modules that can choose between a wide variety of actions
0:47:45and it also allows us to build more realistic models of users so that we can train more accurate policies
0:47:54but there is a lot to be done to actually achieve the goal of natural conversation and this is just the tip of the iceberg
0:48:05so some of the open areas are how we model structured data and how we interface with a knowledge base
0:48:16if we want systems for very long conversations we need more accurate and more sophisticated reinforcement learning models
0:48:29and finally we need to take sentiment into account and have more nuanced reward functions when we are building dialogue systems
0:48:43so i think that will bring us closer to the long term vision which is to have natural goal directed conversation
0:48:51thank you very much
0:49:29so did we compare with it there is a statistical version of the agenda-based user simulator
0:49:38but it still relies on hand-coding the structure of the conversation
0:49:48so some parts of it are hand-coded and then it has parts which are trained
0:49:57so for the overall problem of natural conversation it would not be applicable because we still have that structure which is fixed so we have not compared to that
0:50:05but actually this neural simulator was trained on a very small amount of data
0:50:15i do not know if i have the exact numbers but dstc2 is only i think one thousand dialogues
0:50:21so that is not a lot
0:50:24now because of that the number of parameters was kept really small so for instance if i go back
0:50:34we do not actually take as input the words the user said to the system
0:50:39here we take as input the semantic form of the user's sentences
0:50:44so this feature extractor is in fact very easy to build
0:50:51otherwise you would need a cnn or something more sophisticated here and that would expand the number of parameters
0:51:01also how large you make these vectors implies how many parameters you have in the model
0:51:11so in this model everything is kept very small just to account for the fact that you have a very small amount of data
0:51:27so it depends on whether you want to start from scratch or whether you want to reuse some of the models
0:51:53if you want to start from scratch then basically everything is domain independent in that sense
0:52:00in particular belief tracking
0:52:11so belief tracking takes as input the user utterance and the ontology
0:52:16so the ontology is just an additional input to the belief tracker
0:52:23and you embed the word vectors that you have in your ontology to begin with
0:52:29so traditionally you would take a word and check whether it appears in the user's sentence
0:52:35here we take the word vector of that word
0:52:39and compare the similarity of that word vector with our feature extraction
0:52:45and we have three generic feature extractors for domain slot and value
0:52:50so that is why it is easy to transfer it as it is to a different domain
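The core idea can be sketched very simply: instead of exact string matching against the ontology, compare word vectors by cosine similarity, so the tracker transfers to a new domain by swapping in that domain's ontology embeddings. The tiny three-dimensional vectors below are made up purely for illustration; real systems use pretrained embeddings with hundreds of dimensions.

```python
# Word-vector matching of user words against ontology slot-value pairs.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy ontology embeddings for one slot and its values.
ontology = {
    ("food", "thai"):    [0.9, 0.1, 0.0],
    ("food", "italian"): [0.1, 0.9, 0.0],
}

def score_candidates(user_word_vec, ontology, threshold=0.7):
    """Return slot-value pairs whose embedding is close to a word the user said."""
    scores = {sv: cosine(user_word_vec, vec) for sv, vec in ontology.items()}
    return {sv: s for sv, s in scores.items() if s > threshold}

# Suppose the user used a word whose embedding lies near "thai".
user_vec = [0.85, 0.15, 0.05]
print(score_candidates(user_vec, ontology))
```

Because only the ontology embeddings change between domains, the similarity machinery itself, which is where the learned parameters live, stays fixed.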
0:53:09right
0:53:10so for language generation it is a more difficult problem in that sense
0:53:16you would need to redefine the system acts in order for it to work
0:53:31and then it depends on what your knowledge base looks like whether you can embed the words that are already embedded there or a particular constraint
0:53:41i have not really looked into that
0:54:27so that work has the very appealing advantage of not requiring labelled data at the intermediate steps
0:54:35and that is a huge advantage because if you are generating millions of calls every week you do not have the capacity to annotate them
0:54:45so it is certainly worth investigating precisely because of that
0:54:50but the downside is that it does not actually work that well
0:54:55and the reason for it is that we are still not able to figure out how to train these networks so that they do not require additional supervision
0:55:06so a lot of the work along that line talks about having an end to end differentiable neural network that you can propagate gradients through but you still need at some stage the supervision to allow you to actually produce a meaningful output
0:55:29and another problem is the evaluation of such systems
0:55:33research in this area has exploded and many people who are not originally from dialogue are doing research in it and they treat it as a translation problem from system input to user output
0:55:47and this is really not the case
0:55:51and then they evaluate with the bleu score which does not say anything about the quality of the dialogue and does not take into account the fact that you can have a long term conversation
0:56:08three
0:56:10yes
0:56:26right
0:56:40so regarding the point you raise
0:56:49people in speech recognition have looked at this problem of having to iterate over a huge number of outputs for instance in language modelling
0:57:00and there are techniques like noise contrastive estimation where you do not have to compute a softmax but rather have an unnormalised output
0:57:10so that is one thing but for this to work we need to have some similarity metric some confusability between the different elements of the ontology
0:57:23so i do not know whether i have a quick answer to how to actually do that
0:57:28each value we have in the ontology then corresponds to an embedding
0:57:43because all you actually want to have is a good word space representation
0:57:51so that you can almost sample from it and then decode what a particular word would be but that is very difficult to do
0:58:01so i think it is an interesting problem
0:58:20it is sometimes really difficult and we have not actually addressed this in this work
0:58:28to say what makes a dialogue good and bad
0:59:36and here it is used in the sense that you know that an okay action corresponds to making an offer and a slot summary action is something to do with slots
0:59:50and you know how many slots you will talk about
0:59:55and you let the system learn to do that
0:59:59so especially if you do not have enough training data you can always incorporate prior knowledge into the system
1:00:04but what we were interested in here was mostly to see whether we can avoid that
1:00:13because if you look at the reinforcement learning literature people really use reinforcement learning for problems which can be easily simulated and it is often a discrete space
1:00:29it is the setting of a joystick where the action space is a choice between a very small number of actions
1:00:36and if you want to apply these techniques to dialogue seriously you will inevitably have to learn in larger state action spaces
1:00:44so this is really what we were interested in here but obviously you can always incorporate the prior knowledge which you have just described
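The summary-action idea mentioned in this answer can be sketched as follows. This is a minimal illustration with invented names, not the system's actual action set: the policy chooses among a small set of abstract actions such as requesting or confirming a slot, and each is then mapped to a concrete system act using the current belief state.

```python
# Mapping abstract summary actions to concrete master actions via the belief.

def top_value(belief, slot):
    """Most likely value for a slot under the current belief state."""
    return max(belief[slot], key=belief[slot].get)

def summary_to_master(summary_action, belief):
    """Turn an abstract (kind, slot) summary action into a concrete act."""
    kind, slot = summary_action
    if kind == "request":
        return f"request({slot})"
    if kind == "confirm":
        return f"confirm({slot}={top_value(belief, slot)})"
    raise ValueError(f"unknown summary action: {kind}")

belief = {
    "food": {"thai": 0.7, "italian": 0.2, "none": 0.1},
    "area": {"north": 0.4, "south": 0.35, "none": 0.25},
}

print(summary_to_master(("confirm", "food"), belief))  # confirm(food=thai)
print(summary_to_master(("request", "area"), belief))  # request(area)
```

The payoff is exactly the trade-off discussed above: the policy only has to choose among a handful of summary actions instead of every possible slot-value combination, at the cost of hand-coding the mapping.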