0:00:17 | so the next speaker |
---|
0:00:36 | will start the presentation |
---|
0:00:40 | okay everyone, my name is Sanchit, and i'm going to present our dialog |
---|
0:00:44 | state tracking as a neural reading comprehension approach |
---|
0:00:46 | this is joint work with Shuyang Gao, Abhishek Sethi, Tagyoung Chung, and Dilek Hakkani-Tür from |
---|
0:00:49 | the Amazon Alexa AI team in Sunnyvale, California |
---|
0:00:52 | so i'll first briefly introduce the problem; dialog state tracking is, i guess, something |
---|
0:00:55 | you already know about |
---|
0:00:57 | but i'll cover it for completeness, and then i'll |
---|
0:00:59 | talk about the motivation of our approach, go into the details of the architecture, show some |
---|
0:01:03 | results and ablation studies, and finally conclude with some error analysis |
---|
0:01:07 | so let's start. so this is the dialog state: the dialog state is basically |
---|
0:01:12 | a compact representation of the dialogue history; it basically |
---|
0:01:17 | represents what the user is interested in at any point in the conversation, and |
---|
0:01:20 | typically you represent the dialog state with |
---|
0:01:23 | slots and values |
---|
0:01:24 | so here in the first turn the user says that he needs to |
---|
0:01:27 | book a hotel in the east that has four stars, and |
---|
0:01:30 | this corresponds to a state where you have two slots, area and stars, together with the respective |
---|
0:01:34 | values |
---|
0:01:35 | the "hotel" prefix represents the domain that the user is talking about |
---|
0:01:37 | and it will become more evident later why that's important, because |
---|
0:01:41 | a conversation can have multiple domains |
---|
0:01:43 | so in the second turn the agent |
---|
0:01:48 | responds asking about the price range, and the user says that it does not matter as long as it |
---|
0:01:52 | has free wifi and parking. so the state gets updated with three new |
---|
0:01:57 | slots, parking and internet with the value yes and the price with don't care, and the |
---|
0:02:01 | other slots, area and stars, get carried over |
---|
0:02:04 | in the next turn the agent gives some recommendation, and the user says that sounds good, i |
---|
0:02:08 | would also like a taxi to the hotel from cambridge. so here we see |
---|
0:02:11 | that the slots corresponding to the hotel domain get carried over |
---|
0:02:14 | but there are |
---|
0:02:15 | two new slots, departure and destination |
---|
0:02:17 | corresponding to |
---|
0:02:18 | a new domain, taxi |
---|
0:02:21 | which also gets added to the dialog state |
---|
0:02:24 | now, what is the task of dialog state tracking? dialog state tracking basically |
---|
0:02:28 | means you want to predict |
---|
0:02:29 | the dialog state of the user at every turn: you are given the dialogue |
---|
0:02:33 | history plus the current user utterance, and you want to predict a distribution over the |
---|
0:02:37 | dialog states. we saw that the dialog state is typically represented as slots |
---|
0:02:42 | and values, so this means state trackers |
---|
0:02:44 | output a distribution over the slots and all the associated values |
---|
0:02:47 | and the dialogue context typically consists of features like past user utterances, past system |
---|
0:02:52 | responses |
---|
0:02:52 | it can have the previous belief state, or even the NLU interpretation if that is available |
---|
0:02:56 | so this is the task |
---|
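as a minimal formalization of the task just described (the notation here is my own, not from the slides): at turn t, given the agent and user utterances so far, the tracker predicts a value for every slot in the schema.

```latex
\[
D_t = (a_1, u_1, \ldots, a_t, u_t), \qquad
\hat{v}_{s,t} = \arg\max_{v \in \mathcal{V}_s} P\left(v \mid D_t, s\right)
\quad \text{for each slot } s \in \mathcal{S}
\]
```

here \(a_i\) and \(u_i\) are the agent and user utterances at turn \(i\), and \(\mathcal{V}_s\) is the set of values slot \(s\) can take.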
0:02:58 | next i want to talk briefly about the traditional approaches to state |
---|
0:03:01 | tracking |
---|
0:03:02 | so one of the common approaches is one where you encode the dialogue history |
---|
0:03:07 | with some model architecture, and then you have |
---|
0:03:10 | a linear plus softmax layer on top, and you output a distribution |
---|
0:03:13 | over the vocabulary |
---|
0:03:14 | of the slot type, and you do this for each slot in your schema or |
---|
0:03:17 | your dialog state |
---|
0:03:19 | for example, here you see an approach to joint state tracking where they encode the dialogue |
---|
0:03:23 | history using a hierarchical LSTM, and then on top of that, on the hidden representation |
---|
0:03:28 | of the context, they have a feed-forward layer, one for each slot type |
---|
0:03:32 | and then a softmax layer to output the distribution over the values that that |
---|
0:03:36 | particular slot can take, and these are the values which you have seen in the |
---|
0:03:39 | training set |
---|
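as a rough sketch of what such a classification-based tracker looks like in code (a hypothetical PyTorch illustration; the class name, sizes, and slot names are my assumptions, not from the paper):

```python
import torch
import torch.nn as nn

class FixedVocabStateTracker(nn.Module):
    """Classification-style DST: one linear + softmax head per slot,
    scoring only the slot values seen in training (the ontology)."""

    def __init__(self, context_dim, slot_vocab_sizes):
        super().__init__()
        # one output head per slot; head size = number of known values
        self.heads = nn.ModuleDict({
            slot: nn.Linear(context_dim, n_values)
            for slot, n_values in slot_vocab_sizes.items()
        })

    def forward(self, context_vec):
        # context_vec: (batch, context_dim) encoding of the dialogue
        # history, e.g. from a hierarchical LSTM as described above
        return {slot: torch.softmax(head(context_vec), dim=-1)
                for slot, head in self.heads.items()}

# a value absent from training can never be predicted; this is the
# out-of-vocabulary problem discussed next
tracker = FixedVocabStateTracker(
    context_dim=256,
    slot_vocab_sizes={"hotel-stars": 6, "hotel-name": 150},
)
probs = tracker(torch.randn(1, 256))
```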
0:03:40 | this brings us to the two |
---|
0:03:42 | main problems with such approaches |
---|
0:03:44 | one is that they cannot handle out-of-vocabulary slot value mentions, because they only output a |
---|
0:03:49 | distribution over values that have been seen in the training set |
---|
0:03:52 | so in such approaches it is assumed that the vocabulary, or the ontology, |
---|
0:03:56 | is known in advance |
---|
0:03:57 | and the second thing is that they do not scale well for slots that have |
---|
0:04:00 | a large vocabulary |
---|
0:04:00 | for example, for a slot like hotel name, you can imagine that |
---|
0:04:04 | the slot can take values from a possibly very large set, so there's not enough |
---|
0:04:08 | data to learn a good distribution over this large vocabulary |
---|
0:04:11 | on the other hand, the reading comprehension approaches typically do not rely on a |
---|
0:04:15 | fixed vocabulary |
---|
0:04:16 | this is because reading comprehension approaches are typically structured as |
---|
0:04:20 | extractive question answering, where the goal is to find a span of tokens |
---|
0:04:23 | in the |
---|
0:04:24 | passage which constitutes the answer, so there is no fixed vocabulary |
---|
0:04:28 | and the second thing is |
---|
0:04:29 | also that there have been a lot of recent advancements in reading comprehension that |
---|
0:04:32 | we can leverage |
---|
0:04:33 | if we structure our problem of state tracking as reading comprehension. this led us to |
---|
0:04:37 | propose this reading comprehension approach for dialog state tracking |
---|
0:04:43 | in the next slide, |
---|
0:04:44 | before i go to exactly how we formulate the problem, i also want to |
---|
0:04:47 | just give a one-minute overview of how |
---|
0:04:50 | machine reading comprehension problems are typically posed |
---|
0:04:52 | so the general idea in reading comprehension is: you are given a question and a passage, |
---|
0:04:55 | and you are looking for a span of tokens in the passage that can be |
---|
0:04:58 | the answer; |
---|
0:04:59 | this is also called extractive question answering |
---|
0:05:01 | and how people do it is: you encode the passage, so you get a representation of each |
---|
0:05:06 | token in the passage; you encode the question, so you have a question representation; and |
---|
0:05:09 | on top you generally have two attention heads attending from the question |
---|
0:05:13 | to each token in the passage. one of the attention heads represents the |
---|
0:05:16 | start probability distribution |
---|
0:05:17 | and the other represents the end probability distribution. once you have these two probability distributions, you |
---|
0:05:21 | just output |
---|
0:05:22 | the most probable span |
---|
0:05:25 | and that is your answer |
---|
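in symbols, a common way to score spans in such models is the following (a generic sketch in my own notation, not necessarily the exact equations on the slide):

```latex
\[
p^{\text{start}}_i = \operatorname{softmax}_i\left(q^\top W_s h_i\right), \qquad
p^{\text{end}}_j = \operatorname{softmax}_j\left(q^\top W_e h_j\right), \qquad
(\hat{i}, \hat{j}) = \arg\max_{i \le j}\; p^{\text{start}}_i \, p^{\text{end}}_j
\]
```

here \(h_i\) are the passage token encodings, \(q\) is the question encoding, and \(W_s, W_e\) are learned matrices.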
0:05:26 | the slide shows a popular architecture, QANet, which |
---|
0:05:31 | uses a bunch of self-attention and convolution layers to encode the passage |
---|
0:05:35 | tokens |
---|
0:05:36 | but the general idea is the same: you encode the passage and the question, and then you |
---|
0:05:38 | have attention heads representing the start and end of the span |
---|
0:05:41 | so now let's look at how we formulate the dialog state tracking problem |
---|
0:05:45 | as reading comprehension |
---|
0:05:46 | this is the same dialogue as before |
---|
0:05:48 | the user is looking for a hotel |
---|
0:05:49 | and after the second turn you want to predict the values for each of these |
---|
0:05:52 | slots, like hotel area and hotel price range |
---|
0:05:55 | and so on |
---|
0:05:56 | and this gets transformed into something like this: |
---|
0:05:59 | your dialogue context, the whole dialogue context of agent and user turns, becomes the passage |
---|
0:06:04 | and then the questions are something like "what is the requested hotel area", or whatever |
---|
0:06:08 | the requested value of the slot that you want to track is, or |
---|
0:06:11 | something like "is parking required in the hotel", and so on. and then what you want |
---|
0:06:14 | to find is the answer to these questions, so |
---|
0:06:17 | for the first question you can look for the answer in the passage, and |
---|
0:06:20 | the model should point to something like "east", and similarly for the second question, |
---|
0:06:24 | when you are looking for the hotel rating, the model should point to the span of tokens |
---|
0:06:27 | "four stars" |
---|
0:06:28 | so, as simple as that |
---|
0:06:31 | now some representations, how we represent the different components: the dialogue history, |
---|
0:06:36 | which is also the passage in our formulation, is represented as concatenated user |
---|
0:06:41 | and agent utterances |
---|
0:06:42 | it can be either a one-dimensional representation, or you can have a 2-D matrix, like |
---|
0:06:47 | a hierarchical representation, and then you can use a hierarchical encoder to |
---|
0:06:50 | encode them |
---|
0:06:52 | and the slot, which is the question in our formulation, is a domain-plus-slot embedding; |
---|
0:06:57 | we want the domain as well because, as we saw in the previous |
---|
0:06:59 | example, |
---|
0:07:01 | the dialogue is not bound to one domain and can span multiple domains |
---|
0:07:04 | and we have a fixed-dimensional vector for this domain-slot combination, which is learned |
---|
0:07:08 | along with the full model |
---|
0:07:10 | one thing to note here is that, unlike related work, |
---|
0:07:13 | we don't actually convert the slot into a full natural language question; we just treat |
---|
0:07:17 | the embedding of the slot plus domain |
---|
0:07:19 | as the question itself |
---|
0:07:21 | and finally, the answer is just a |
---|
0:07:23 | start and end position in the conversation |
---|
0:07:26 | okay, so this is the main model in our approach; we call it the slot span |
---|
0:07:30 | model |
---|
0:07:31 | which is just like a typical extractive QA model. what it does is it predicts |
---|
0:07:34 | the slot value as a span of tokens in the dialogue: you have start pointers |
---|
0:07:38 | and end pointers, computed through bilinear attention between the dialogue context and |
---|
0:07:42 | the slot embedding |
---|
0:07:43 | just like reading comprehension models. the example shown here is |
---|
0:07:46 | the same dialogue as before: the user wants to book a hotel in |
---|
0:07:49 | the east with four stars. so after the first turn, if you want to track the slot |
---|
0:07:53 | hotel area, in this case we'll assume that our model outputs |
---|
0:07:58 | start and end probabilities which are high for the "east" token in the context, which |
---|
0:08:02 | basically gives the answer, east |
---|
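a minimal sketch of such a bilinear-attention span head (a hypothetical re-implementation for illustration, not the authors' code; dimensions and names are assumptions):

```python
import torch
import torch.nn as nn

class SlotSpanHead(nn.Module):
    """Scores each dialogue token as span start/end via bilinear
    attention between token encodings and the slot embedding."""

    def __init__(self, token_dim, slot_dim):
        super().__init__()
        self.w_start = nn.Linear(slot_dim, token_dim, bias=False)
        self.w_end = nn.Linear(slot_dim, token_dim, bias=False)

    def forward(self, tokens, slot_emb):
        # tokens: (batch, seq_len, token_dim); slot_emb: (batch, slot_dim)
        start_logits = torch.einsum("bd,btd->bt", self.w_start(slot_emb), tokens)
        end_logits = torch.einsum("bd,btd->bt", self.w_end(slot_emb), tokens)
        return start_logits, end_logits

def best_span(start_logits, end_logits):
    """Pick the most probable valid (start <= end) span per example."""
    seq_len = start_logits.size(1)
    scores = start_logits[:, :, None] + end_logits[:, None, :]
    valid = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~valid, float("-inf"))
    flat = scores.flatten(1).argmax(dim=-1)
    return flat // seq_len, flat % seq_len
```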
0:08:04 | okay, so this span model alone is not sufficient |
---|
0:08:08 | and this is true also for other question answering cases, because |
---|
0:08:11 | there are certain slots that can take values from a closed set, like parking and |
---|
0:08:14 | internet, which are yes/no, so we need to account for that. there is also the situation of slots |
---|
0:08:18 | that can have the value "don't care", for example the price range in the previous example |
---|
0:08:22 | and |
---|
0:08:23 | many of the slots in the schema are never mentioned, so you need |
---|
0:08:26 | to fill them with the default "none" value. so these are the cases that |
---|
0:08:28 | cannot be correctly handled by the span model |
---|
0:08:31 | so to handle this, we augment our QA model with two other auxiliary models, a |
---|
0:08:35 | slot carryover model and a slot type model |
---|
0:08:37 | the carryover model predicts whether we should |
---|
0:08:40 | update a slot value in the current dialog turn or carry over the old slot value |
---|
0:08:44 | from the previous turn, and in the beginning each slot is initialized with the |
---|
0:08:47 | default "none" value |
---|
0:08:48 | and the type model is just a simple classifier which makes a |
---|
0:08:51 | decision about one of four classes: yes, no, don't care, or span type |
---|
0:08:57 | so i'm going to go deeper into these two models. the carryover model, as i |
---|
0:09:00 | said, just predicts whether we should update the slot value at the current turn |
---|
0:09:02 | or carry it over, and it makes the binary decision for all the slots |
---|
0:09:06 | jointly at each turn |
---|
0:09:08 | an example here would be: after the first turn you have these |
---|
0:09:11 | values. so one thing i want to clarify is that the name "carryover |
---|
0:09:14 | model" is a bit confusing, because |
---|
0:09:18 | what it exactly is, is a slot update model. what i mean by that is |
---|
0:09:21 | that |
---|
0:09:21 | a one represents that |
---|
0:09:23 | you want to update the slot, and a zero represents that you want to carry |
---|
0:09:25 | over; i just keep this convention because we have it that way in the paper |
---|
0:09:28 | so here, when you go from the first turn to the second |
---|
0:09:32 | turn, the user has mentioned three new slots, wifi, |
---|
0:09:36 | that is internet, parking, and the price range, so those slots will get updated, so |
---|
0:09:39 | their values are one, while the other two slots, area and stars, |
---|
0:09:43 | will be zero, because they will just get carried over from the |
---|
0:09:46 | previous turn |
---|
0:09:48 | and the type model is simple: it just predicts the slot type, given the |
---|
0:09:53 | question, which is the slot, and the dialogue context, and it makes a four-way |
---|
0:09:57 | decision between yes, no, don't care, and span. a simple example would be: the slot hotel |
---|
0:10:01 | area in this context would be a span type, because you want to |
---|
0:10:04 | find the value using the context, and for the slot hotel parking the value |
---|
0:10:08 | would be just |
---|
0:10:08 | yes, so it would be the "yes" type that the model should output |
---|
0:10:12 | okay, so putting all this together, the combined model looks like this: at the |
---|
0:10:16 | bottom-most we have a word embedding, which embeds the tokens in the passage |
---|
0:10:20 | next we have a contextual embedding encoding, which is basically a bidirectional LSTM |
---|
0:10:25 | (we just use a single bidirectional LSTM layer), so this will give us the contextual |
---|
0:10:28 | representation for each of the tokens. we use the last hidden layer of the LSTM, which |
---|
0:10:31 | gives us the embedding of the dialogue |
---|
0:10:33 | we embed the question using just the slot-plus-domain embedding, randomly initialized, |
---|
0:10:38 | and we just learn it with the model |
---|
0:10:40 | then this |
---|
0:10:42 | dialogue embedding vector that we get is used to predict the slot carryover |
---|
0:10:47 | decision, so we have a linear plus sigmoid |
---|
0:10:49 | layer on top of that; it just makes the binary decision for each of the |
---|
0:10:51 | slots |
---|
0:10:53 | for the slot type model, we input the dialogue embedding vector along with the |
---|
0:10:56 | question vector, and then it makes, through a softmax, the prediction of |
---|
0:10:59 | one of the four classes |
---|
0:11:00 | and the span model, finally, takes as input the question vector, and computes |
---|
0:11:04 | attention from the question to each of the tokens in the passage, just |
---|
0:11:08 | like any QA model, and you have the start span prediction and the end |
---|
0:11:11 | prediction |
---|
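putting these pieces into code, the following is a compact sketch of the combined architecture as described (a hypothetical reconstruction: the layer names, sizes, and exact head wiring are my assumptions):

```python
import torch
import torch.nn as nn

class DSTReader(nn.Module):
    """Combined model sketch: BiLSTM dialogue encoder plus three heads
    (slot carryover, slot type, span), as described in the talk."""

    def __init__(self, vocab_size, num_slots, emb_dim=300, hid_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        # learned domain+slot embedding, used as the "question"
        self.slot_emb = nn.Embedding(num_slots, 2 * hid_dim)
        # carryover head: one binary decision per slot, from the dialogue vector
        self.carryover = nn.Linear(2 * hid_dim, num_slots)
        # type head: 4-way {yes, no, dontcare, span}, from dialogue + slot vectors
        self.type_head = nn.Linear(4 * hid_dim, 4)
        # span head: bilinear attention from the slot embedding to each token
        self.w_start = nn.Linear(2 * hid_dim, 2 * hid_dim, bias=False)
        self.w_end = nn.Linear(2 * hid_dim, 2 * hid_dim, bias=False)

    def forward(self, token_ids, slot_ids):
        # token_ids: (batch, seq_len); slot_ids: (batch,)
        tokens, (h_n, _) = self.encoder(self.word_emb(token_ids))
        dialogue_vec = torch.cat([h_n[0], h_n[1]], dim=-1)  # last fwd+bwd states
        q = self.slot_emb(slot_ids)
        carryover_logits = self.carryover(dialogue_vec)     # (batch, num_slots)
        type_logits = self.type_head(torch.cat([dialogue_vec, q], dim=-1))
        start_logits = torch.einsum("bd,btd->bt", self.w_start(q), tokens)
        end_logits = torch.einsum("bd,btd->bt", self.w_end(q), tokens)
        return carryover_logits, type_logits, start_logits, end_logits
```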
0:11:12 | so at inference time, what happens is: for each slot we begin with the |
---|
0:11:17 | slot carryover model. if the carryover model says |
---|
0:11:20 | zero, then we |
---|
0:11:23 | just carry over the slot value from the previous turn; if it says one, which |
---|
0:11:26 | means you want to update the slot, |
---|
0:11:27 | then we invoke the type model |
---|
0:11:30 | if the type model says yes, no, or don't care, we update the slot value with that; |
---|
0:11:32 | if it's a span, then we invoke the span model to get |
---|
0:11:36 | the start and end positions of the slot value, and then we just extract that |
---|
0:11:40 | from the conversation and update the slot value |
---|
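and the inference cascade just described, as a short sketch (hedged: the 0.5 threshold, class ordering, and state representation are illustrative, reusing the hypothetical DSTReader above):

```python
import torch

TYPE_CLASSES = ["yes", "no", "dontcare", "span"]

def track_turn(model, prev_state, token_ids, token_text, all_slot_ids):
    """One tracking turn. prev_state: dict slot_id -> value ('none' at start).
    token_ids: (1, seq_len) encoded dialogue so far; token_text: the
    corresponding token strings. Runs the model once per slot (unoptimized)."""
    state = dict(prev_state)
    for slot in all_slot_ids:
        co, ty, st, en = model(token_ids, torch.tensor([slot]))
        if torch.sigmoid(co[0, slot]) < 0.5:
            continue  # zero: carry the previous value over unchanged
        slot_type = TYPE_CLASSES[ty[0].argmax().item()]
        if slot_type != "span":
            state[slot] = slot_type  # yes / no / dontcare written directly
        else:
            s, e = st[0].argmax().item(), en[0].argmax().item()
            state[slot] = " ".join(token_text[s:e + 1])
    return state
```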
0:11:42 | okay, so |
---|
0:11:44 | for evaluation, everyone in the field has been using the same dataset, as you |
---|
0:11:46 | know: the MultiWOZ dataset |
---|
0:11:48 | MultiWOZ is a human-human dialogue collection with about 2.5 thousand single-domain and |
---|
0:11:52 | 7 thousand multi-domain dialogues |
---|
0:11:54 | it has annotations for dialog states and system acts; we don't use the dialogue acts in |
---|
0:11:59 | this model |
---|
0:12:00 | some statistics: it has about ten and a half thousand dialogues, about a hundred fifteen |
---|
0:12:04 | thousand turns |
---|
0:12:06 | and an average of about thirteen and a half turns per dialogue. the total number of slots we're tracking here is thirty |
---|
0:12:10 | seven |
---|
0:12:10 | across six domains |
---|
0:12:14 | now some results |
---|
0:12:15 | but before that, the metric here is joint goal accuracy |
---|
0:12:20 | which basically means that at every turn you want to predict all the slots correctly; if |
---|
0:12:24 | any of the slots is wrong, then the accuracy for that turn is zero, otherwise one |
---|
0:12:29 | so it is a strict metric |
---|
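for concreteness, the metric amounts to just this (a few-line sketch; representing each turn's state as a slot-to-value dict is my assumption):

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """predicted_states, gold_states: per-turn dicts mapping slot -> value.
    A turn scores 1 only if every slot matches exactly, else 0."""
    correct = sum(p == g for p, g in zip(predicted_states, gold_states))
    return correct / len(gold_states)
```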
0:12:31 | so the ordering here is: |
---|
0:12:33 | the first number is from the original MultiWOZ paper, from the MultiWOZ people |
---|
0:12:36 | GLAD and GCE are approaches that encode the dialogue with recurrent and self-attentive |
---|
0:12:41 | encoders; |
---|
0:12:43 | GLAD has separate global tracking and local tracking modules, and GCE is just a simplified |
---|
0:12:46 | version of GLAD |
---|
0:12:47 | so those are the next two numbers, and then |
---|
0:12:49 | there is the joint state tracking approach that i showed before, where they encode the dialogue history |
---|
0:12:53 | through a hierarchical LSTM |
---|
0:12:54 | and then have a feed-forward layer for each slot type |
---|
0:12:57 | so that baseline number is about thirty-eight, and our approach with a single model |
---|
0:13:01 | beats all these approaches |
---|
0:13:03 | and then we also tried an ensemble model, which basically just takes a |
---|
0:13:06 | majority vote between three different models trained with three different seeds |
---|
0:13:10 | and finally, we also wanted to check |
---|
0:13:13 | how well this works if you just combine our approach |
---|
0:13:17 | with a closed-vocabulary approach like the joint state tracking model, and |
---|
0:13:21 | how we combine them is very simple: |
---|
0:13:23 | for each slot we choose one of the two approaches based on which of them |
---|
0:13:28 | is better |
---|
0:13:29 | for that particular slot on the dev set |
---|
0:13:31 | and this gives us a considerable boost, like about five percent |
---|
0:13:34 | and we'll see why this happens |
---|
0:13:38 | we did some ablation studies. the first and most important is: |
---|
0:13:41 | what if we feed the ground truth to all the three models. these |
---|
0:13:45 | ablation studies are for the single model of our approach; this is |
---|
0:13:49 | not for the |
---|
0:13:49 | model combined with the joint state tracker |
---|
0:13:52 | so here, if we feed the carryover decisions, the slot types, and the span |
---|
0:13:56 | predictions from the ground truth |
---|
0:13:57 | you get a joint goal accuracy on the dataset of seventy-three |
---|
0:14:01 | which basically means our approach is upper-bounded by seventy-three |
---|
0:14:04 | why? basically, if you dig deeper, about seven percent of slot values are not |
---|
0:14:07 | even present in the conversation. an example would be something like this: |
---|
0:14:11 | the context says "multiple sports attractions are available in the centre of town" |
---|
0:14:15 | and you want to fill the slot attraction type |
---|
0:14:18 | if the answer is "multiple sports", our model will never get it right: even |
---|
0:14:22 | if the model points to "sports", this value "sports" |
---|
0:14:25 | is not the same as the ground truth "multiple sports", so it doesn't match. our approach is |
---|
0:14:28 | upper-bounded |
---|
0:14:30 | in this way, and this is also the reason why, when we combine our approach with the |
---|
0:14:33 | closed-vocabulary approach, which is more based on the ontology, we get some |
---|
0:14:36 | boost |
---|
0:14:37 | another ablation is with the embeddings: if we add contextualized word embeddings, we get about two percent |
---|
0:14:41 | gain. then we did some oracle experiments with each of the model types. if we |
---|
0:14:45 | replace just the slot type model with the ground truth (this is an already |
---|
0:14:49 | well-performing model), we don't get much gain, about one percent or |
---|
0:14:54 | half a percent |
---|
0:14:55 | if we replace the span model with the ground truth, we get about four percent |
---|
0:14:59 | gain |
---|
0:15:00 | but if we replace the slot carryover model with the ground truth, |
---|
0:15:03 | we get about twenty percent |
---|
0:15:05 | gain. so as you can see, the bottleneck here is the |
---|
0:15:08 | carryover model. this is also evident if you look at the accuracy for each |
---|
0:15:11 | model: the type and span models have like |
---|
0:15:13 | ninety to ninety-five percent, which is pretty high |
---|
0:15:15 | but the carryover model only has like seventy to seventy-six percent turn-level |
---|
0:15:18 | accuracy |
---|
0:15:19 | so this gives a direction for future work: maybe we want to improve this |
---|
0:15:23 | carryover model |
---|
0:15:26 | we also analyzed how the performance varies with the length of the conversation history, |
---|
0:15:30 | and we see a strictly decreasing performance as the conversation gets deeper, and this |
---|
0:15:34 | is |
---|
0:15:35 | because of the error propagation from the carryover model |
---|
0:15:40 | and finally we did some error analysis: we basically took some two hundred error samples |
---|
0:15:45 | and we |
---|
0:15:47 | analyzed them and bucketed |
---|
0:15:50 | them into four different categories |
---|
0:15:52 | the first and the biggest category is called unanswerable slot errors |
---|
0:15:55 | these are the errors which are made by our slot carryover |
---|
0:15:58 | model |
---|
0:15:59 | there are two cases in this. the first one is where the reference |
---|
0:16:03 | is "none" and the hypothesis is not "none", and it basically means |
---|
0:16:05 | the reference says that we should carry over the "none" value from the previous turn, but |
---|
0:16:09 | the model chose to update it |
---|
0:16:12 | and the second one is the opposite of |
---|
0:16:14 | this. so in the first case, |
---|
0:16:16 | even though this is the bulk of the errors, like forty-two percent, when |
---|
0:16:19 | we look at the errors manually, these are not real errors: the model is |
---|
0:16:22 | making a prediction which is actually correct |
---|
0:16:25 | but there is a lot of annotation noise in the dataset, because the states are |
---|
0:16:28 | sometimes updated with a delay, like they |
---|
0:16:33 | are updated one turn late. so because of this, all of these |
---|
0:16:37 | get counted as errors, but a bunch of them, |
---|
0:16:40 | a lot of them, are not really errors |
---|
0:16:42 | in the second case, |
---|
0:16:44 | the ground truth says that we should |
---|
0:16:47 | update the slot value, while our model predicts |
---|
0:16:49 | to just carry over from the previous turn; in this case there are some |
---|
0:16:53 | real errors |
---|
0:16:54 | for example, here you can see the user is |
---|
0:16:58 | trying to book a restaurant in the centre part of the town, and finally the agent |
---|
0:17:01 | is able to make the reservation, and then the user in the next turn says that he |
---|
0:17:05 | also needs an attraction near the restaurant. so here, when you want |
---|
0:17:09 | to fill a slot, say attraction area, the model says |
---|
0:17:14 | that this would be "none", which basically means it has not been mentioned |
---|
0:17:17 | but, as you can see, the user says it should be near the |
---|
0:17:20 | restaurant, so it should be carried over from the previous domain, and our model is |
---|
0:17:23 | unable to do that |
---|
0:17:26 | the next error category is what we call incorrect reference errors, which |
---|
0:17:29 | basically means there are multiple possible candidates in the context but our model picks the |
---|
0:17:33 | wrong candidate. so in this case you see the user is trying to book a |
---|
0:17:37 | hotel for four people, the agent responds that the booking was unsuccessful, |
---|
0:17:41 | and the user |
---|
0:17:42 | then asks to book for eight people |
---|
0:17:45 | the ground truth is eight, of course, but our model predicts four |
---|
0:17:47 | so we see that a lot of this happens when there is back-and-forth, |
---|
0:17:50 | as in this case, or when the user changes their mind. so our model is |
---|
0:17:53 | not |
---|
0:17:54 | robust to these kinds of things, and a possible reason would be that the model |
---|
0:17:57 | overfits to a particular entity, like "four", which occurs more in the |
---|
0:18:00 | training set |
---|
0:18:02 | this accounts for about twenty percent of the errors |
---|
0:18:05 | the next category is what we call slot resolution errors. here you see the |
---|
0:18:10 | context is something like "i want to leave the hotel by two thirty" |
---|
0:18:13 | the model pointed to "two thirty", but the ground truth is actually "fifteen |
---|
0:18:17 | thirty". so this is kind of an intended output, because we only do |
---|
0:18:20 | pointing in the context |
---|
0:18:21 | so these are more like unnormalized values; this is about thirty percent |
---|
0:18:25 | the final category is slot boundary errors, where the span model makes a mistake: |
---|
0:18:28 | it either outputs a span which is a superset of |
---|
0:18:34 | the reference, or it is a subset of the reference. in this case the reference |
---|
0:18:37 | is just "nandos" as the restaurant name, but our model guessed "nandos in |
---|
0:18:40 | city centre". but this is only a small portion; it is like two percent of |
---|
0:18:43 | the errors |
---|
0:18:45 | finally, i also want to |
---|
0:18:47 | add one slide: the number that i showed as state-of-the-art can be |
---|
0:18:50 | improved; since then there's a paper, it is here, this is TRADE, |
---|
0:18:53 | "Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems" |
---|
0:18:57 | here what they do is use a pointer-generator network to combine a fixed vocabulary along |
---|
0:19:01 | with a distribution over the |
---|
0:19:03 | dialogue history, and they get a slightly better accuracy than |
---|
0:19:05 | our model combined with the joint state tracker. but a key difference between their |
---|
0:19:10 | approach and ours is that |
---|
0:19:11 | they use a decoder to generate the slot value token by token, while we just use |
---|
0:19:15 | two pointers to point to the start and end of the span |
---|
0:19:20 | and that's probably all. i wanted to thank you, and |
---|
0:19:22 | i can take questions |
---|
0:19:29 | okay, so we have time for questions |
---|
0:19:32 | hi, thank you, thank you for the talk. my question is: when you're |
---|
0:19:36 | considering the different types, like yes, no, don't care, and the span, the span |
---|
0:19:42 | potentially misses another case, right? that is when the user doesn't really |
---|
0:19:49 | say |
---|
0:19:50 | the value of the slot, but you can infer it. like, let's say the agent asks "what |
---|
0:19:54 | cuisine type do you want" and i say "i want some pizza tonight". the classifier could |
---|
0:19:59 | then infer that the value for the cuisine type would be italian, but the |
---|
0:20:04 | user never said "italian", so the span would not cover that case, right |
---|
0:20:08 | so your point is, the user says "i want some pizza |
---|
0:20:12 | tonight" or something like that. right, okay, that's true. so those are not covered here, |
---|
0:20:16 | because we are just doing pointing. our model would probably predict span, |
---|
0:20:20 | because it's not one of the other types, and it would probably |
---|
0:20:24 | point to "pizza", but we would fail, just like in the other cases |
---|
0:20:28 | so we have a future direction where we can draw inspiration from reading |
---|
0:20:31 | comprehension, where you can do more like abstractive question answering: you can use these |
---|
0:20:35 | spans as a rationale and then try to have a generative model that generates the value, |
---|
0:20:39 | which in this case would be "italian", grounded in that selection. we can do that |
---|
0:20:43 | in future |
---|
0:20:45 | okay |
---|
0:20:51 | thanks for the great talk, just one simple question. so if i give |
---|
0:20:56 | you a sentence like "i want to go from cambridge to london", then you know |
---|
0:21:00 | the destination is london and the departure is cambridge. in this case |
---|
0:21:06 | can your model do better, because they are both values for a place, |
---|
0:21:09 | right |
---|
0:21:10 | so can your model do better than baseline systems with these kinds of |
---|
0:21:14 | designs |
---|
0:21:15 | because you are still going slot by slot, so then how does the model know the |
---|
0:21:19 | destination |
---|
0:21:20 | is london and not cambridge |
---|
0:21:24 | i see |
---|
0:21:26 | so it would, because we feed the context, right, so it can learn, like, "from" |
---|
0:21:29 | and "to" from the context |
---|
0:21:32 | but what about, you know, because you extract that span, right, is it possible |
---|
0:21:37 | that both slots, |
---|
0:21:39 | both predictions, get the same span |
---|
0:21:42 | no, so, but we also feed in the slot type, right, |
---|
0:21:47 | so destination and source... |
---|
0:21:56 | no, maybe it's in the final slide. so here, when predicting |
---|
0:22:01 | the span, |
---|
0:22:02 | we also |
---|
0:22:03 | have a question vector, right, so it would be either destination or the source, right, |
---|
0:22:07 | so based on that, the span model can infer which one it would be |
---|
0:22:11 | the question is the user query embedding, so for those two slots, is it the same user query |
---|
0:22:17 | no, it should be different, right, it would be either the destination or the source |
---|
0:22:22 | so it does consider the slot information? yes, so this question vector here is |
---|
0:22:26 | the slot |
---|
0:22:28 | okay, then it might tell the difference. okay, cool, thanks |
---|
0:22:35 | other questions |
---|
0:22:50 | maybe a provocative question, but we have heard many papers about, you know, |
---|
0:22:56 | dialog state tracking, and in particular on this particular corpus. so my question is: |
---|
0:23:02 | what do you think we need to do to take it to the next level |
---|
0:23:05 | where, you know, we don't talk about going from cambridge to london, or looking |
---|
0:23:11 | for a chinese restaurant |
---|
0:23:14 | tonight |
---|
0:23:15 | so if you're asking particularly about improving on this dataset, i think, to |
---|
0:23:20 | be honest |
---|
0:23:26 | i mean, improvement is necessary, i would say, but particularly, having experimented |
---|
0:23:30 | with this dataset, i found a lot of errors in |
---|
0:23:33 | it, especially with respect to the dialog state annotations. so if you're just trying to improve |
---|
0:23:38 | upon this, it's not a good idea, because we won't even know whether we are |
---|
0:23:41 | doing better or not. so there are these new datasets, like DSTC 8, that we |
---|
0:23:45 | can look into and see |
---|
0:23:47 | if our approaches do better. but otherwise, i mean, i feel like now people have |
---|
0:23:52 | begun to do more end-to-end approaches where you don't even need the state, it's |
---|
0:23:55 | more implicit, but then that again boils down to the same problem, to pipeline or not to |
---|
0:24:00 | pipeline. so |
---|
0:24:01 | i don't have good answers |
---|
0:24:05 | any other questions |
---|
0:24:08 | i have one question. so, have you considered the flip side of the evaluation? i'm |
---|
0:24:14 | not sure if the carryover is, you know, causing some problem in the evaluation, |
---|
0:24:18 | if, let's say, previous slot values are sort of back-propagating errors to the |
---|
0:24:25 | next turns. but if you, if you sort of |
---|
0:24:28 | had another metric, like a slot update rate or something like that, would it |
---|
0:24:33 | be possible for you to evaluate your misses more accurately, |
---|
0:24:37 | how a slot will be treated, like... |
---|
0:24:42 | i see your point |
---|
0:24:44 | so the numbers, i think, for the carryover, |
---|
0:24:48 | so this number, the seventy-six percent, is more like a |
---|
0:24:51 | turn-level accuracy: for a particular turn, did the carryover model predict everything |
---|
0:24:55 | correctly. you're suggesting more like a |
---|
0:24:57 | per-update rate |
---|
0:24:59 | like, more like precision and recall for the updates; that would be better, exactly. i haven't put |
---|
0:25:03 | it here, but also here, like, these two error rates: |
---|
0:25:07 | you can think about it, the first one is more like a precision error |
---|
0:25:10 | for the slot update model (for the carryover, let's think about |
---|
0:25:13 | the update model), so in this case the model predicts that we should update |
---|
0:25:17 | but the ground truth is not to update, so this is like a precision error, and the |
---|
0:25:20 | second one is more |
---|
0:25:22 | like a recall error |
---|
0:25:23 | so i don't remember the numbers offhand; the eighty-four percent number, |
---|
0:25:27 | the more aggregate one, is actually somewhat inflated. it is more meaningful to look at the turn level, because |
---|
0:25:32 | you want all the |
---|
0:25:34 | slots to be correct, because the eventual goal is joint goal accuracy, where you |
---|
0:25:37 | want all the slots to be correctly predicted |
---|
0:25:39 | okay. and we didn't train our models per slot; an important |
---|
0:25:43 | thing is also that we train the carryover model jointly across slots and not |
---|
0:25:48 | per slot, and this is important because if you do it per slot, we |
---|
0:25:51 | found we don't get good performance, because |
---|
0:25:55 | the dataset, the update examples particularly for the carryover model, is highly biased: you |
---|
0:26:00 | can imagine the number of updates is very few, most of the time the slot |
---|
0:26:03 | just gets carried over. so if you just train it per slot, you won't have |
---|
0:26:06 | enough signal for the updates and you will just get biased |
---|
0:26:11 | training |
---|
0:26:14 | so it's about time, so let's thank the speaker again |
---|