0:00:18 | okay so the last |
---|
0:00:21 | speaker in this session is Lei Shu |
---|
0:00:26 | and she is going to present a flexibly-structured model for task-oriented dialogues, so |
---|
0:00:32 | another end-to-end dialog model |
---|
0:00:35 | so |
---|
0:00:37 | go ahead, please |
---|
0:01:07 | Hello everyone, I am Lei Shu from the University of Illinois at Chicago. I will present our |
---|
0:01:13 | work, flexibly-structured task-oriented dialogue modeling, FSDM for short. |
---|
0:01:19 | This work is done with my co-authors Piero Molino, Mahdi Namazifar, |
---|
0:01:22 | Hu Xu, Bing Liu, |
---|
0:01:24 | Huaixiu Zheng, and Gokhan Tur. |
---|
0:01:28 | Let us quickly recap modularized and end-to-end dialogue systems. |
---|
0:01:33 | A traditional modularized dialogue system is a pipeline of natural language understanding, dialogue state tracking, knowledge |
---|
0:01:40 | base query, |
---|
0:01:42 | a dialogue policy engine, and natural language generation. |
---|
0:01:47 | An end-to-end system connects all these modules and chains them |
---|
0:01:51 | together, with text in and text out. |
---|
0:01:54 | The advantage of the end-to-end fashion is that it can reduce error propagation. |
---|
0:02:02 | Dialogue state tracking is the key module: it understands user intentions, |
---|
0:02:08 | tracks the dialogue history, and updates the dialogue state at every turn. |
---|
0:02:13 | The updated dialogue state is used for querying the knowledge base, for the |
---|
0:02:17 | policy engine, and for response generation. |
---|
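The per-turn state update described above can be sketched as a running set of constraints plus the current turn's requests. This is a toy illustration, not the paper's neural tracker; the slot names and the merge policy are assumptions.

```python
def update_state(state, turn_informables, turn_requests):
    """Merge this turn's understood values into the running dialogue state."""
    return {
        # informable constraints accumulate across turns
        "informable": {**state["informable"], **turn_informables},
        # requestable slots are re-read each turn
        "requestable": set(turn_requests),
    }

state = {"informable": {}, "requestable": set()}
state = update_state(state, {"food": "italian"}, set())
state = update_state(state, {"pricerange": "cheap"}, {"address"})
print(state["informable"])   # constraints merged across both turns
print(state["requestable"])  # what the user asked for this turn
```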
0:02:20 | There are two popular approaches; we call them the fully-structured approach and the free-form approach. |
---|
0:02:30 | The fully-structured approach uses the full structure of the knowledge base, |
---|
0:02:35 | both its schema |
---|
0:02:37 | and the values. |
---|
0:02:39 | It assumes that |
---|
0:02:41 | the set of informable slot values and the requestable slots are fixed. |
---|
0:02:47 | The network behind it is a multi-class classification. |
---|
0:02:51 | The advantage is that values and slots are well aligned. |
---|
0:02:55 | The disadvantage is that it cannot adapt to a dynamic knowledge base or detect out-of-vocabulary values |
---|
0:03:03 | appearing in the user's utterance. |
---|
0:03:10 | The free-form approach does not exploit any information |
---|
0:03:14 | about the knowledge base |
---|
0:03:16 | in the model architecture. |
---|
0:03:18 | It treats the dialogue state as a sequence of informable values and requestable slots. |
---|
0:03:25 | For example, in the picture, |
---|
0:03:27 | in the restaurant domain, |
---|
0:03:29 | the dialogue state is |
---|
0:03:30 | "Italian", then a semicolon, "cheap", then a semicolon, |
---|
0:03:34 | "address", then a semicolon, and EOS. |
---|
0:03:37 | The network is a sequence-to-sequence model. |
---|
0:03:40 | The pros are that |
---|
0:03:42 | it can adapt to new domains |
---|
0:03:44 | and to changes in the content of the knowledge base; |
---|
0:03:48 | it also solves out-of-vocabulary problems. |
---|
0:03:50 | The disadvantage is that |
---|
0:03:53 | values and slots |
---|
0:03:54 | are not aligned. |
---|
0:03:56 | For example, |
---|
0:03:57 | in a travel booking system, |
---|
0:03:59 | given a |
---|
0:04:00 | dialogue state "Chicago" and "Seattle", |
---|
0:04:03 | can you tell |
---|
0:04:04 | which is the departure city and which is the arrival city? |
---|
0:04:09 | Also, |
---|
0:04:10 | the free-form approach models an unwanted order of requestable slots, and it can produce |
---|
0:04:16 | invalid states |
---|
0:04:18 | that may contain generated non-requestable-slot words. |
---|
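The alignment problem just described can be made concrete with a tiny sketch. The belief-span format below is an assumption, following Sequicity-style "value ; value ; slot" spans; nothing in the flat sequence ties a value to a slot name.

```python
# The slide's restaurant-domain belief span as a flat token sequence.
belief_span = "italian ; cheap ; address".split()

# With two city values the flat span carries no slot alignment:
ambiguous = "chicago ; seattle".split()
values = [tok for tok in ambiguous if tok != ";"]
slots = ["departure_city", "arrival_city"]
# Nothing in the sequence itself says which value goes with which slot.
print(values, slots)
```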
0:04:24 | So what we propose is |
---|
0:04:26 | the flexibly-structured dialogue model. |
---|
0:04:29 | It contains five components. |
---|
0:04:31 | The first is the green part; |
---|
0:04:33 | the green part is our input encoder module. |
---|
0:04:37 | The yellow and orange parts are our dialogue state tracking. |
---|
0:04:41 | The purple part is the knowledge base query. |
---|
0:04:45 | The red part is a new module we propose, called the response slot |
---|
0:04:49 | decoder. |
---|
0:04:50 | And the green and the blue parts together |
---|
0:04:54 | form the response generation. |
---|
0:04:58 | So we propose a flexibly-structured dialogue state tracking |
---|
0:05:03 | approach, |
---|
0:05:04 | which uses only the information in the schema |
---|
0:05:08 | of the knowledge base, but does not use the information about the values. |
---|
0:05:13 | The architecture we propose contains two parts: |
---|
0:05:18 | the informable slot value decoder, the yellow part in this picture, |
---|
0:05:22 | and the requestable slot decoder, the orange part. |
---|
0:05:26 | The informable slot value decoder has a separate decoder for each informable slot. |
---|
0:05:32 | For example, in this picture, |
---|
0:05:36 | for the food slot, |
---|
0:05:37 | given the start-of-sentence token "food", |
---|
0:05:40 | the decoder generates "Italian" and EOS_food. |
---|
0:05:45 | The requestable slot decoder is a multi-label classifier over requestable slots; |
---|
0:05:50 | or you can think of it as |
---|
0:05:53 | a binary classification given a requestable slot. |
---|
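The per-slot decoding idea above can be sketched as follows: each informable slot gets its own unique start symbol and end symbol, so generated words are aligned to slots by construction. The `step` lookup table below is a toy stand-in for the trained decoder, not the paper's model.

```python
def decode_slot_value(slot, step, max_len=5):
    token = f"<sos_{slot}>"           # unique start-of-sentence symbol per slot
    value = []
    for _ in range(max_len):
        token = step(slot, token)     # next-token prediction for this slot
        if token == f"<eos_{slot}>":  # unique end symbol per slot
            break
        value.append(token)
    return value

# Toy next-token table imitating the slide's example (food -> "italian").
table = {("food", "<sos_food>"): "italian",
         ("food", "italian"): "<eos_food>",
         ("pricerange", "<sos_pricerange>"): "cheap",
         ("pricerange", "cheap"): "<eos_pricerange>"}
step = lambda slot, tok: table[(slot, tok)]

print(decode_slot_value("food", step))        # ['italian']
print(decode_slot_value("pricerange", step))  # ['cheap']
```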
0:05:57 | You can see that the flexibly-structured approach has a lot of advantages. First, slots and |
---|
0:06:04 | values are aligned. |
---|
0:06:06 | It also solves the out-of-vocabulary problem, |
---|
0:06:09 | and it can easily adapt to new domains and to changes in the |
---|
0:06:12 | content of the knowledge base, because we are using a generation method for the informable slot value |
---|
0:06:18 | decoder. |
---|
0:06:19 | We also remove the unwanted order of the requestable slots and the chance |
---|
0:06:24 | to generate invalid states. |
---|
0:06:29 | The nice thing about the flexibly-structured dialogue state tracking is that |
---|
0:06:33 | it can explicitly |
---|
0:06:35 | assign values to slots, |
---|
0:06:38 | like the fully-structured approach, |
---|
0:06:40 | while also preserving the capability of dealing with out-of-vocabulary values, |
---|
0:06:45 | like the free-form approach. |
---|
0:06:47 | Meanwhile, it brings challenges in response generation. |
---|
0:06:52 | The first challenge is: |
---|
0:06:54 | is it possible to improve the response generation quality based on the flexibly-structured DST? |
---|
0:07:01 | The second challenge is |
---|
0:07:04 | how to incorporate the output of the flexibly-structured DST |
---|
0:07:08 | into response generation. |
---|
0:07:12 | So, regarding the first challenge, |
---|
0:07:14 | how to improve the response generation, we propose a novel module called the response slot |
---|
0:07:20 | decoder, |
---|
0:07:21 | the red part in the picture. |
---|
0:07:25 | The response slots |
---|
0:07:29 | are the slot names, or slot tokens, |
---|
0:07:32 | that appear in the delexicalized response. |
---|
0:07:35 | For example, |
---|
0:07:36 | the user requests the address; |
---|
0:07:39 | the system replies |
---|
0:07:40 | "the address |
---|
0:07:41 | of name_slot |
---|
0:07:43 | is address_slot". |
---|
0:07:45 | So for the response slot decoder we also adopt a multi-label classifier. |
---|
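The delexicalized response from the example above is later filled in ("lexicalized") from the matched knowledge-base record. A minimal sketch; the record values are made up for illustration.

```python
# The model generates slot tokens instead of concrete values.
template = "the address of name_slot is address_slot"
record = {"name_slot": "nandos", "address_slot": "40 king street"}  # toy KB record

# Lexicalization: substitute each slot token with the retrieved value.
response = template
for token, value in record.items():
    response = response.replace(token, value)
print(response)
```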
0:07:52 | Regarding the second challenge, |
---|
0:07:54 | how to incorporate |
---|
0:07:56 | the flexibly-structured |
---|
0:07:57 | DST |
---|
0:07:59 | into response generation, |
---|
0:08:01 | we propose the word copy distribution. |
---|
0:08:04 | It increases the chance of the words |
---|
0:08:07 | in the informable slot values, |
---|
0:08:10 | requestable slots, and response slots appearing in the agent response. |
---|
0:08:15 | For example, |
---|
0:08:17 | in "the address of name_slot is |
---|
0:08:20 | address_slot", we are trying to increase the chances of |
---|
0:08:23 | "address", |
---|
0:08:25 | "name_slot", and "address_slot" appearing in the response. |
---|
0:08:31 | From now on, I am going to go into detail on how we link these modules together. |
---|
0:08:39 | First is the input encoder. |
---|
0:08:42 | The input encoder |
---|
0:08:44 | takes three kinds of input: |
---|
0:08:46 | the first is the agent response in the last turn, |
---|
0:08:50 | the second is the dialogue state, |
---|
0:08:53 | and the third is the current user utterance. |
---|
0:08:56 | The output is |
---|
0:08:58 | the last hidden state of the encoder. |
---|
0:09:01 | It is used as the initial hidden state |
---|
0:09:04 | for both the dialogue state tracker and the response generation. |
---|
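The three encoder inputs just listed can be sketched as one concatenated token sequence. The separator token is an assumption; the talk does not specify the exact input formatting.

```python
def build_encoder_input(prev_agent_response, prev_dialogue_state, user_utterance):
    # Concatenate: last agent response, last dialogue state, current utterance.
    return (prev_agent_response + ["<sep>"]
            + prev_dialogue_state + ["<sep>"]
            + user_utterance)

tokens = build_encoder_input(
    "what kind of food do you want".split(),
    "food = none".split(),
    "cheap italian food please".split(),
)
print(tokens)
```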
0:09:12 | The informable slot value decoder is one part of our flexibly-structured DST. |
---|
0:09:18 | It has two kinds of input: |
---|
0:09:21 | the last hidden state from the encoder, |
---|
0:09:25 | and a unique start-of-sentence symbol for each slot. |
---|
0:09:29 | For example, |
---|
0:09:31 | for the food slot the starting word is "food". |
---|
0:09:34 | The output: |
---|
0:09:35 | for each slot, |
---|
0:09:37 | a sequence of words representing the slot value is generated. |
---|
0:09:41 | For example, |
---|
0:09:43 | the value generated for the food slot here is |
---|
0:09:45 | "Italian" |
---|
0:09:46 | and EOS_food. |
---|
0:09:48 | The intuition here is that |
---|
0:09:50 | the unique start-of-sentence symbol ensures |
---|
0:09:54 | the slot-and-value alignment, |
---|
0:09:56 | and the copy-mechanism-augmented sequence-to-sequence model allows copying values |
---|
0:10:01 | directly from the encoder input. |
---|
0:10:05 | The requestable slot binary classifier: |
---|
0:10:08 | this is the other part in our |
---|
0:10:10 | flexibly-structured DST. |
---|
0:10:13 | The inputs are the |
---|
0:10:14 | last hidden state of the encoder |
---|
0:10:17 | and a unique start-of-sentence symbol for each slot. |
---|
0:10:20 | For example, |
---|
0:10:22 | for the address slot, the starting word is "address". |
---|
0:10:25 | The output is: |
---|
0:10:26 | for each slot, |
---|
0:10:28 | a binary prediction, |
---|
0:10:29 | true or false, |
---|
0:10:31 | is produced regarding whether the slot is requested by the user or not. |
---|
0:10:38 | Note that |
---|
0:10:39 | the GRU here takes only one step. |
---|
0:10:42 | It may be replaced with any classification architecture you want, |
---|
0:10:47 | but we use a GRU because we want to use its hidden state here |
---|
0:10:50 | as the initial state for our response slot binary classifier. |
---|
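The per-slot binary prediction described above amounts to one sigmoid score per requestable slot, thresholded at 0.5. The logits below are made up for illustration; in the talk each score comes from a one-step GRU, not a lookup table.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One score per requestable slot; True means "the user requested this slot".
logits = {"address": 2.1, "phone": -1.3, "postcode": -0.2}
requested = {slot: sigmoid(z) > 0.5 for slot, z in logits.items()}
print(requested)  # only "address" is predicted as requested
```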
0:10:57 | The knowledge base query takes in the generated informable slot values |
---|
0:11:02 | and the knowledge base, and outputs |
---|
0:11:05 | a one-hot vector representing the number of records matched. |
---|
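A minimal sketch of that query step: filter records by the generated informable values, then summarize the match count as a small one-hot vector. The exact binning (no match / one / several) is an assumption; the talk only says the vector represents the number of matched records.

```python
kb = [{"name": "nandos", "food": "italian", "pricerange": "cheap"},
      {"name": "pizzeria", "food": "italian", "pricerange": "expensive"}]

def kb_query(kb, constraints):
    # Keep records agreeing with every generated informable value.
    matches = [r for r in kb
               if all(r.get(s) == v for s, v in constraints.items())]
    n = len(matches)
    # Assumed buckets: no match / exactly one / more than one.
    one_hot = [int(n == 0), int(n == 1), int(n > 1)]
    return matches, one_hot

matches, vec = kb_query(kb, {"food": "italian", "pricerange": "cheap"})
print(len(matches), vec)  # 1 [0, 1, 0]
```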
0:11:12 | Here is our response slot binary classifier. |
---|
0:11:16 | The inputs are |
---|
0:11:17 | the knowledge base query result |
---|
0:11:20 | and the hidden state from the requestable slot binary classifier. |
---|
0:11:25 | The output is: |
---|
0:11:26 | for each response slot, a binary prediction, |
---|
0:11:29 | true or false, |
---|
0:11:30 | is produced regarding whether this response slot appears in the agent response |
---|
0:11:36 | or not. |
---|
0:11:38 | The motivation is |
---|
0:11:39 | incorporating all the relevant information about the retrieved entities |
---|
0:11:45 | and the requested slots into the response. |
---|
0:11:52 | Our |
---|
0:11:52 | word copy distribution comes next. |
---|
0:11:56 | The motivation here is that |
---|
0:11:58 | the canonical copy |
---|
0:12:00 | mechanism only takes a sequence of words as text input, |
---|
0:12:05 | but does not accept |
---|
0:12:06 | the Bernoulli distributions we obtain |
---|
0:12:09 | from the binary classifiers. |
---|
0:12:12 | So we take in |
---|
0:12:14 | the predictions from the informable slot value decoders |
---|
0:12:18 | and from the requestable slot binary classifier and the response slot binary classifier, |
---|
0:12:25 | and output a word distribution. |
---|
0:12:27 | So, |
---|
0:12:28 | if a word is a requestable slot or a response slot, |
---|
0:12:33 | its probability is the binary classifier output; |
---|
0:12:37 | if a word appears in the generated informable slot values, |
---|
0:12:42 | its probability is equal to one; |
---|
0:12:45 | and it is zero for all other words. |
---|
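The three cases just listed can be sketched directly. The vocabulary and classifier scores below are illustrative, not values from the model.

```python
def word_copy_distribution(vocab, informable_words, slot_probs):
    dist = {}
    for w in vocab:
        if w in informable_words:
            dist[w] = 1.0            # appears in a generated informable value
        elif w in slot_probs:
            dist[w] = slot_probs[w]  # requestable/response slot: classifier prob
        else:
            dist[w] = 0.0            # every other vocabulary word
    return dist

vocab = ["italian", "address_slot", "name_slot", "the", "is"]
dist = word_copy_distribution(
    vocab,
    informable_words={"italian"},
    slot_probs={"address_slot": 0.9, "name_slot": 0.8},
)
print(dist["italian"], dist["address_slot"], dist["the"])  # 1.0 0.9 0.0
```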
0:12:53 | The agent response decoder |
---|
0:12:55 | takes in |
---|
0:12:56 | the last hidden state of the encoder, |
---|
0:12:59 | the knowledge base query result, |
---|
0:13:01 | and the word copy distribution. |
---|
0:13:04 | Its output is a delexicalized agent response. |
---|
0:13:08 | The overall loss for the whole network includes the informable slot value |
---|
0:13:14 | loss, the requestable slot loss, the response slot loss, and |
---|
0:13:20 | the agent response generation loss. |
---|
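As I read it, the overall objective sums those four component losses; the talk does not state any weights, so equal weighting is assumed here, and the loss values are made up.

```python
losses = {
    "informable_value": 0.42,   # per-slot value decoding cross-entropy
    "requestable_slot": 0.10,   # binary cross-entropy
    "response_slot": 0.07,      # binary cross-entropy
    "agent_response": 0.88,     # response generation cross-entropy
}
total = sum(losses.values())    # assumed equal weighting
print(round(total, 2))
```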
0:13:27 | Experimental settings: |
---|
0:13:28 | we use two kinds of datasets, |
---|
0:13:31 | the Cambridge restaurant dataset and the Stanford in-car assistant dataset. |
---|
0:13:35 | As for the evaluation metrics we use: |
---|
0:13:38 | for the dialogue state tracking we report |
---|
0:13:41 | the precision, recall, and F1 score for informable slot values and requestable slots; |
---|
0:13:47 | for task completion |
---|
0:13:49 | we use the entity match rate and the success F1 score; |
---|
0:13:54 | and BLEU is applied to the generated agent response for evaluating the language quality. |
---|
0:14:02 | We compare our method to these baselines: |
---|
0:14:05 | NDM |
---|
0:14:06 | and LIDM, |
---|
0:14:09 | which use the fully-structured approach |
---|
0:14:12 | for dialogue state tracking; |
---|
0:14:14 | KVRN from Stanford, |
---|
0:14:16 | which does not do dialogue state tracking; |
---|
0:14:19 | and TSCP, |
---|
0:14:21 | that is, TSCP |
---|
0:14:23 | without RL and TSCP with RL, which are free-form approaches. |
---|
0:14:27 | They use a two-stage copy-augmented sequence-to-sequence model, |
---|
0:14:31 | which contains one encoder and two copy-mechanism-augmented decoders |
---|
0:14:36 | to decode the belief state first and then generate the response. |
---|
0:14:40 | TSCP with RL also tunes |
---|
0:14:46 | the response slots by reinforcement learning. |
---|
0:14:51 | Here are the turn-level dialogue state tracking results. |
---|
0:14:54 | You will notice that |
---|
0:14:56 | our proposed method, FSDM, performs much better than the free-form approach |
---|
0:15:01 | TSCP, |
---|
0:15:04 | especially on the requestable slots. The reason is that |
---|
0:15:08 | the free-form approach models the unwanted order |
---|
0:15:12 | of the requestable slots; |
---|
0:15:14 | that is why our FSDM can perform better than them. |
---|
0:15:22 | These are our dialogue-level task completion results. |
---|
0:15:26 | You will also notice that FSDM performs |
---|
0:15:30 | better than the baseline models on |
---|
0:15:33 | most of the metrics, except BLEU on the KVRET dataset. |
---|
0:15:39 | Here is an example of the generated dialogue state and response from the free-form approach |
---|
0:15:45 | and our approach, |
---|
0:15:46 | in the calendar domain. |
---|
0:16:09 | okay |
---|
0:16:10 | The ground-truth belief state here is: for |
---|
0:16:14 | the informable slot, the event is equal to "meeting", and for the requestable |
---|
0:16:18 | slots, the user tries to acquire the date, |
---|
0:16:20 | time, and party. |
---|
0:16:22 | The free-form approach generates "meeting", "date", and "party", while FSDM generates |
---|
0:16:27 | the event equal to "meeting", date equal to true, time equal |
---|
0:16:31 | to true, and party equal to true. |
---|
0:16:33 | You will notice that here the free-form approach cannot generate "time". |
---|
0:16:38 | The reason is that in the training dataset |
---|
0:16:42 | a lot of examples |
---|
0:16:44 | contain "date" and "party", so the free-form approach models |
---|
0:16:49 | this kind of order: |
---|
0:16:51 | it memorizes "date" and "party" together. |
---|
0:16:56 | So |
---|
0:16:57 | during testing, if the user requests date, time, and party, it cannot predict |
---|
0:17:02 | "time"; it cannot predict beyond that pattern. |
---|
0:17:05 | Also, for |
---|
0:17:06 | the generated response, |
---|
0:17:09 | the ground truth mentions the |
---|
0:17:12 | party slot, the date slot, and the time slot. TSCP generates |
---|
0:17:17 | "the next meeting is at time_slot on date_slot and the time_slot", |
---|
0:17:21 | while our FSDM can generate |
---|
0:17:25 | a response with the date slot, the time slot, and the party slot; here the free-form approach |
---|
0:17:30 | generates a response with the time slot repeated. |
---|
0:17:37 | The conclusion here is that we propose an end-to-end architecture with a flexibly-structured |
---|
0:17:42 | model |
---|
0:17:44 | for task-oriented dialogues, |
---|
0:17:46 | and the experiments |
---|
0:17:47 | suggest that the architecture is competitive with the state-of-the-art models, while |
---|
0:17:53 | our model is also applicable to real-world scenarios. |
---|
0:17:57 | Our code will be available in the next few weeks at this link. |
---|
0:18:05 | And here is another new work of ours regarding modeling multi-action policy for |
---|
0:18:09 | task-oriented dialogues; it will appear at EMNLP. |
---|
0:18:14 | The paper and the code are publicly accessible at this link, or you can scan |
---|
0:18:17 | the QR code. |
---|
0:18:19 | A traditional policy engine predicts one action per turn, which limits the expressive power |
---|
0:18:24 | and introduces unwanted turns of interaction. |
---|
0:18:28 | So |
---|
0:18:28 | we propose to generate multiple actions per turn by generating a sequence of tuples. The |
---|
0:18:34 | tuple unit contains continue, act, and slots. |
---|
0:18:37 | The "continue" here means whether we are going to stop generating these tuples or |
---|
0:18:42 | we are going to continue to generate the tuples. The slots mean the |
---|
0:18:47 | slots of the dialogue act; the slots here do not carry values, |
---|
0:18:51 | unlike, say, a movie name. |
---|
0:18:53 | We propose a novel recurrent cell, |
---|
0:18:56 | called the gated Continue-Act-Slots cell, gCAS, |
---|
0:18:59 | which contains three units: |
---|
0:19:01 | a continue unit, an act unit, and a slots unit, |
---|
0:19:05 | sequentially connected in this recurrent cell. |
---|
0:19:09 | So the whole decoder is in a recurrent-of-recurrents fashion. |
---|
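The multi-action decoding loop just described can be sketched as follows: the decoder emits (continue, act, slots) tuples until the continue unit says stop. The toy policy below is illustrative, not the gCAS network.

```python
def decode_actions(policy, max_tuples=5):
    actions, state = [], "start"
    for _ in range(max_tuples):
        cont, act, slots, state = policy(state)  # one CAS tuple per step
        if cont == "stop":                       # continue unit ends decoding
            break
        actions.append((act, slots))
    return actions

def toy_policy(state):
    # A made-up single-action policy: inform name and address, then stop.
    if state == "start":
        return "continue", "inform", ["name", "address"], "after_inform"
    return "stop", None, None, "done"

print(decode_actions(toy_policy))  # [('inform', ['name', 'address'])]
```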
0:19:15 | We would like to deliver special thanks to our colleagues and the |
---|
0:19:20 | SIGDIAL reviewers. Thank you. |
---|
0:19:27 | Thank you very much for the talk. |
---|
0:19:29 | So, are there any questions? Okay, in the back. |
---|
0:19:40 | Hi, thank you very much, that was very interesting. |
---|
0:19:43 | So what does the system do if somebody doesn't respond with a slot name or a |
---|
0:19:48 | slot value? |
---|
0:19:49 | You know, "what time...", "what restaurant do you want...", and the user says "the |
---|
0:19:52 | closest one to the Forum Theatre". |
---|
0:19:59 | Excuse me, could you repeat the question again? |
---|
0:20:03 | Your system prompts somebody for a restaurant where they want to eat. You mentioned |
---|
0:20:07 | some Italian food; the system says "what restaurant would you like to eat at?" and |
---|
0:20:12 | the user says "the closest Italian restaurant to the Forum Theatre". |
---|
0:20:16 | So I am not giving you a slot value, I am giving you a constraint on the |
---|
0:20:20 | slot value. |
---|
0:20:21 | What does this kind of architecture do with something like that? What is its response? Okay. |
---|
0:20:25 | Thank you. We generate values |
---|
0:20:27 | that |
---|
0:20:28 | are not necessarily in the menus provided; what we work from, for most of the |
---|
0:20:32 | values, is what we detected. |
---|
0:20:34 | So when we use |
---|
0:20:35 | the informable slot value decoder: |
---|
0:20:45 | the informable slot value decoder is trying to catch and use this |
---|
0:20:49 | information from the user side. So when we are trying to generate this kind of |
---|
0:20:53 | thing, we are also using the copy: |
---|
0:20:56 | we are also trying to increase the chance of these words appearing in the response generation. |
---|
0:21:01 | So, for example, |
---|
0:21:02 | "the Italian restaurant", or "you want to book |
---|
0:21:05 | some place", this method can catch that. |
---|
0:21:08 | I understand how you do that, but the question is how would you get the |
---|
0:21:11 | act, so what would the proper internal representation be, so that we get "the closest", |
---|
0:21:16 | the superlative, |
---|
0:21:18 | in the result? How would it compute "the closest"? All you are doing is attending |
---|
0:21:22 | to values; you would have to compute some function, like distance. |
---|
0:21:27 | Actually, that is a very good question. I think that |
---|
0:21:31 | what I am getting is that you are trying to ask whether, if the |
---|
0:21:36 | informable slot values from the user do not exactly match |
---|
0:21:40 | something that appears in the knowledge base... |
---|
0:21:42 | No, that is not quite it. I am saying the user does not know what is in |
---|
0:21:46 | the knowledge base; they are just saying "whatever is the closest one, you tell me". |
---|
0:21:50 | Okay, "the closest one". For example, it would |
---|
0:21:56 | be something like a value for the area slot. Actually, this kind of situation |
---|
0:22:01 | our current model cannot handle, and the past work cannot handle it either, because |
---|
0:22:05 | it does not actually appear in the dataset we are using. |
---|
0:22:09 | Right, thank you. |
---|
0:22:14 | any other questions |
---|
0:22:19 | Okay, in that case I would like to ask a question. |
---|
0:22:23 | I noticed that you were evaluating your model on two datasets, the Cambridge restaurant and |
---|
0:22:28 | the KVRET, and I was wondering whether it would be possible, or how difficult would |
---|
0:22:33 | it be, to extend the model to work on the MultiWOZ dataset, which is, |
---|
0:22:37 | you know, bigger than those two and has more domains. |
---|
0:22:41 | Actually, that is a very good question. |
---|
0:22:45 | Actually, |
---|
0:22:48 | in the most recent work at the latest ACL |
---|
0:22:52 | conference, the TRADE network is trying to do that. |
---|
0:22:57 | They applied it to MultiWOZ, and they showed that their |
---|
0:23:01 | system uses a similar kind of technique, |
---|
0:23:04 | using different start-of-sentence |
---|
0:23:07 | symbols to generate the values. |
---|
0:23:11 | So I think our work kind of |
---|
0:23:16 | proves that |
---|
0:23:17 | the flexibly-structured dialogue state tracking can be applied on the |
---|
0:23:22 | MultiWOZ part, |
---|
0:23:23 | and for the response generation part, we believe that |
---|
0:23:28 | our proposed word copy mechanism can also work. |
---|
0:23:31 | Okay, so basically you think that just retraining should be sufficient? I think so. |
---|
0:23:36 | Okay, thanks. Okay, any other question? |
---|
0:23:42 | Well, then I guess I have one more. |
---|
0:23:46 | and there was |
---|
0:23:51 | Basically, when you were showing the response slot model, or response slot |
---|
0:23:59 | decoder, that was... |
---|
0:24:02 | I mean, |
---|
0:24:03 | you said that you have like a one-step GRU. |
---|
0:24:11 | What does that exactly mean? Is there like one GRU |
---|
0:24:15 | cell that is... |
---|
0:24:17 | Yes, we are |
---|
0:24:18 | kind of using the GRU cell, but we are not using it in a recurrent |
---|
0:24:22 | manner. |
---|
0:24:23 | Right, and the output is like a |
---|
0:24:25 | one-hot encoding of the slots to be inserted in the response, or is it |
---|
0:24:32 | some kind of embedding? |
---|
0:24:36 | It depends on what the output is; here it is the probability. |
---|
0:24:40 | For example, we can think of it as a |
---|
0:24:42 | distribution from a Bernoulli variable. Right, okay. So that is why our word copy, or |
---|
0:24:48 | word copy distribution, uses this kind of zero-to-one value, the |
---|
0:24:52 | probability, by which we decide whether |
---|
0:24:54 | to increase this word's chance to appear in the agent response. |
---|
0:24:58 | Right, okay, thank you very much. Thank you. |
---|
0:25:02 | Alright, so let's thank the speaker again. |
---|