0:00:17 | and the next |
---|
0:00:20 | speaker we have is shikib mehri |
---|
0:00:24 | with the paper on structured fusion networks for dialogue which uses an end-to-end dialogue model |
---|
0:00:31 | so |
---|
0:00:32 | please |
---|
0:00:49 | i'm shikib mehri |
---|
0:00:51 | and i'm here today to talk about structured fusion networks for dialogue |
---|
0:00:55 | this work was done with |
---|
0:00:57 | tejas srinivasan and my advisor maxine eskenazi |
---|
0:01:01 | okay let's talk about neural models of dialogue |
---|
0:01:04 | so neural dialogue systems do really well on the task of dialog generation |
---|
0:01:08 | but they have several well-known shortcomings |
---|
0:01:11 | they need a lot of data to train |
---|
0:01:13 | they struggle to generalize to new domains |
---|
0:01:17 | they are difficult to control |
---|
0:01:19 | and |
---|
0:01:20 | they exhibit divergent behavior when tuned with reinforcement learning |
---|
0:01:25 | on the other hand traditional pipelined dialogue systems |
---|
0:01:28 | have structured components |
---|
0:01:30 | that allow us to easily generalize them |
---|
0:01:34 | interpret them and control these systems |
---|
0:01:37 | both these systems have their respective advantages and disadvantages |
---|
0:01:41 | neural dialogue systems can learn from data |
---|
0:01:43 | and they can learn higher level reasoning |
---|
0:01:46 | or a higher level policy |
---|
0:01:47 | on the other hand pipeline systems |
---|
0:01:49 | have a very structured nature which has several benefits |
---|
0:01:53 | yesterday there was this question in the panel of |
---|
0:01:55 | to pipeline or not to pipeline |
---|
0:01:57 | and to me the obvious answer seems why not both and i think that |
---|
0:02:03 | combining these two approaches is a very intuitive thing to do |
---|
0:02:07 | so how do we go about combining these two approaches |
---|
0:02:10 | so in pipelined systems we have structured components so the very first thing to do |
---|
0:02:16 | to bring this structure |
---|
0:02:17 | to neural dialogue systems |
---|
0:02:19 | is to neuralize these components |
---|
0:02:22 | so using the multiwoz dataset we first define and train |
---|
0:02:26 | several neural dialogue modules |
---|
0:02:28 | one for the nlu |
---|
0:02:29 | one for the dm and one for the nlg |
---|
0:02:33 | so for the nlu what we do is |
---|
0:02:35 | we read the dialogue context |
---|
0:02:38 | encode it and then |
---|
0:02:39 | ultimately make a prediction about the belief state |
---|
0:02:43 | for the dialogue manager |
---|
0:02:44 | we look at the belief state as well as some vectorized representation of the database |
---|
0:02:48 | pass the output through several linear layers and ultimately predict the system dialogue act |
---|
0:02:55 | for the nlg we have a conditioned language model |
---|
0:02:58 | where the initial hidden state is a linear combination |
---|
0:03:01 | of the dialogue act the belief state and the database vector and then at every |
---|
0:03:05 | time step |
---|
0:03:06 | the model outputs what the next word should be to ultimately generate the response |
---|
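As a rough illustration of the wiring just described, here is a toy numpy sketch of the three modules; the dimensions, random weights, and function names are all invented for illustration and are not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's hyperparameters).
HIDDEN, BELIEF, DB, ACT, VOCAB = 8, 6, 4, 5, 10

def nlu(context_states):
    """NLU: encode the dialogue context, predict the belief state."""
    final = context_states[-1]                 # last encoder hidden state
    W = rng.normal(size=(BELIEF, HIDDEN))
    return 1 / (1 + np.exp(-W @ final))        # multi-label belief state

def dm(belief, db_vector):
    """DM: belief state + database vector -> system dialogue act."""
    W = rng.normal(size=(ACT, BELIEF + DB))
    return 1 / (1 + np.exp(-W @ np.concatenate([belief, db_vector])))

def nlg(act, belief, db_vector, steps=3):
    """NLG: conditioned LM; the initial hidden state is a linear map of
    [act; belief; db], then one word is emitted per time step."""
    W0 = rng.normal(size=(HIDDEN, ACT + BELIEF + DB))
    h = np.tanh(W0 @ np.concatenate([act, belief, db_vector]))
    Wr = rng.normal(size=(HIDDEN, HIDDEN))
    Wo = rng.normal(size=(VOCAB, HIDDEN))
    words = []
    for _ in range(steps):
        h = np.tanh(Wr @ h)                    # recurrent update
        words.append(int(np.argmax(Wo @ h)))   # greedy next-word choice
    return words

context = rng.normal(size=(5, HIDDEN))         # pretend encoder outputs
b = nlu(context)
a = dm(b, rng.normal(size=DB))
response = nlg(a, b, rng.normal(size=DB))
```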
0:03:11 | so we have these three neural dialogue modules |
---|
0:03:14 | that mirror the structured components of traditional pipelined systems |
---|
0:03:18 | given these three components |
---|
0:03:21 | how do we actually go about |
---|
0:03:22 | building a system for dialog generation |
---|
0:03:25 | well the simplest thing to do is |
---|
0:03:28 | naive fusion |
---|
0:03:29 | where what we do is we train these systems and then we just combine them |
---|
0:03:34 | naively during inference where instead of passing in the ground truth belief state to the |
---|
0:03:40 | dialogue manager which is what we would do during training we make a prediction |
---|
0:03:44 | using our trained nlu |
---|
0:03:46 | and then pass it into the dialogue manager |
---|
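Naive fusion, as described, just chains the trained modules' own predictions at inference time instead of the ground-truth inputs they saw in training. A minimal sketch with trivial stub modules (all names and values invented):

```python
# Stand-in "trained modules" (trivial stubs; the real ones are neural networks).
def nlu(context):
    return [1.0, 0.0]                                    # predicted belief state

def dm(belief, db):
    return [b + d for b, d in zip(belief, db)]           # dialogue act stub

def nlg(act, belief, db):
    return ["hello"] * len(act)                          # response stub

def naive_fusion(context, db):
    belief = nlu(context)     # inference: use the NLU's *prediction*
    act = dm(belief, db)      # ...instead of the ground-truth belief state
    return nlg(act, belief, db)

print(naive_fusion("ctx", [0.5, 0.5]))   # -> ['hello', 'hello']
```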
0:03:50 | another way of using these dialogue modules |
---|
0:03:53 | after training them independently is multitasking |
---|
0:03:57 | so |
---|
0:03:58 | where we simultaneously learn the dialogue modules |
---|
0:04:01 | as well as the final task of dialog response generation so we have these three |
---|
0:04:06 | independent modules here |
---|
0:04:07 | and then we have these red arrows that correspond to the forward propagation |
---|
0:04:11 | for the task of response generation |
---|
0:04:15 | sharing the parameters in this way results in more structured components |
---|
0:04:19 | now the encoder |
---|
0:04:20 | is both being used for the task of the nlu |
---|
0:04:23 | as well as for the task of response generation |
---|
0:04:25 | so now it would have this notion of structure in it |
---|
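The multitasking setup can be caricatured as one shared encoder whose parameters receive gradients from both the intermediate NLU loss and the end-to-end generation loss. A deliberately tiny sketch (scalar stand-ins, invented numbers):

```python
# Multitasking sketch: one shared encoder, two task heads, summed losses.
def encoder(x, w):
    return w * x                                # stand-in shared encoder

def multitask_loss(x, w, belief_tgt, response_tgt):
    h = encoder(x, w)
    # both losses depend on the same encoder parameter w,
    # so both tasks shape the shared representation
    nlu_loss = (h - belief_tgt) ** 2            # intermediate NLU supervision
    gen_loss = (2 * h - response_tgt) ** 2      # response-generation head
    return nlu_loss + gen_loss

print(multitask_loss(1.0, 0.5, 1.0, 1.0))       # -> 0.25
```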
0:04:29 | another way which is the primary |
---|
0:04:32 | novel work in our paper is structured fusion networks |
---|
0:04:35 | structured fusion networks aim to learn a higher level model |
---|
0:04:39 | on top of pre-trained neural dialogue modules |
---|
0:04:43 | here's a visualization of structured fusion networks |
---|
0:04:45 | and don't worry if this seems like spaghetti i'll come back to this |
---|
0:04:49 | so here what we have is |
---|
0:04:51 | we have the original dialogue modules the nlu the dm and nlg |
---|
0:04:55 | in these grey small boxes in the middle |
---|
0:04:58 | and then what we do is we |
---|
0:04:59 | define these black boxes around them |
---|
0:05:02 | that consist of a higher level module |
---|
0:05:04 | so the nlu gets upgraded to the nlu plus |
---|
0:05:07 | the dm to the dm plus and the nlg to the nlg plus |
---|
0:05:11 | by doing this |
---|
0:05:12 | the higher level model does not need to relearn and remodel the dialogue structure |
---|
0:05:16 | because it's provided to it |
---|
0:05:18 | due to the pre-trained dialogue modules |
---|
0:05:21 | instead the higher level model |
---|
0:05:23 | can focus on the necessary abstract modeling for the task of response generation |
---|
0:05:28 | which includes encoding complex natural language |
---|
0:05:31 | modeling the dialogue policy |
---|
0:05:33 | and generating language conditioned on some latent representation |
---|
0:05:37 | and it can leverage |
---|
0:05:38 | the already provided dialogue structure to do this |
---|
0:05:43 | so let's go through the structured fusion network piece by piece and see how we |
---|
0:05:47 | build it up |
---|
0:05:48 | we start out with these dialogue modules in grey here |
---|
0:05:51 | the combination between them is exactly what you saw in naive fusion |
---|
0:05:56 | first we're gonna add the nlu plus |
---|
0:05:59 | the nlu plus gets the outputted belief state |
---|
0:06:02 | and when it |
---|
0:06:03 | re encodes the dialogue context |
---|
0:06:05 | it has the already predicted belief state concatenated at every time step |
---|
0:06:10 | and in this way the encoder does not need to relearn the structure and can |
---|
0:06:14 | leverage the already computed belief state to better encode the |
---|
0:06:18 | the dialogue context |
---|
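The NLU+ re-encoding step might look roughly like this toy RNN sketch, where the predicted belief state is concatenated to the input at every time step; dimensions and weights are invented, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, BELIEF, HIDDEN = 4, 3, 5   # invented sizes

def nlu_plus_encode(word_embs, belief):
    """Re-encode the dialogue context with the already-predicted belief
    state concatenated to the input at every time step."""
    Wx = rng.normal(size=(HIDDEN, EMB + BELIEF))
    Wh = rng.normal(size=(HIDDEN, HIDDEN))
    h = np.zeros(HIDDEN)
    for x in word_embs:                           # one step per context token
        step_input = np.concatenate([x, belief])  # token + belief state
        h = np.tanh(Wx @ step_input + Wh @ h)
    return h                                      # final higher-level encoding

belief = np.array([1.0, 0.0, 1.0])                # pretend NLU output
context = rng.normal(size=(6, EMB))               # pretend token embeddings
h = nlu_plus_encode(context, belief)
```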
0:06:21 | next we're gonna add the dm plus |
---|
0:06:23 | and the dm plus |
---|
0:06:24 | initially |
---|
0:06:25 | it takes as input a concatenation of four different features |
---|
0:06:29 | the database vector the predicted dialogue act |
---|
0:06:32 | the predicted belief state |
---|
0:06:33 | and the final hidden state of the higher level encoder |
---|
0:06:36 | and then passes that through a linear layer |
---|
0:06:39 | by providing the structure in this way it's our hope that |
---|
0:06:41 | this sort of serves as the policy modeling component |
---|
0:06:44 | in this end-to-end model |
---|
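A minimal sketch of the DM+ as described, concatenating the four features and passing them through a linear layer; all dimensions here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
DB, ACT, BELIEF, HIDDEN, OUT = 3, 4, 5, 6, 7   # invented sizes

def dm_plus(db_vec, act, belief, encoder_final):
    """DM+: concatenate database vector, predicted dialogue act,
    predicted belief state, and the higher-level encoder's final
    hidden state, then apply a linear layer."""
    features = np.concatenate([db_vec, act, belief, encoder_final])
    W = rng.normal(size=(OUT, DB + ACT + BELIEF + HIDDEN))
    b = rng.normal(size=OUT)
    return W @ features + b    # latent "policy" vector passed on to the NLG+

z = dm_plus(rng.normal(size=DB), rng.normal(size=ACT),
            rng.normal(size=BELIEF), rng.normal(size=HIDDEN))
```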
0:06:48 | the nlg plus |
---|
0:06:50 | takes as input the output of the dm plus and uses that to |
---|
0:06:55 | initialize the hidden state and then interfaces with the nlg |
---|
0:06:59 | let's take a closer look into the nlg plus |
---|
0:07:03 | it relies on cold fusion |
---|
0:07:05 | so basically what this means is |
---|
0:07:07 | the nlg the conditioned language model gives us a sense of what the next word |
---|
0:07:12 | could be |
---|
0:07:14 | the decoder on the other hand |
---|
0:07:16 | is more |
---|
0:07:18 | is more so |
---|
0:07:19 | performing higher level reasoning |
---|
0:07:22 | and then |
---|
0:07:22 | we take the logits the output from the nlg about what the next word |
---|
0:07:26 | could be as well as the hidden state from the decoder |
---|
0:07:29 | about the representation of what we should be generating and combine them using cold fusion |
---|
0:07:36 | and then there's a cyclical relationship between the nlg and the higher level |
---|
0:07:40 | decoder |
---|
0:07:41 | in the sense that once cold fusion predicts what the next word should be through a |
---|
0:07:44 | combination of the decoder and nlg it passes that prediction both into the decoder |
---|
0:07:49 | and into the next time step of the nlg |
---|
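A simplified sketch of one cold-fusion decoding step in the spirit described here, gating a language-model feature with the decoder state before predicting the next word; this is a loose illustration with invented dimensions, not the exact cold fusion formulation from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 6, 4   # invented sizes

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cold_fusion_step(decoder_h, nlg_logits):
    """Project the NLG's next-word logits into a feature, gate it
    using the decoder state, and predict the next word jointly."""
    Wl = rng.normal(size=(HIDDEN, VOCAB))
    lm_h = np.tanh(Wl @ nlg_logits)                       # language-model feature
    Wg = rng.normal(size=(HIDDEN, HIDDEN * 2))
    g = sigmoid(Wg @ np.concatenate([decoder_h, lm_h]))   # per-dimension gate
    Wo = rng.normal(size=(VOCAB, HIDDEN * 2))
    logits = Wo @ np.concatenate([decoder_h, g * lm_h])
    return int(np.argmax(logits))                         # fed back to both models

word = cold_fusion_step(rng.normal(size=HIDDEN), rng.normal(size=VOCAB))
```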
0:07:53 | and here's the final combination again which |
---|
0:07:56 | hopefully should make more sense |
---|
0:08:00 | so how do we train the structured fusion network |
---|
0:08:02 | because we have these modules there's three different ways that we can do it |
---|
0:08:06 | the first one is that we can freeze these modules |
---|
0:08:08 | we can freeze the modules since they're pre-trained |
---|
0:08:12 | and then just learn the higher level model on top |
---|
0:08:15 | another way is that we can fine tune these modules for the final task of |
---|
0:08:19 | dialog response generation |
---|
0:08:21 | and then of course we can multitask the modules where we |
---|
0:08:24 | simultaneously fine tune them for response generation and for their original tasks |
---|
0:08:30 | we use the multiwoz dataset and generally follow their experimental setup |
---|
0:08:34 | which means the same hyper parameters and because they use the ground truth belief state |
---|
0:08:38 | we do so as well |
---|
0:08:39 | and you can sort of think of this as the oracle nlu in |
---|
0:08:42 | our case |
---|
0:08:43 | for evaluation we use the same metrics which include bleu score |
---|
0:08:47 | inform rate which |
---|
0:08:49 | measures how often the system has provided the appropriate entities to the user |
---|
0:08:54 | and success rate which is how often the system |
---|
0:08:57 | answers all the attributes the user requests |
---|
0:09:00 | and then we use a combined score which they propose as well |
---|
0:09:03 | which is bleu plus the average of inform and success rate |
---|
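The combined score as described is straightforward to compute; the numbers in the example below are purely illustrative, not results from the paper:

```python
def combined_score(bleu, inform, success):
    """Combined score from the MultiWOZ setup: BLEU plus the
    average of inform rate and success rate (all in percent here)."""
    return bleu + (inform + success) / 2

# Illustrative numbers only (not the paper's reported results).
print(combined_score(20.0, 70.0, 60.0))   # -> 85.0
```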
0:09:07 | so let's take a look at our results |
---|
0:09:09 | first our baseline so as you see here seq2seq with attention gets a combined |
---|
0:09:14 | score of about eighty three point three six |
---|
0:09:17 | next naive fusion both zero shot which means that they're independently |
---|
0:09:21 | pre-trained and just combined at inference |
---|
0:09:23 | and then we also finetune for |
---|
0:09:25 | the task of response generation which does slightly better than the baseline |
---|
0:09:30 | multitasking does not do so well which sort of indicates that |
---|
0:09:33 | the loss functions may be pulling |
---|
0:09:35 | the weights in different directions |
---|
0:09:38 | structured fusion networks with frozen modules |
---|
0:09:41 | also do not do so well |
---|
0:09:43 | but as soon as we start fine tuning |
---|
0:09:45 | we get a significant improvement |
---|
0:09:47 | with strong improvements |
---|
0:09:49 | with slight improvements over these other models |
---|
0:09:51 | in bleu score and then very strong improvements in inform and success rate |
---|
0:09:55 | and we observe |
---|
0:09:57 | somewhat similar patterns with sfn and with multitasking |
---|
0:10:00 | and honestly the seems kind of |
---|
0:10:02 | intuitive when you think about it inform and success rate measure how often we inform |
---|
0:10:08 | the user of the appropriate entities and how often we provide the appropriate attributes |
---|
0:10:12 | and explicitly modeling the belief state explicitly modeling the system act |
---|
0:10:16 | should intuitively help with this |
---|
0:10:18 | if our model is explicitly aware of |
---|
0:10:21 | what attributes the user has requested it's going to better provide that information to the |
---|
0:10:25 | user |
---|
0:10:29 | but of course i talked about several different problems |
---|
0:10:32 | with neural models so let's see if structured fusion networks did anything for those problems |
---|
0:10:37 | the first problem that i mentioned is that neural models are very data hungry |
---|
0:10:41 | and i think that the added structure should result in less data hungry models |
---|
0:10:45 | so we compare seq2seq with attention and structured fusion networks |
---|
0:10:48 | at one percent five percent ten percent and twenty five percent of the training data |
---|
0:10:53 | on the left you see the inform rate graph and on the right you see |
---|
0:10:56 | the success rate graph |
---|
0:10:57 | and varying levels of percentage of data used |
---|
0:11:01 | so the inform rate |
---|
0:11:02 | we're at about thirty |
---|
0:11:04 | thirty percent inform rate with seq2seq |
---|
0:11:06 | and at fifty five |
---|
0:11:08 | with structured fusion networks |
---|
0:11:11 | of course this difference is really big when we're |
---|
0:11:14 | at very small amounts of data as in one percent |
---|
0:11:17 | and then it slowly comes together |
---|
0:11:19 | as we increase the data |
---|
0:11:21 | with success rate we're at about twenty |
---|
0:11:25 | with structured fusion networks |
---|
0:11:27 | and fairly close to about like two or three percent |
---|
0:11:30 | with seq2seq at one percent of the data |
---|
0:11:33 | so for extremely low data scenarios one percent which is about |
---|
0:11:36 | six hundred utterances |
---|
0:11:39 | we do |
---|
0:11:40 | really well with structured fusion networks |
---|
0:11:42 | and the difference |
---|
0:11:43 | remains at about like ten percent improvement across both metrics |
---|
0:11:49 | another problem i mentioned is domain generalisability |
---|
0:11:52 | the added structure should give us more generalisable models |
---|
0:11:55 | so what we do is we compare seq2seq and structured fusion networks |
---|
0:11:59 | by training on two thousand out of domain |
---|
0:12:02 | dialogue examples |
---|
0:12:03 | and fifty in domain examples |
---|
0:12:05 | where in domain is restaurant and then we evaluate entirely on the restaurant domain |
---|
0:12:11 | and what we see here is we get a sizable improvement in the combined score |
---|
0:12:15 | using structured fusion networks |
---|
0:12:17 | with stronger improvements in success and inform |
---|
0:12:19 | the bleu is slightly lower but this drop matches roughly |
---|
0:12:23 | what we saw when using all the data so i don't think it's a |
---|
0:12:27 | problem specific to generalisability |
---|
0:12:30 | the next problem and to me the most interesting one |
---|
0:12:33 | is divergent behavior with reinforcement learning |
---|
0:12:36 | training generative dialogue models with reinforcement learning |
---|
0:12:39 | often results in divergent behavior |
---|
0:12:42 | and degenerate output |
---|
0:12:44 | i'm sure that everybody here has seen the headlines where people claimed that facebook |
---|
0:12:48 | shut down their bot after it started inventing its own language really what happened |
---|
0:12:53 | was it started outputting |
---|
0:12:56 | stuff that doesn't look like english because it loses the structure as soon as you |
---|
0:13:00 | train it with reinforcement learning |
---|
0:13:02 | so why does this happen |
---|
0:13:04 | my theory about why this happens is the notion of the implicit language model |
---|
0:13:09 | seq2seq decoders have the issue of the implicit language model which basically means that the |
---|
0:13:13 | decoder simultaneously learns the task strategy |
---|
0:13:16 | as well as modeling language |
---|
0:13:18 | in image captioning this is very well observed |
---|
0:13:21 | and it's observed that the implicit language model overwhelms the decoder |
---|
0:13:25 | so basically what happens is |
---|
0:13:27 | if the image model detects that there's a giraffe |
---|
0:13:32 | the model always outputs a giraffe standing in a field |
---|
0:13:36 | even if the giraffe is not standing in a field just because |
---|
0:13:39 | that's what the language model has been |
---|
0:13:41 | trained to do |
---|
0:13:44 | in dialogue on the other hand this problem is slightly different in the sense that |
---|
0:13:48 | when we finetune dialogue models with reinforcement learning |
---|
0:13:51 | we're optimising for the strategy |
---|
0:13:53 | and ultimately causing it to unlearn the implicit language model |
---|
0:13:57 | so |
---|
0:13:59 | structured fusion networks have an explicit language model |
---|
0:14:02 | so maybe we don't have this problem |
---|
0:14:05 | so let's try structured fusion networks with reinforcement learning |
---|
0:14:09 | so for this we trained with supervised learning and then we freeze the dialogue modules |
---|
0:14:14 | and finetune only the higher level model with the reward inform rate plus success rate |
---|
0:14:20 | so we're optimising the higher level model for some dialogue strategy |
---|
0:14:23 | while relying on the structured components |
---|
0:14:26 | to maintain the structured nature of the model |
---|
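The RL setup described, tuning only the higher-level parameters while the pre-trained modules stay frozen, can be caricatured with a toy REINFORCE loop; everything here (dimensions, reward, learning rate) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: one higher-level parameter vector is tuned with REINFORCE
# while the "pre-trained module" parameters stay frozen.
frozen_module = rng.normal(size=3)        # frozen features, never updated
theta = np.zeros(3)                       # higher-level model, updated

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def reward(action):
    return 1.0 if action == 1 else 0.0    # stand-in for inform + success rate

for _ in range(200):
    p = sigmoid(theta @ frozen_module)    # policy uses the frozen features
    action = int(rng.random() < p)        # sample an action
    # REINFORCE gradient for a Bernoulli policy: r * d(log pi)/d(theta)
    grad = (action - p) * frozen_module
    theta += 0.5 * reward(action) * grad  # only theta moves; modules stay frozen
```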
0:14:29 | and we compare to tiancheng zhao's work at naacl |
---|
0:14:33 | where he explored a similar problem |
---|
0:14:35 | and what we see is we get |
---|
0:14:38 | less divergence in language |
---|
0:14:39 | and fairly similar inform and success rate with the state-of-the-art combined score here |
---|
0:14:47 | so here are all the results for all the models that we compared |
---|
0:14:50 | throughout this presentation |
---|
0:14:53 | we see that |
---|
0:14:54 | adding structure in general seems to help |
---|
0:14:57 | and we get a sizable improvement over our baseline |
---|
0:14:59 | and |
---|
0:15:00 | the model is especially robust to reinforcement learning |
---|
0:15:04 | of course given how fast this field moves |
---|
0:15:07 | while our paper was in review somebody beat our results and we don't have state-of-the-art |
---|
0:15:11 | anymore |
---|
0:15:12 | but |
---|
0:15:12 | one of the core contributions of their work |
---|
0:15:15 | was improving dialogue act prediction |
---|
0:15:18 | and because structured fusion networks have this ability |
---|
0:15:22 | to leverage dialogue act predictions in an explicit component |
---|
0:15:27 | i think there's room for combination here |
---|
0:15:30 | so |
---|
0:15:31 | no dialogue paper is complete without human evaluation so what we did here was we |
---|
0:15:37 | asked mechanical turk workers |
---|
0:15:39 | to read the dialogue context and rate responses on a scale of one to five |
---|
0:15:43 | on the notion of appropriateness |
---|
0:15:46 | and what we see here is that |
---|
0:15:48 | structured fusion networks with reinforcement learning |
---|
0:15:51 | are rated slightly higher |
---|
0:15:54 | with |
---|
0:15:54 | ratings of four or more given |
---|
0:15:58 | more often i should add that everything in bold is statistically significant |
---|
0:16:02 | of course we have a lot more room |
---|
0:16:04 | to improve before we beat the human ground truth but i think adding structure to our |
---|
0:16:09 | models is the way to go |
---|
0:16:12 | thank you for your attention and the code is available here |
---|
0:16:20 | thank you for the talk |
---|
0:16:21 | so now we have |
---|
0:16:23 | actually |
---|
0:16:24 | quite some time for questions so any questions |
---|
0:16:31 | that's a |
---|
0:16:32 | a very interesting piece of work and it looks promising but |
---|
0:16:37 | do you have plans to extend the evaluation and look at whether |
---|
0:16:42 | the system with your architecture can actually engage in dialogue rather than replicating dialogues |
---|
0:16:47 | to the second question i think the structure should help us do that and maintain |
---|
0:16:52 | like not have the issue of when you start training models and evaluating models in an |
---|
0:16:58 | adaptive manner usually what happens is the errors propagate and i think that |
---|
0:17:02 | the structure should make that less likely to happen |
---|
0:17:07 | we |
---|
0:17:08 | i think that's something that we should definitely look into in the literature |
---|
0:17:11 | and just if you put up your comparative slides the first comparison i think |
---|
0:17:16 | you're too |
---|
0:17:18 | quick to |
---|
0:17:20 | cede the rank to the other one as having the |
---|
0:17:23 | the preferred performance because bleu i would say is not something that should be measured |
---|
0:17:29 | in this context it's |
---|
0:17:31 | they're doing much better than you in bleu but it's completely irrelevant whether you give |
---|
0:17:34 | exactly the same words as the original or not |
---|
0:17:37 | and you're actually doing much better in success that's true like my general |
---|
0:17:42 | feeling having looked at the data a lot is that |
---|
0:17:45 | for this type of task at least bleu does relatively well and i think in |
---|
0:17:48 | the original paper they did some correlation analysis with human judgement |
---|
0:17:52 | but i think like |
---|
0:17:53 | bleu like on its own will not measure quality of the system |
---|
0:17:57 | but more so what it's measuring is |
---|
0:18:00 | how structured the language is and how like |
---|
0:18:03 | you disagree |
---|
0:18:06 | okay that's fair i guess with multiple references maybe we can improve this |
---|
0:18:15 | so you have these three components and you said that they're pre-trained but |
---|
0:18:21 | what are they pre-trained on and the second question sorry during training do you |
---|
0:18:27 | also have intermediate supervision there or are they finetuned in an end-to-end fashion |
---|
0:18:34 | right okay good question |
---|
0:18:36 | let me just go back to that slide |
---|
0:18:39 | so in the multiwoz data |
---|
0:18:42 | they |
---|
0:18:43 | they give us the belief state and they give us the system dialogue act |
---|
0:18:46 | so what we do for pre-training these components is |
---|
0:18:50 | the nlu is pre-trained to go from context to belief state |
---|
0:18:53 | the dm from belief state to dialogue act |
---|
0:18:56 | and the nlg from dialogue act to response |
---|
0:18:59 | for your second question |
---|
0:19:00 | we do explore that in our multitask setup |
---|
0:19:04 | we do intermediate supervision but in the other two we don't |
---|
0:19:08 | so it seems to me that you use much more supervision than the usual |
---|
0:19:13 | sequence to sequence model which would be the reason for better performance rather than |
---|
0:19:19 | a different architecture no |
---|
0:19:21 | no like i completely agree with the point but i think |
---|
0:19:26 | the point of our paper is that doing this additional supervision |
---|
0:19:29 | and adding the structure into the model is something that people should |
---|
0:19:33 | be doing fair enough |
---|
0:19:33 | but i do understand that |
---|
0:19:35 | it's not necessarily the architecture on its own that's doing better cool thank you |
---|
0:19:42 | any other questions |
---|
0:19:46 | great talk it definitely looks promising so you talked a bit about |
---|
0:19:51 | generalisability and this issue of divergence with rl but you didn't touch much on the other |
---|
0:19:57 | issue you mentioned in the trade off at the beginning which was controllability and |
---|
0:20:02 | i'm wondering if you have some thoughts on that |
---|
0:20:05 | i guess some of the questions that come into my mind when we design models |
---|
0:20:08 | with respect to control is suppose i wanted it to behave a little bit differently in |
---|
0:20:12 | one case is there any way that this architecture can address that and the other way |
---|
0:20:17 | to look at it is suppose i wanted to invest in improving one of these |
---|
0:20:21 | components can i do it in any other way other than |
---|
0:20:25 | getting more data like how does the architecture support something in that sense okay |
---|
0:20:30 | that's a good question controllability isn't something that we looked at yet but |
---|
0:20:34 | it's definitely something that i do want to look at in the future just |
---|
0:20:37 | because i think doing something as simple as |
---|
0:20:39 | adding rules on top of the dialogue manager |
---|
0:20:42 | to just change and say like output this dialogue act instead if these conditions |
---|
0:20:45 | are met would work really well and the model does leverage those dialogue acts and |
---|
0:20:50 | like i've seen bad predictions from the lower level model |
---|
0:20:54 | result in poor outputs |
---|
0:20:57 | that's definitely something that we should look into in the future |
---|
0:21:01 | remind me what the second thing was oh |
---|
0:21:02 | the other part is is this architecture suitable for decomposability can i invest |
---|
0:21:08 | more on one component like |
---|
0:21:09 | does it lend itself to blame assignment in any sense better and does it you |
---|
0:21:12 | know |
---|
0:21:13 | so |
---|
0:21:15 | um like |
---|
0:21:17 | i'm not entirely sure |
---|
0:21:18 | for when we look at the final task of response generation |
---|
0:21:22 | but we do sort of have a sense just because of the intermediate supervision |
---|
0:21:26 | how well each of the respective lower level components are doing |
---|
0:21:29 | and what i can say is that the nlu does really well |
---|
0:21:32 | the |
---|
0:21:33 | the natural language generation does pretty well |
---|
0:21:36 | the main thing that's struggling is this |
---|
0:21:38 | this act of going from belief state to dialogue act |
---|
0:21:41 | and i think that if i was to recommend a component |
---|
0:21:45 | based on just this pre-training supervision |
---|
0:21:49 | to improve it would be the dialogue manager |
---|
0:21:52 | but like blame assignment in general for the response generation task |
---|
0:21:56 | isn't something that |
---|
0:21:58 | i think is really easy with the current state of the model but i think |
---|
0:22:01 | things might be able to be done to further interpret the model |
---|
0:22:09 | any more questions |
---|
0:22:14 | okay in that case i'll ask |
---|
0:22:18 | one of my own |
---|
0:22:21 | can you |
---|
0:22:23 | explain how exactly you know what is it that the |
---|
0:22:28 | dm and the dm plus predict how does it look like is it some |
---|
0:22:31 | kind of |
---|
0:22:33 | a like |
---|
0:22:36 | dialogue act embedding or is it explicit like a one |
---|
0:22:41 | hot |
---|
0:22:42 | so |
---|
0:22:44 | so you mean like the dialogue act vector or just i mean basically what |
---|
0:22:47 | when you look at the dm |
---|
0:22:51 | well i guess these are two different things when you look at the dm |
---|
0:22:56 | the output is a dialogue act right yes and the dm plus has something different so |
---|
0:23:02 | like okay |
---|
0:23:03 | so for the dm itself because of the supervision |
---|
0:23:07 | we're predicting the dialogue act which is a multi class label |
---|
0:23:10 | and it's basically just ones and zeros |
---|
0:23:13 | like a binary vector okay and that's like |
---|
0:23:17 | inform or request at a single slot level like inform restaurant available type thing right |
---|
0:23:25 | but then for the dm plus |
---|
0:23:28 | it's not structured in that sense and basically what we do is |
---|
0:23:32 | we just treat it as a linear layer that initialises |
---|
0:23:35 | the decoder's hidden state and in the original multiwoz paper they had this type |
---|
0:23:40 | of thing as well |
---|
0:23:41 | where they just had a linear layer between the encoder and decoder that combined |
---|
0:23:46 | more information into the hidden state |
---|
0:23:47 | and they called that the policy and |
---|
0:23:50 | that's sort of what we're hoping |
---|
0:23:52 | by adding the structure beforehand |
---|
0:23:54 | it's actually more like a policy rather than just a linear layer as before |
---|
0:23:58 | right okay thank you |
---|
0:24:01 | any more |
---|
0:24:04 | the last one |
---|
0:24:06 | did you try other baselines because sequence to sequence seems to be |
---|
0:24:10 | basic |
---|
0:24:12 | well we did try the other ways of combining the neural modules |
---|
0:24:15 | like naive fusion and multitasking those ones |
---|
0:24:19 | i can go to that slide |
---|
0:24:21 | but we didn't try transformers or anything like that and i think that |
---|
0:24:24 | that's something that we can look into in the future |
---|
0:24:27 | but we tried like naive fusion and multitasking which are different baselines that we |
---|
0:24:32 | came up with |
---|
0:24:33 | for actually leveraging the structure as well |
---|
0:24:37 | okay thank you thank you |
---|