0:00:17 | But now we will listen to three papers that, as I said, underwent the regular review process. |
0:00:25 | And the first one up is "DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks", |
0:00:33 | presented by Semih Yavuz. |
0:01:24 | Hello everyone. |
0:01:26 | I'm Semih Yavuz, a PhD student from the University of California, Santa Barbara. |
0:01:32 | Today I'm going to talk about our work on grounded response generation with hierarchical pointer networks. |
0:01:39 | This is joint work with Abhinav Rastogi, Guan-Lin Chao, and Dilek Hakkani-Tur. |
0:01:46 | We are at different places now, but this is actually work done at Google |
0:01:51 | while I was an intern there |
0:01:53 | last year. |
0:01:56 | Okay, without further ado, let's start. |
0:02:01 | So this paper is about building dialogue models for knowledge-grounded response generation. |
0:02:12 | The problem that we want to tackle here is basically to push these models to be able to produce more natural and engaging conversations. |
0:02:28 | Previous papers in this domain have pointed out several problems that basically all come down to the generic response generation of these models. |
0:02:45 | This is sort of the basic problem that this paper is trying to tackle. |
0:02:51 | Just to start with an example: say we have a user who is looking for Italian food in Los Altos. |
0:03:01 | A response coming from a system like "Poppy's is a nice restaurant in Los Altos serving Italian food" would be a good response, but at the same time, how engaging does this response sound? |
0:03:17 | I don't know, I would probably prefer something that contains more information. |
0:03:23 | But in general, this is basically the scenario we try to address: enriching responses with more information. |
0:03:34 | So the question that we ask is: what happens if we were able to use external knowledge to make the content of these responses more informative, or more engaging, if you want to call it that? |
0:03:49 | So basically, let's say we have a model that can actually go and look at the reviews of the restaurant that we want to recommend to the user, |
0:04:01 | take pieces of information from these reviews, |
0:04:07 | and then generate a response where the first sentence is the same, |
0:04:13 | but it also mentions, for example, that their carbonara is quite popular. |
0:04:20 | Excuse me. |
0:04:22 | This would be a more engaging response to me. |
0:04:26 | So basically, the general problem that we are going to try to solve involves proposing models to incorporate external knowledge into response generation. |
0:04:43 | Most of the early previous work in this domain tried to do this with sequence-to-sequence models; |
0:04:54 | it is not exactly the same problem, but they try to model the dialogue without using the external knowledge. |
0:05:02 | This requires a lot of data to be able to encode the world's knowledge into the model's parameters, |
0:05:16 | and some additional drawbacks also include that, depending on the model, you might need to retrain the model as new knowledge becomes available. |
0:05:30 | Instead of that, we can think of this problem as incorporating, basically adding, the knowledge as an input to the model. |
0:05:41 | There is an early work that tries to achieve this: what they do is they basically take the conversation history, |
0:05:53 | and then they try to use additional facts, let's say external knowledge, |
0:06:00 | pick some of the knowledge from this resource, |
0:06:05 | and incorporate that into response generation. |
0:06:08 | So in this work, we try to go over the existing models that try to achieve this exact scenario, |
0:06:17 | and then propose further models that we think might be useful. |
0:06:26 | So basically, the contributions we will talk about are models that try to incorporate external knowledge as an additional input. |
0:06:37 | In more detail, this will involve going over some baselines, |
0:06:42 | and actually proposing further baselines that are not covered in the literature but are useful models, |
0:06:53 | and then at the end we will talk about the model that we propose, which we think might be helpful. |
0:07:02 | Okay, so there are a bunch of new datasets in this domain where you have conversations accompanied by external knowledge. |
0:07:19 | One of them is the DSTC7 sentence generation challenge from last year: |
0:07:28 | basically there are Reddit conversations, and you want to use the related web articles to be able to generate better responses. |
0:07:40 | There is Wizard of Wikipedia, where there are natural conversations between a learner and an expert, |
0:07:48 | and there are a couple of other recent datasets as well. |
0:07:55 | In this work we will actually work with the ConvAI2 dataset. |
0:08:01 | One of the reasons why we worked on this is that it doesn't need any retrieval step: |
0:08:11 | the relevant facts for the dialogue are already given. |
0:08:16 | We will talk about the dataset in more detail. |
0:08:20 | So basically, in this dataset there are two persons, each given a persona, |
0:08:28 | and they are asked to hold a conversation based on their persona sentences. |
0:08:40 | Some of the properties, or challenges, of this dataset are the following. |
0:08:47 | You have some facts that you want to be able to incorporate in your response generation, which is actually one of the motivations for having the persona, |
0:08:59 | but it's also hard for the models to be able to do that. |
0:09:04 | And there are some facts that are needed in the response but that you don't have in your persona, |
0:09:11 | yet you have to be able to produce them, which is another main challenge of this dataset. |
0:09:20 | And there are all kinds of other differences, which are, I would say, tied to the statistics of the dataset and hard to model. |
0:09:31 | Okay, so this is the dataset that we are going to work on. |
0:09:37 | Some evaluation metrics, before we dive into the models. |
0:09:44 | There will be automated metrics, which are common for the sentence generation task, which is the main task of this challenge, |
0:09:57 | and we will also have a human evaluation where we ask humans to rate the responses generated by the models from one to five. |
0:10:07 | We will also, at the end, present a bit of further analysis on the ability of the models to incorporate the facts that are presented to them, |
0:10:21 | and finally we will also have, this is also sort of an automated metric, a diversity analysis, |
0:10:29 | to see if the models can generate diverse responses. |
0:10:34 | Okay, so the models are going to come in two parts: one is the baseline models, which I will cover pretty fast, and then we will have the models that we think are helpful for this task. |
0:10:52 | Let's start with the sequence-to-sequence model with attention, which is basically: you have the dialogue history, which is concatenated into a single sequence, |
0:11:03 | then you have the sequence encoder, for which we use an LSTM, |
0:11:08 | and we have the decoder that generates the response based on this. |
0:11:16 | Then we have sequence-to-sequence again with a single fact, where we also take the most relevant fact from the persona |
0:11:25 | and append it to the context, so now you have a longer sequence |
0:11:32 | which also has the factual information, and then you want to generate a response from this. |
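To make this input construction concrete, here is a minimal Python sketch, not the authors' code: the dialogue history is flattened into one token sequence, and the single-fact variant appends one persona fact; the separator token `<s>` and the function name are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of assembling the Seq2Seq input:
# the dialogue history is flattened into one token sequence, and the
# single-fact variant appends one persona fact to it. The separator token
# "<s>" is an assumption for illustration.
def build_input(dialogue_history, best_fact=None, sep="<s>"):
    """dialogue_history: list of utterance strings; best_fact: optional fact string."""
    tokens = []
    for utterance in dialogue_history:
        tokens.extend(utterance.split())
        tokens.append(sep)                    # mark utterance boundaries
    if best_fact is not None:
        tokens.extend(best_fact.split())      # appended fact adds factual context
    return tokens
```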
0:11:38 | The most relevant fact is selected in two ways. |
0:11:42 | The first one is best fact by context, which is basically: you take the dialogue context and then find the most relevant fact for it based on TF-IDF similarity. |
0:11:54 | And then we have best fact by response, where the similarity is measured between the facts and the ground-truth response. |
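A minimal sketch of this best-fact selection step, assuming scikit-learn is available; the query is the dialogue context for the first variant, or the ground-truth response for the second, cheating variant.

```python
# A minimal sketch of the best-fact selection, assuming scikit-learn:
# score each persona fact against a query (the dialogue context, or the
# ground-truth response in the cheating variant) by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def best_fact(facts, query):
    """facts: list of fact strings; query: dialogue context or gold response."""
    vectorizer = TfidfVectorizer()
    fact_vecs = vectorizer.fit_transform(facts)        # one TF-IDF vector per fact
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, fact_vecs)[0]
    return facts[scores.argmax()]                      # highest-similarity fact
```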
0:12:04 | This is a cheating model, just to be able to see: if we were able to provide the right fact, would the model be able to generate a better response? |
0:12:22 | So, some results. I'm going to first present the main results, which are going to be the automated metrics, like perplexity, BLEU, and so on, |
0:12:37 | and also the human evaluation, which is appropriateness. |
0:12:41 | Here, the no-fact model is the first model, |
0:12:47 | and basically what we see is that incorporating a single fact improves the perplexity, as you see here, |
0:12:55 | and if you incorporate the cheating fact, it improves it even further, |
0:13:01 | but it loses a bit on the naturalness. |
0:13:05 | One of the reasons, I mean this is a hypothesis, this is what we observed looking at the results, |
0:13:15 | is that the no-fact one kind of generates very generic responses, |
0:13:20 | which are sometimes, actually quite frequently, rated higher than the ones that try to incorporate the fact but fail to do it. |
0:13:31 | That is sort of the main reason why this happens. |
0:13:36 | Another interesting thing here is that if you look at the appropriateness score of the ground-truth response, |
0:13:42 | I mean, this is out of five, so at 4.4 it's not perfect; |
0:13:48 | that's sort of another challenge here. |
0:13:51 | Another line of baselines is memory networks, where basically we encode the context again with a sequence model, |
0:14:03 | and we take its representation and attend on the facts. |
0:14:07 | Each fact has a key representation, shown in green, which is basically a vector, |
0:14:13 | and the facts also have value representations, which are shown in blue. |
0:14:17 | So we attend on the key representations, get a probability distribution over the facts, |
0:14:22 | and then compute a summary vector out of the values, add it to the context vector, and feed it to the decoder, which then generates the response. |
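A minimal numpy sketch of the key-value memory read just described, not the paper's implementation; shapes and names are illustrative.

```python
# A minimal numpy sketch (not the paper's implementation) of the key-value
# memory step: attend over fact keys with the encoded context, take a weighted
# sum of fact values, and add the summary to the context vector for the decoder.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(context_vec, fact_keys, fact_values):
    """context_vec: (d,); fact_keys, fact_values: (num_facts, d)."""
    attn = softmax(fact_keys @ context_vec)    # probability distribution over facts
    summary = attn @ fact_values               # weighted sum of value vectors
    return context_vec + summary               # fed to the decoder
```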
0:14:37 | We will call this the memory network baseline for this task. |
0:14:42 | We also have another version of this, which is similar to a model covered in previous work; this is again another baseline model, |
0:14:56 | where basically in the decoder you also have an attention on the context itself. |
0:15:03 | So in the previous one there was no per-step decoder attention on the context, but here there is. |
0:15:10 | We also have the fact-attention version: basically, at every decoder step you have an additional attention on the facts, |
0:15:18 | so when you are generating, you get to go back and look at the facts. |
0:15:24 | And then we have a memory network where both fact and context attention are enabled. |
0:15:33 | Okay, so if we look at the results of these compared to the earlier baselines, |
0:15:41 | we see that attention on only the facts, as you can see here, results in the best fact incorporation, |
0:15:50 | and additionally, the sequence-to-sequence models that we tried earlier are competitive with the memory network models proposed in previous works. |
0:16:06 | So, on top of that, the next thing is that we realized that the sequence models we analyzed |
0:16:15 | fail to reproduce the factual information, such as in the example that I showed at the beginning. |
0:16:25 | For that, we tried incorporating a copy mechanism into the baselines here. |
0:16:34 | We basically used the pointer-generator network that was proposed two years back, |
0:16:42 | and what it does is: at every decoder step, you basically have a soft combination of word generation |
0:16:52 | and copying of tokens from the input, |
0:16:55 | so that if there is something in the input that is not in your vocabulary, you can still generate it. |
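A minimal numpy sketch of the pointer-generator mixing step referenced here; the p_gen switch and extended-vocabulary indexing follow the general pointer-generator idea, and details may differ from the models used in this work.

```python
# A minimal numpy sketch of the pointer-generator idea referenced here: a soft
# switch p_gen mixes the vocabulary distribution with a copy distribution over
# the input tokens, so out-of-vocabulary input words can still be produced.
import numpy as np

def mix_generate_and_copy(p_gen, vocab_dist, copy_attn, input_token_ids, ext_vocab_size):
    """vocab_dist: (V,); copy_attn: (T,) attention over input positions;
    input_token_ids: (T,) ids into an extended vocabulary of size ext_vocab_size."""
    final = np.zeros(ext_vocab_size)
    final[:len(vocab_dist)] = p_gen * vocab_dist                  # generation mass
    np.add.at(final, input_token_ids, (1 - p_gen) * copy_attn)    # scatter copy mass
    return final                                                  # sums to 1
```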
0:17:03 | Okay, so as I said, this is important for producing the factual information that may not end up in the vocabulary. |
0:17:15 | So basically what we do is take the sequence-to-sequence models that we explored at the beginning, add the copy mechanism to each one of them, and look at what happens. |
0:17:28 | We immediately see that the copy mechanism improves all of them, actually by quite a bit. |
0:17:38 | We also see that if you look at the model where you feed the best fact in a cheating way, |
0:17:47 | it basically says that if you had a way to find the best fact based on the response, then you'd be able to do pretty well, so it's sort of an upper bound again. |
0:18:00 | Okay, so now we just want to further see how we can actually make use of every token in every fact that is available to us, |
0:18:18 | because previous models either did not use all the facts, like the sequence models where we just pick one fact and use that, |
0:18:23 | or, like the memory network models, basically use an entire summary of each fact as a whole and just use that. |
0:18:31 | Now we want to see what happens if we were able to condition the response on every fact token. |
0:18:39 | This might be important for copying the relevant pieces of information from the facts, even though you are not actually given the best fact. |
0:18:57 | So basically, the base for this is what we call the multi-sequence-to-sequence model with hierarchical attention, |
0:19:02 | where the context encoder is the same, but for the fact encoding we also use an LSTM, so basically we have a contextual representation for every fact, sorry, every fact token. |
0:19:14 | What we do is that at every decoder step, we take the decoder state and attend on the tokens of each fact, which basically gives us a distribution over the fact tokens, |
0:19:29 | and then we compute a context vector over these, which gives us fact summaries. |
0:19:37 | Then we do another attention on the fact summaries, which gives us a distribution over the facts, that is, which fact might be more important. |
0:19:48 | We also have a context summary coming from the attention on the context, |
0:19:54 | and then we have one more attention, which attends on the fact summary and the context summary and combines them based on which one is more important. |
0:20:02 | This is all soft attention, so you don't need anything extra; this is basically fully differentiable, that's what I'm trying to say. |
0:20:13 | And then you use this to generate your response, and the loss is basically the log-likelihood loss, so the negative log-likelihood. |
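A minimal numpy sketch of this hierarchical attention; for simplicity the encoder states serve as both attention keys and values, which is an illustrative assumption rather than the exact parameterization.

```python
# A minimal numpy sketch of the hierarchical attention just described:
# token-level attention inside each fact gives per-fact summaries, a second
# attention over those summaries weighs the facts, and a final attention mixes
# the fact summary with the context summary. All steps are soft, hence differentiable.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    weights = softmax(keys @ query)            # distribution over the keys
    return weights, weights @ keys             # and their weighted summary

def hierarchical_attention(dec_state, fact_token_states, context_states):
    """fact_token_states: list of (T_i, d) arrays, one per fact; context_states: (T_c, d)."""
    token_dists, fact_summaries = [], []
    for tokens in fact_token_states:           # token-level attention inside each fact
        dist, summary = attend(dec_state, tokens)
        token_dists.append(dist)
        fact_summaries.append(summary)
    fact_dist, knowledge_summary = attend(dec_state, np.stack(fact_summaries))
    _, context_summary = attend(dec_state, context_states)
    mix_dist, mixed = attend(dec_state, np.stack([knowledge_summary, context_summary]))
    return mixed, token_dists, fact_dist, mix_dist   # mixed vector feeds generation
```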
0:20:27 | And in DeepCopy, which is the main model that we propose in this paper, what we try to exploit is that basically everything remains the same as the previous model I showed, |
0:20:40 | but what we do here is that we use the attention probabilities over the context tokens and the fact tokens as the corresponding copying probabilities. |
0:20:54 | So basically, as you can see, you have a distribution over the facts and a distribution over the tokens of every fact, so you can induce a single distribution over every unique token in your facts. |
0:21:07 | You also have another attention over the context and the facts, and using those you can combine these two into, again, a single distribution, |
0:21:19 | and you can use that as the copy probabilities of the tokens and then combine it with the generation. |
0:21:30 | So here we already have generation probabilities over the vocabulary, and we also have the copy probabilities from the context tokens and the fact tokens, and we combine all of them into a single distribution. |
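A minimal numpy sketch of how these pieces could be merged; the explicit p_gen gate and the exact weighting are assumptions, since the talk only states that the attention probabilities are reused as copy probabilities and everything is combined into one distribution.

```python
# A minimal numpy sketch of turning the attention weights above into one copy
# distribution and merging it with generation. The p_gen gate and the exact
# weighting scheme are illustrative assumptions, not the paper's exact formulation.
import numpy as np

def deepcopy_distribution(p_gen, vocab_dist, fact_dist, fact_token_dists, fact_token_ids,
                          context_dist, context_token_ids, mix_dist, ext_vocab_size):
    """mix_dist = [weight_on_facts, weight_on_context]; *_token_ids index an extended vocab."""
    final = np.zeros(ext_vocab_size)
    final[:len(vocab_dist)] = p_gen * vocab_dist              # generation part
    copy_mass = 1.0 - p_gen
    for f, (dist, ids) in enumerate(zip(fact_token_dists, fact_token_ids)):
        np.add.at(final, ids, copy_mass * mix_dist[0] * fact_dist[f] * dist)
    np.add.at(final, context_token_ids, copy_mass * mix_dist[1] * context_dist)
    return final                                              # still sums to 1
```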
0:21:44 | If you look at the results, basically on all of the main evaluation metrics, |
0:21:53 | DeepCopy outperforms all the other models, as you can see here. |
0:22:01 | It's also important to note that the best fact by context plus copy model that we analyzed is also a competitive model. |
0:22:12 | Okay. |
0:22:17 | We also did a diversity analysis; this is a metric that was actually proposed in one of the previous works, |
0:22:25 | looking at the ratio of distinct responses among those that are generated, |
0:22:28 | and DeepCopy is also shown to perform well here compared to the other models. |
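A minimal sketch of a diversity measure in this spirit; the exact metric from the prior work may differ (for example, distinct n-grams rather than distinct responses).

```python
# A minimal sketch of a diversity measure in the spirit of what is described
# here: the fraction of unique items among the generated responses, or among
# their n-grams. The exact metric used in the prior work may differ.
def distinct_ratio(responses):
    return len(set(responses)) / max(len(responses), 1)

def distinct_n(responses, n=2):
    ngrams = [tuple(r.split()[i:i + n])
              for r in responses
              for i in range(len(r.split()) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)
```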
0:22:38 | This is an example where we can see that DeepCopy can achieve what we wanted it to do: |
0:22:49 | it can attend on the right persona fact, which is highlighted here, without knowing which one is relevant, and it can copy exactly the relevant pieces |
0:22:57 | from the fact and from the current context of the dialogue. |
0:23:01 | You can also see that it can copy and generate at the same time, so basically it can switch between the modes. |
0:23:09 | So basically, we propose a general model that can take a query, which is the context in this case, |
0:23:15 | and external knowledge, which is basically a set of facts in unstructured text, and then generate a response out of them. |
0:23:23 | We propose strong baselines on top of this, |
0:23:27 | and then show that the proposed model actually performs favorably compared to the existing ones in the literature. |
0:23:37 | Right, that's it. Thank you for listening. |
0:23:40 | I can take questions. |
0:23:44 | Okay, so do we have any questions in the audience? |
0:23:49 | There. |
0:23:54 | Hi, this is [inaudible]. |
0:23:57 | A quick question: when you say that, for the copy, instead of focusing on only one fact it focuses on, say, five facts, does it compute |
0:24:08 | the weights of all the facts and then do a weighted sum, |
0:24:14 | instead of just picking the top three or so? |
0:24:18 | So, I mean, are you asking what we do in the proposed model, or...? |
0:24:25 | In the proposed model, normally you basically feed all the facts, and it can choose which ones to use. |
0:24:31 | So it doesn't pick exactly one; it actually computes a soft representation out of all of them. |
0:24:38 | Okay, and then uses that as a weighted sum over the vocabulary? |
0:24:44 | Well, that is actually the copy part. In the copy part you have a vocabulary from which you can generate, of size, say, five thousand, |
0:24:55 | which is the frequent words, right, |
0:24:57 | and then you have a distribution over this, |
0:25:02 | and then you have a distribution over the unique tokens that appear either in a fact or in the dialogue context, |
0:25:10 | so now you can induce a single probability distribution out of all of this, |
0:25:15 | and then we have a single distribution which is computed in a soft way, which means it's differentiable, so you can just train it with the negative log-likelihood. |
0:25:28 | Okay. |
0:25:30 | I also have a question, about the human evaluation: you have appropriateness as one measure, |
0:25:38 | but the motivation for this was to create more engaging responses, and appropriateness doesn't sound like it measures whether they are engaging, |
0:25:48 | so what was the actual instruction there? |
0:25:49 | That's a good question. That was actually something I wanted to touch on a bit. |
0:25:57 | So basically we have two human evaluations: one is the appropriateness, |
0:26:04 | and one is about the fact-inclusion analysis. |
0:26:08 | The latter is more relevant to measuring whether it is more engaging or not, |
0:26:15 | but it is not entirely that, because of the following. |
0:26:19 | So if you look at it here, this is a metric that we have humans rate; these are binary metrics. |
0:26:26 | "Any inclusion" means: does the response include a fact? It doesn't have to be from the persona; |
0:26:32 | it could be beyond the given, say, five facts, or it could of course be a fact from elsewhere. |
0:26:37 | And then you have a follow-up: how much of it is coming from the persona, and how much of it is coming from the conversation. |
0:26:44 | So that is what we also ask the humans. |
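A minimal sketch of how such binary fact-inclusion judgments might be aggregated; the field names are hypothetical, not the actual annotation schema.

```python
# A minimal sketch of aggregating binary fact-inclusion judgments like the ones
# described here; the field names are hypothetical, not the actual annotation schema.
def inclusion_rates(annotations):
    """annotations: list of dicts such as {"includes_fact": 1, "from_persona": 1}."""
    total = max(len(annotations), 1)
    any_fact = sum(a["includes_fact"] for a in annotations) / total
    from_persona = sum(a.get("from_persona", 0) for a in annotations) / total
    return {"fact_inclusion": any_fact, "persona_fact_inclusion": from_persona}
```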
0:26:48 | Basically, this metric says a bit about the engagingness, but not exactly, because of the following. |
0:26:56 | If you look at the ground-truth score, this is the main metric here: |
0:27:02 | the factual information included from the persona. If we look at the ground truth, even that is at fifty percent, |
0:27:09 | so it means that even the ground-truth responses don't have coverage of the persona facts all the time. |
0:27:18 | You can think of it this way: in an actual conversation between two people, |
0:27:24 | basically five facts cannot cover the complexity of such a conversation, right? That's why |
0:27:31 | this is also not a perfect metric. |
0:27:33 | So what I'm trying to say is that measuring engagingness is a little bit more difficult. |
0:27:41 | We tried to measure it this way, just by looking at whether the generated response included a relevant fact, |
0:27:50 | but we don't have a perfect evaluation for that. |