0:00:17 | okay so i |
---|
0:00:20 | i think we can start. hello everyone, good morning, and |
---|
0:00:25 | welcome to the third session |
---|
0:00:29 | and today the topic is the end-to-end dialog systems and natural language generation |
---|
0:00:37 | we have papers ranging from natural language generation models to end-to-end systems |
---|
0:00:43 | and the |
---|
0:00:45 | first speaker is Bo-Hsiang Tseng |
---|
0:00:50 | with the paper on a tree-structured semantic encoder with knowledge sharing for domain |
---|
0:00:57 | adaptation in NLG |
---|
0:01:00 | so this is a natural language generation model |
---|
0:01:05 | are we ready |
---|
0:01:07 | okay so |
---|
0:01:10 | go ahead, you have the floor |
---|
0:02:28 | hello everyone |
---|
0:02:29 | good morning, welcome to my presentation |
---|
0:02:32 | my name is Bo-Hsiang Tseng from the University of Cambridge and today i'm going to |
---|
0:02:36 | share my work, tree-structured semantic encoder with knowledge sharing for domain adaptation in natural |
---|
0:02:42 | language generation |
---|
0:02:46 | i guess |
---|
0:02:47 | most of you |
---|
0:02:48 | are pretty familiar with this pipelined dialogue system |
---|
0:02:52 | here i just want to highlight that |
---|
0:02:54 | this work is focusing on |
---|
0:02:56 | the natural language generation component |
---|
0:02:59 | so the input is just the semantics from the policy network and the output is |
---|
0:03:03 | natural language |
---|
0:03:06 | okay so given a semantic representation like this |
---|
0:03:12 | there are two dialogue acts from the restaurant domain |
---|
0:03:15 | and the system informs about the name of the restaurant |
---|
0:03:19 | its address |
---|
0:03:20 | and its phone number |
---|
0:03:23 | so with a natural language generation model |
---|
0:03:26 | it would produce the natural language for the user |
---|
0:03:29 | and this sentence, this utterance, has to contain all the correct information in the |
---|
0:03:34 | semantics |
---|
0:03:35 | that's the goal of an NLG model |
---|
0:03:38 | we focus on domain adaptation in NLG in this work |
---|
0:03:42 | which means that |
---|
0:03:44 | you might have a bunch of data from your source domain |
---|
0:03:47 | and you can use that data |
---|
0:03:49 | to pretrain your model |
---|
0:03:51 | to get a pretrained model |
---|
0:03:53 | and then you want to use some of the limited data from your target |
---|
0:03:58 | domain |
---|
0:03:58 | to fine-tune your model |
---|
0:04:00 | that makes your model able to work well in the domain you |
---|
0:04:05 | are interested in |
---|
0:04:07 | that's the domain adaptation scenario |
---|
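As a rough picture of this pretrain-then-fine-tune recipe, here is a minimal sketch; `model` is assumed to be any NLG network that returns a training loss, and `source_data` / `target_data` are hypothetical dataset objects, not names from the talk or the paper.

```python
import torch
from torch.utils.data import DataLoader

def run_training(model, dataset, epochs, lr):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for semantics, utterance in loader:
            loss = model(semantics, utterance)  # assumed to return the training loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# 1) pretrain on the plentiful source-domain data
run_training(model, source_data, epochs=20, lr=1e-3)
# 2) fine-tune on the small amount of target-domain data
run_training(model, target_data, epochs=5, lr=1e-4)
```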
0:04:11 | so how do we usually encode the semantics |
---|
0:04:15 | among prior work |
---|
0:04:17 | there are pretty much two main approaches |
---|
0:04:20 | the first one is this |
---|
0:04:22 | people will use a binary representation |
---|
0:04:24 | like this |
---|
0:04:25 | so each element |
---|
0:04:27 | each element in the vector representation |
---|
0:04:29 | corresponds to a certain slot-value pair |
---|
0:04:32 | in your ontology |
---|
0:04:36 | or we can treat |
---|
0:04:38 | the semantics |
---|
0:04:39 | as a sequence of tokens |
---|
0:04:40 | and simply use an LSTM |
---|
0:04:43 | to encode your semantics |
---|
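To make the first approach concrete, here is a tiny sketch with a made-up ontology; for brevity each vector position stands for one (dialogue act, slot) entry rather than a full slot-value pair.

```python
# hypothetical ontology: one vector dimension per (dialogue act, slot) entry
ontology = [
    ("inform", "name"), ("inform", "address"), ("inform", "phone"),
    ("inform", "area"), ("request", "pricerange"), ("request", "food"),
]

def binary_encode(semantics):
    """semantics: set of (act, slot) pairs present in the system turn."""
    return [1 if entry in semantics else 0 for entry in ontology]

vec = binary_encode({("inform", "name"), ("inform", "phone"),
                     ("request", "pricerange")})
print(vec)  # [1, 0, 1, 0, 1, 0]
```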
0:04:47 | actually |
---|
0:04:48 | both approaches work well |
---|
0:04:50 | however |
---|
0:04:52 | they don't really capture |
---|
0:04:53 | the internal structure of the semantics |
---|
0:04:56 | for example |
---|
0:04:56 | in the semantics |
---|
0:04:58 | you actually have this kind of tree structure |
---|
0:05:01 | because |
---|
0:05:03 | under the request |
---|
0:05:05 | there's a pricerange slot |
---|
0:05:07 | that the dialogue system wants to ask the user about |
---|
0:05:10 | so like this part here, up to here |
---|
0:05:13 | and under the inform dialogue act |
---|
0:05:16 | you actually have three slots |
---|
0:05:18 | of information that you want to tell the user |
---|
0:05:21 | and both dialogue acts |
---|
0:05:22 | are under the restaurant domain |
---|
0:05:27 | so |
---|
0:05:28 | so that semantic structure is not captured by those two approaches |
---|
0:05:35 | but do you really need to capture this kind of structure |
---|
0:05:38 | does it help? if it doesn't help, then why bother, right |
---|
0:05:42 | i'll give a very simple example |
---|
0:05:46 | so again, given the semantics like this |
---|
0:05:49 | for the source domain |
---|
0:05:52 | and you have the corresponding tree like this |
---|
0:05:56 | during adaptation |
---|
0:05:57 | in the domain adaptation scenario |
---|
0:06:00 | you might have this similar semantics |
---|
0:06:04 | which shares some content |
---|
0:06:09 | and this is its corresponding tree structure |
---|
0:06:12 | as you can see here |
---|
0:06:14 | most of the information |
---|
0:06:17 | is shared between those two semantics in the tree structure |
---|
0:06:21 | except for |
---|
0:06:22 | the domain information |
---|
0:06:24 | so if we can come up with a better way to capture those structures |
---|
0:06:29 | within the semantics |
---|
0:06:30 | perhaps the model is able to share information |
---|
0:06:33 | more effectively |
---|
0:06:35 | between domains during domain adaptation |
---|
0:06:37 | and that's the motivation of this work |
---|
0:06:40 | so the question here is |
---|
0:06:42 | how to encode the structure |
---|
0:06:48 | so here is the proposed model, the |
---|
0:06:51 | tree-structured semantic encoder |
---|
0:06:55 | actually the structure is pretty much |
---|
0:06:57 | the one you see |
---|
0:06:58 | in the previous slide |
---|
0:07:00 | first |
---|
0:07:01 | we have the slot layer |
---|
0:07:03 | and all your slots in the ontology |
---|
0:07:06 | will be listed here |
---|
0:07:10 | and then you have dialogue act layer |
---|
0:07:12 | it is used to describe all the dialogue acts you have |
---|
0:07:15 | in your system |
---|
0:07:18 | and then we have the domain layer |
---|
0:07:22 | at the bottom of the tree |
---|
0:07:23 | we designed a property layer |
---|
0:07:25 | that is used to describe |
---|
0:07:27 | the property of a slot |
---|
0:07:29 | because for example |
---|
0:07:31 | any slot |
---|
0:07:32 | perhaps area can be requestable |
---|
0:07:35 | or it can be informable |
---|
0:07:37 | and the |
---|
0:07:42 | here it is informable |
---|
0:07:46 | so we use it to describe the property of a slot |
---|
0:07:46 | so |
---|
0:07:47 | and given the semantics like this |
---|
0:07:50 | based on all the information, all the structure you have |
---|
0:07:53 | we can build a corresponding tree |
---|
0:07:55 | with this definition of the tree |
---|
0:07:58 | so first, basically based on the property of a slot, you can |
---|
0:08:02 | build the links between the property layer |
---|
0:08:05 | and the slot layer |
---|
0:08:08 | and then |
---|
0:08:09 | all the slots will go to the dialogue act |
---|
0:08:12 | they belong to in the semantics |
---|
0:08:14 | like this |
---|
0:08:17 | and the two dialogue acts in this example will go to the restaurant domain |
---|
0:08:23 | and finally |
---|
0:08:24 | we'll take the root of the tree |
---|
0:08:25 | as the final representation |
---|
0:08:28 | so this is the way we can |
---|
0:08:30 | encode |
---|
0:08:31 | the tree structure in the semantics |
---|
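A small sketch of how such a four-layer tree (property, slot, dialogue act, domain) could be assembled from the semantics; the class and function names here are mine, not the paper's.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def build_tree(domain, acts):
    """acts: dict mapping dialogue act -> list of (slot, property) pairs."""
    root = Node(domain)                                  # domain layer (tree root)
    for act, slots in acts.items():
        act_node = Node(act)                             # dialogue act layer
        for slot, prop in slots:
            act_node.children.append(
                Node(slot, children=[Node(prop)]))       # slot layer over property layer
        root.children.append(act_node)
    return root

tree = build_tree("restaurant", {
    "inform": [("name", "informable"), ("address", "informable"),
               ("phone", "informable")],
    "request": [("pricerange", "requestable")],
})
```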
0:08:35 | now, what do we exactly compute in the tree |
---|
0:08:40 | basically we follow the prior work, the Tree-LSTM |
---|
0:08:47 | from two thousand fifteen |
---|
0:08:50 | first |
---|
0:08:52 | for example, let's say the node here |
---|
0:08:56 | we compute |
---|
0:08:57 | the summation over all its children |
---|
0:09:02 | the summation of the hidden states and the summation of the |
---|
0:09:06 | memory cells of all its children |
---|
0:09:11 | and then |
---|
0:09:12 | like the vanilla LSTM |
---|
0:09:14 | we compute the input gate, forget gate, and output gate |
---|
0:09:19 | and finally |
---|
0:09:20 | we can compute the memory cell and hidden state |
---|
0:09:23 | i hope that is clear enough to you |
---|
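The node update being described is the child-sum Tree-LSTM of Tai et al. (2015); below is a minimal PyTorch sketch of one node's computation, with my own tensor and module names.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + hid_dim, 3 * hid_dim)  # input, output, update gates
        self.f = nn.Linear(in_dim + hid_dim, hid_dim)         # one forget gate per child

    def forward(self, x, child_h, child_c):
        # x: (in_dim,)   child_h, child_c: (num_children, hid_dim)
        h_sum = child_h.sum(dim=0)                  # summation of the children's hidden states
        i, o, u = self.iou(torch.cat([x, h_sum])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f(torch.cat(         # per-child forget gates
            [x.expand(child_h.size(0), -1), child_h], dim=1)))
        c = i * u + (f * child_c).sum(dim=0)        # new memory cell
        h = o * torch.tanh(c)                       # new hidden state
        return h, c
```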
0:09:29 | so |
---|
0:09:30 | on |
---|
0:09:33 | again, the same simple example |
---|
0:09:37 | given the semantics in the source domain |
---|
0:09:44 | we have the corresponding tree structure |
---|
0:09:47 | and during adaptation |
---|
0:09:50 | you might have this similar semantics in the target |
---|
0:09:53 | domain |
---|
0:09:54 | and as we can see here |
---|
0:09:55 | with our designed |
---|
0:09:57 | tree structure |
---|
0:09:58 | most of the information in the tree |
---|
0:10:00 | is shared |
---|
0:10:02 | and we hope that can help the model share information between domains |
---|
0:10:11 | okay |
---|
0:10:12 | so now, so far we know how to encode the tree |
---|
0:10:14 | of the semantics |
---|
0:10:16 | then let's go to the generation process |
---|
0:10:21 | it is very straightforward to just take the output |
---|
0:10:24 | the final representation of the tree as the initialization |
---|
0:10:28 | of your decoder |
---|
0:10:30 | and we follow some prior work |
---|
0:10:32 | where the values in the utterances are delexicalised as |
---|
0:10:38 | slot tokens, and we do the same thing |
---|
0:10:40 | so in this work we designed the slot token with domain information, dialogue act information and |
---|
0:10:45 | slot information |
---|
0:10:48 | so we just use the standard cross entropy |
---|
0:10:51 | to train our decoder |
---|
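To illustrate the delexicalisation step, here is a hedged example; the exact token format in the paper may differ, this just shows the idea of a slot token that carries domain, dialogue act, and slot information.

```python
# raw system utterance and a possible delexicalised training target
raw = "Pizza Hut is at 12 Market Street, phone 01223 323737."
delex = ("[restaurant-inform-name] is at [restaurant-inform-address], "
         "phone [restaurant-inform-phone].")
# the decoder is trained with standard cross entropy on the delexicalised
# token sequence; the real values are copied back in afterwards
```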
0:10:54 | sounds alright sounds good |
---|
0:10:55 | we have a way to encode the tree structure |
---|
0:10:59 | but actually, so far we just use the very abstract information of the tree |
---|
0:11:05 | however there is |
---|
0:11:07 | a bunch of information at the intermediate levels |
---|
0:11:12 | thanks to our defined tree |
---|
0:11:15 | so this motivates us to |
---|
0:11:18 | come up with a better way |
---|
0:11:19 | to access the information at the intermediate levels |
---|
0:11:22 | so that the decoder |
---|
0:11:24 | can have more information about the tree structure |
---|
0:11:29 | so here we propose, it is very straightforward |
---|
0:11:32 | we apply |
---|
0:11:34 | we apply attention to the |
---|
0:11:36 | domain, dialogue act, and slot layers |
---|
0:11:40 | so we have a layer-wise attention mechanism |
---|
0:11:44 | whenever the model |
---|
0:11:45 | the decoder |
---|
0:11:47 | produces the special |
---|
0:11:49 | slot token like this |
---|
0:11:53 | the hidden state at each time-step |
---|
0:11:55 | will be used as a query |
---|
0:11:57 | to trigger the attention mechanism |
---|
0:11:59 | like this, so for example at the slot layer |
---|
0:12:04 | all the slot information |
---|
0:12:06 | will be treated as the context |
---|
0:12:09 | for the |
---|
0:12:10 | for the attention mechanism |
---|
0:12:13 | and then the model |
---|
0:12:14 | will compute |
---|
0:12:16 | a probability distribution over the information for the three layers |
---|
0:12:21 | so for example again |
---|
0:12:23 | in the slot layer |
---|
0:12:25 | you will have a distribution over all possible slots |
---|
0:12:29 | it basically tells the model which slot |
---|
0:12:32 | to focus on, which information the model should focus on at this time step |
---|
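A minimal sketch of the layer-wise attention just described, using simple dot-product scoring for illustration (the paper's exact scoring function may differ); the decoder hidden state is the query over the node representations of one tree layer, e.g. the slot layer.

```python
import torch
import torch.nn.functional as F

def layer_attention(query, layer_nodes):
    # query: (hid_dim,) decoder hidden state at the current step
    # layer_nodes: (num_nodes, hid_dim) representations of one tree layer
    scores = layer_nodes @ query           # score each node against the query
    attn = F.softmax(scores, dim=0)        # distribution over the layer's nodes
    context = attn @ layer_nodes           # weighted summary of the layer
    return attn, context

# when the decoder emits a slot token, `attn` says which slot node to focus on,
# and it is also fed back as extra input at the next time step
```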
0:12:39 | of course during training |
---|
0:12:40 | we do have supervision signals |
---|
0:12:42 | from the input semantics |
---|
0:12:45 | this can help the model, this can guide the model |
---|
0:12:48 | to tell it what to focus on |
---|
0:12:50 | at each time-step |
---|
0:12:54 | and then we will use this extra |
---|
0:12:55 | we use these attention distributions as the extra information for the next time step |
---|
0:13:02 | and the generation process |
---|
0:13:04 | goes on |
---|
0:13:06 | so |
---|
0:13:07 | with the layer-wise attention mechanism |
---|
0:13:10 | our loss function becomes the standard cross entropy |
---|
0:13:14 | the cross entropy plus |
---|
0:13:16 | the losses |
---|
0:13:18 | from the |
---|
0:13:18 | three attention mechanisms |
---|
0:13:21 | that's how we train our model |
---|
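Putting the objective together, a hedged sketch of the loss; the per-layer attention targets are the gold node indices taken from the input semantics (the supervision signal mentioned below), and all names are mine.

```python
import torch
import torch.nn.functional as F

def total_loss(token_logits, target_tokens, attn_dists, attn_targets):
    # token_logits: (seq_len, vocab)   target_tokens: (seq_len,)
    ce = F.cross_entropy(token_logits, target_tokens)
    # attn_dists / attn_targets: one (distribution, gold node index) pair per
    # tree layer (domain, dialogue act, slot)
    attn_loss = sum(-torch.log(dist[gold])
                    for dist, gold in zip(attn_dists, attn_targets))
    return ce + attn_loss
```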
0:13:26 | okay, let's go to some basic setups |
---|
0:13:29 | for experiments |
---|
0:13:31 | we are using the MultiWOZ dataset, which has ten thousand dialogues |
---|
0:13:37 | over seven domains |
---|
0:13:39 | and within one utterance |
---|
0:13:42 | there can actually be more than one dialogue act |
---|
0:13:47 | we have three strong baselines; the first one is SC-LSTM |
---|
0:13:52 | it basically uses a binary representation to encode the semantics |
---|
0:13:57 | and we have |
---|
0:13:58 | TGen and RALSTM |
---|
0:14:01 | those two models are basically seq2seq models |
---|
0:14:04 | so they are using an LSTM encoder |
---|
0:14:07 | to encode the semantics |
---|
0:14:10 | as for evaluation |
---|
0:14:12 | we have the standard |
---|
0:14:14 | automatic metrics such as BLEU |
---|
0:14:18 | and also the slot error rate |
---|
0:14:20 | because we don't want our |
---|
0:14:23 | natural language generation model |
---|
0:14:24 | to just be fluent, but also |
---|
0:14:26 | the content should be correct |
---|
0:14:29 | and we also conduct a human evaluation |
---|
0:14:34 | okay let's see some numbers first |
---|
0:14:36 | on |
---|
0:14:38 | here in this setup |
---|
0:14:39 | the source domain is restaurant |
---|
0:14:41 | and the target domain is hotel |
---|
0:14:46 | the x-axis |
---|
0:14:48 | is the different amounts of adaptation data |
---|
0:14:51 | and the y-axis is the BLEU score |
---|
0:14:55 | three baseline models are here |
---|
0:14:58 | and the tree-structured encoder |
---|
0:15:01 | and its variant |
---|
0:15:02 | the tree structure with the attention mechanism |
---|
0:15:05 | as you can see on |
---|
0:15:08 | with full adaptation data, a hundred percent of the data |
---|
0:15:11 | all the models perform pretty much similarly |
---|
0:15:14 | because the data is |
---|
0:15:15 | pretty much enough |
---|
0:15:17 | however |
---|
0:15:18 | with less data |
---|
0:15:21 | such as less than five percent |
---|
0:15:24 | our models start to gain benefits |
---|
0:15:27 | thanks to the |
---|
0:15:30 | structure |
---|
0:15:30 | thanks to the tree structure |
---|
0:15:34 | let's look at the numbers for the slot error rate |
---|
0:15:38 | the slot error rate is defined like this |
---|
0:15:41 | we don't want our model |
---|
0:15:42 | to produce |
---|
0:15:43 | missing slots |
---|
0:15:45 | to have missing slots or to produce redundant slots |
---|
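A small sketch of the slot error rate idea: count slots missing from the generated utterance plus redundant slots, relative to the reference semantics (the exact normalisation used in the paper may differ).

```python
def slot_error_rate(reference_slots, generated_slots):
    ref, gen = set(reference_slots), set(generated_slots)
    missing = len(ref - gen)        # slots required but not generated
    redundant = len(gen - ref)      # slots generated but not required
    return (missing + redundant) / max(len(ref), 1)

# reference {name, address, phone} vs generated {name, phone, area} -> 2/3
print(slot_error_rate({"name", "address", "phone"}, {"name", "phone", "area"}))
```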
0:15:50 | so again with a hundred percent of data |
---|
0:15:53 | all the models perform very similarly |
---|
0:15:54 | they're all good |
---|
0:15:56 | with full data |
---|
0:15:57 | however |
---|
0:15:58 | with pretty much less data |
---|
0:16:00 | with pretty limited data |
---|
0:16:04 | even with |
---|
0:16:06 | one point two five percent of the data |
---|
0:16:08 | our models start to |
---|
0:16:10 | produce very good performance |
---|
0:16:12 | over all the baselines |
---|
0:16:18 | the previous slides just showed one setup |
---|
0:16:20 | we actually conducted three different kinds of setups to show that |
---|
0:16:24 | the model works in different scenarios |
---|
0:16:28 | the first column is |
---|
0:16:30 | the one used in the previous slide |
---|
0:16:32 | restaurant to hotel adaptation |
---|
0:16:36 | and the second one |
---|
0:16:37 | the middle column is restaurant to attraction |
---|
0:16:40 | and the third one is train to taxi |
---|
0:16:44 | here we just want to show that we can observe a similar trend, similar results |
---|
0:16:49 | over all different setups |
---|
0:16:54 | okay so we all know that for the natural language generation task |
---|
0:16:58 | it is not enough |
---|
0:16:59 | to just evaluate with automatic metrics |
---|
0:17:02 | so we also conducted a human evaluation |
---|
0:17:04 | we used Amazon Mechanical Turk |
---|
0:17:08 | each MTurk worker was asked to score the utterances in terms of |
---|
0:17:12 | informativeness |
---|
0:17:14 | and naturalness |
---|
0:17:16 | so here are some numbers |
---|
0:17:20 | in terms of informativeness |
---|
0:17:22 | the tree structure with attention |
---|
0:17:25 | scores the best |
---|
0:17:27 | and the tree without attention scores second |
---|
0:17:30 | which tells us that |
---|
0:17:32 | if you have a better way to encode your tree structure |
---|
0:17:36 | then the information can be shared better during domain adaptation |
---|
0:17:40 | the model tends to produce |
---|
0:17:43 | the correct semantics in the generated sentences |
---|
0:17:49 | meanwhile we can still maintain the naturalness |
---|
0:17:52 | of the generated sentences |
---|
0:17:56 | so we wonder |
---|
0:17:58 | where our |
---|
0:17:59 | improvements come from |
---|
0:18:01 | on what kind of examples our model really performs |
---|
0:18:05 | good |
---|
0:18:05 | performs well |
---|
0:18:07 | so we divide the test set into seen and unseen |
---|
0:18:11 | subsets |
---|
0:18:13 | seen basically means |
---|
0:18:15 | if the input semantics is seen during training |
---|
0:18:18 | then it belongs to the seen subset, otherwise it is unseen |
---|
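In code, the split could look roughly like this; `training_examples` and `test_examples` are hypothetical containers, not from the talk.

```python
# "seen" if the test example's semantics already appeared during training or
# adaptation, otherwise "unseen"
train_semantics = {frozenset(ex.semantics) for ex in training_examples}
seen   = [ex for ex in test_examples if frozenset(ex.semantics) in train_semantics]
unseen = [ex for ex in test_examples if frozenset(ex.semantics) not in train_semantics]
```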
0:18:24 | that's |
---|
0:18:25 | let's see the |
---|
0:18:26 | numbers for the fifty percent adaptation data |
---|
0:18:30 | with this amount of data |
---|
0:18:32 | most of the testing examples are seen |
---|
0:18:35 | and all the models perform similarly |
---|
0:18:37 | and the numbers are the |
---|
0:18:39 | number of wrong examples the models produce |
---|
0:18:43 | and the lower the better |
---|
0:18:45 | however |
---|
0:18:47 | with very limited adaptation data |
---|
0:18:50 | out of nine hundred unseen semantics |
---|
0:18:53 | that is, semantics never seen before during training or adaptation |
---|
0:18:58 | the baseline systems |
---|
0:19:00 | produce |
---|
0:19:01 | around seven hundred |
---|
0:19:03 | wrong examples |
---|
0:19:04 | wrong semantics in the generated sentences |
---|
0:19:08 | however our tree with attention produces a very low number |
---|
0:19:13 | just around a hundred and thirty |
---|
0:19:16 | so this implicitly tells us |
---|
0:19:19 | our model may have a better generalisation ability |
---|
0:19:23 | on the unseen semantics |
---|
0:19:28 | okay so here comes my conclusion |
---|
0:19:33 | by modeling the semantic structure |
---|
0:19:35 | the information might be shared between domains and this is helpful for domain adaptation |
---|
0:19:41 | and our model |
---|
0:19:43 | especially with the proposed layer-wise attention mechanism |
---|
0:19:48 | generates better sentences in terms of automatic metrics and the human scores |
---|
0:19:54 | especially with |
---|
0:19:56 | very limited adaptation data |
---|
0:19:58 | our model performs the best |
---|
0:20:02 | so thank you very much for coming |
---|
0:20:04 | and any questions and feedback are welcome, thank you |
---|
0:20:12 | thank you very much. so, questions |
---|
0:20:21 | you said that you're doing well with one point two five percent which sounds good |
---|
0:20:24 | what's the number of training examples yes one point |
---|
0:20:28 | yes |
---|
0:20:29 | on |
---|
0:20:31 | so for example when we adapt from restaurant to hotel, the pretraining |
---|
0:20:36 | examples are |
---|
0:20:37 | eight point five k but if we are using only one percent |
---|
0:20:41 | here it is probably six hundred |
---|
0:20:44 | so it's still small |
---|
0:20:47 | yes |
---|
0:20:49 | hi, can you go to the plot for the tree |
---|
0:20:53 | yes, so here, for the full tree |
---|
0:20:57 | yes, so first, is the attention on all four |
---|
0:21:03 | slots, or on all the slots |
---|
0:21:05 | but for a given example only the four green nodes are actually in |
---|
0:21:13 | the data, so why do we need to attend to the other ones, and from which |
---|
0:21:17 | sorry, actually it is the slots within the whole semantics |
---|
0:21:22 | so only the slots in the semantics are activated |
---|
0:21:26 | and they will be used for the attention. okay, another question |
---|
0:21:30 | is, when you do domain transfer, what if the two domains have different sets of slots |
---|
0:21:36 | and for those slots that only appear in one, in the unseen domain, it's never |
---|
0:21:42 | trained on in the data? because, in |
---|
0:21:45 | on |
---|
0:21:46 | because by the nature of this dataset |
---|
0:21:48 | as you see we have restaurant, hotel, attraction, which share, those three domains share |
---|
0:21:55 | most of the slots, or they have their unique slots, and |
---|
0:22:00 | train and taxi share some slots, so that's why when we have |
---|
0:22:04 | this kind of setup we have restaurant to hotel, restaurant to attraction |
---|
0:22:09 | and train to taxi |
---|
0:22:10 | because we try to leverage the shared slots |
---|
0:22:18 | hello, great, so i had a question about the evaluation that looks at the |
---|
0:22:24 | redundant and missing slots |
---|
0:22:27 | yes, the slot error rate |
---|
0:22:30 | my question is |
---|
0:22:32 | conceptually, why does this even need to be a problem, because |
---|
0:22:36 | you could have constraints |
---|
0:22:38 | that ensure that each slot is produced exactly one time during the generation |
---|
0:22:44 | yes, and it depends on how you put your constraints |
---|
0:22:47 | if you put them in the generation loss function during training |
---|
0:22:51 | that doesn't guarantee, right, that the model will still follow your constraints |
---|
0:22:56 | but if you put your constraints at the output, more like a post |
---|
0:23:00 | processing |
---|
0:23:02 | you might filter out some slots, that's good, but you might |
---|
0:23:06 | come up with unnatural sentences, right, because you use |
---|
0:23:10 | a module to filter out something |
---|
0:23:12 | you need to come up with some way to make it fluent again |
---|
0:23:14 | after you filter things out |
---|
0:23:17 | so it is actually a problem and we simply follow some prior work, which |
---|
0:23:23 | is, which uses these metrics |
---|
0:23:26 | okay, so i guess, yes, conceptually i get there'd be a tradeoff between naturalness |
---|
0:23:30 | and coverage, but if you know in advance that a requirement is coverage, then |
---|
0:23:36 | i guess your only degree of freedom would be to give up on naturalness |
---|
0:23:43 | sorry i |
---|
0:23:44 | missed your last point, and i'm just making the comment that if |
---|
0:23:47 | you know in advance that you're requirement is that you need to generate all the |
---|
0:23:50 | slots, yes, then your only degree of freedom |
---|
0:23:53 | is to give up on naturalness |
---|
0:23:55 | all right, that's based on the naturalness scoring, right, in this task |
---|
0:24:02 | thank you |
---|
0:24:02 | i have a question regarding this tree that you have shown here, the |
---|
0:24:09 | picture, yes, and are the values somehow encoded in this tree, so that you |
---|
0:24:15 | are only taking into account the slots |
---|
0:24:18 | only the slots, we don't use the |
---|
0:24:20 | values, because you don't need them |
---|
0:24:23 | yes, and also the values, there are too many actually, so i don't use them |
---|
0:24:27 | and then i have another question: have you thought about a model which |
---|
0:24:34 | generates the sentences |
---|
0:24:37 | without delexicalisation |
---|
0:24:39 | on yes |
---|
0:24:41 | on |
---|
0:24:42 | and it will come with some issues, for example your values will be pretty much |
---|
0:24:47 | like an open vocabulary, right, if you have, for this dataset we have the restaurant |
---|
0:24:53 | name |
---|
0:24:54 | attraction name and the hotel name |
---|
0:24:57 | and the train id |
---|
0:24:59 | and the time slot |
---|
0:25:01 | this will become very complex |
---|
0:25:04 | it is still a challenging problem in NLG |
---|
0:25:08 | okay |
---|
0:25:11 | right, i think we need to move to the next paper, so let's thank the |
---|
0:25:15 | speaker again, thank you very much |
---|