0:00:15 | Alright, so today I'm presenting this paper. I should say up front that I'm not one of the authors and did not work on this myself. |
0:00:36 | I'll start with some background and the motivation, and talk about what is meant by update intents. Then I'll introduce the problem statement and the model, and how they deal with cross-domain generalization, and then the data, the experiments, and finally conclusions and future work. |
0:01:01 | So in terms of the background: dialogue state tracking is critical to the successful completion of tasks in task-oriented dialogue systems. The dialogue state expresses a probability distribution over goals, which are represented as slot-value pairs. |
0:01:20 | Typically, state tracking approaches use dialogue acts to infer user intentions towards the slot values that have been detected, and typical dialogue acts would be inform, deny, request, negate. |
0:01:35 | So for an example utterance "find me French restaurants in Boston", the SLU output would be inform(cuisine=french), inform(city=boston). |
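Just to make that representation concrete, here is a minimal sketch (my own illustration, not the paper's exact schema) of such a conventional SLU output as a data structure:

```python
# Hypothetical SLU output for "find me French restaurants in Boston".
# The act/slot/value field names are illustrative, not the paper's schema.
slu_output = [
    {"act": "inform", "slot": "cuisine", "value": "french"},
    {"act": "inform", "slot": "city", "value": "boston"},
]
```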
0:01:50 | So basically the motivation of this work is the observation that dialogue acts do not always adequately capture the user's intent towards slot values in some cases. |
0:02:08 | So one example is implicit denial. In this example here, the user invites John and Joe for dinner, and then says "Joe can't make it". I think the point here is that this is implicit, because it doesn't correspond to a deny or negate dialogue act. |
0:02:28 | And so here we have the utterances on the left and the expected SLU output of a typical system on the right. |
0:02:38 | Another limitation concerns expressed preferences for slot values, and this applies specifically to single-valued slots. In this example, we first ask for French restaurants in one city. |
0:02:53 | A second utterance then says "find me some in San Jose", and a third utterance says "find me some in Gilroy instead". Current SLU with dialogue acts wouldn't distinguish between the second and third utterances, whereas the intents are actually quite different: the third utterance expresses a preference for Gilroy, which would imply replacing the location value currently in the state, while in the second instance the user just wanted to add it. |
0:03:25 | And then another limitation is that this representation doesn't deal well with numerical updates and incrementality commands. In this example, you ask for a table for four, and then you might say "four more seats" or "two more seats", and it's not clear how current systems would deal with that. |
0:03:48 | So the solution that the authors propose is update intents, which describe a new semantic class of intents tied directly to the updates the user intends to make. |
0:04:02 | And so here's the list of intents. The first one is append: the user specifies a value, or multiple values, for a multi-valued slot; it's basically the intent to add values to a multi-valued slot. |
0:04:21 | Remove is basically the complement of that: to remove a value from a multi-valued slot. Replace expresses a preference for a slot value: it expresses that a value is to be preferred over the previous value, so it means replace the existing value. And then there are increase-by and decrease-by, which are specific to numeric slot types. |
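As a rough sketch of the intended semantics, assuming a simple dict-based dialogue state (my own illustration; the paper doesn't prescribe an implementation), the five update intents could be applied like this:

```python
# Minimal sketch of update-intent semantics over a dict-based dialogue state.
# Slot names and the state layout are hypothetical, not the paper's schema.

def apply_update(state, intent, slot, value):
    if intent == "append":           # add value(s) to a multi-valued slot
        state.setdefault(slot, set()).add(value)
    elif intent == "remove":         # drop a value from a multi-valued slot
        state.get(slot, set()).discard(value)
    elif intent == "replace":        # prefer the new value over the old one
        state[slot] = value
    elif intent == "increase_by":    # numeric slots only
        state[slot] = state.get(slot, 0) + value
    elif intent == "decrease_by":
        state[slot] = state.get(slot, 0) - value
    return state

state = {"num_guests": 4, "names": {"John", "Joe"}}
apply_update(state, "remove", "names", "Joe")        # "Joe can't make it"
apply_update(state, "increase_by", "num_guests", 2)  # "two more seats"
```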
0:04:49 | And here are some examples. What we have here is an utterance, the conventional SLU output, and then the SLU output with update intents. So for example, the earlier utterance "Joe can't make it" would become an inform with remove(names=Joe). |
0:05:09 | For the restaurant search examples, "find me some in San Jose" would be an inform with append, whereas "find me some in Gilroy instead" would become an inform with replace. |
0:05:26 | And then for the numerical examples, "four more seats" would become an inform with increase_by(4), and "can you remove two seats" an inform with decrease_by(2). |
0:05:41 | Okay, so in terms of how they formulate this problem: given a user utterance, identify the intents for all the slot values mentioned in it. So the input is a user utterance that's already tagged with slots and values, and the output is the update intents for these slot values, with the same five classes applying to all slots. |
0:05:59 | Here we see two examples. For "drop one person", number-of-guests is the slot name, "one" is the slot value, and the update intent is decrease-by. For "Joe can't make it", people-names is the slot name, "Joe" is the slot value, and the update intent here is remove. So they formulate this as a multiclass classification of slot values. |
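In other words, the classifier's input/output contract is roughly the following (a sketch of my own; the field and slot identifiers are hypothetical):

```python
# Each example pairs a slot mention in an utterance with an update-intent
# label. Field names here are illustrative.
examples = [
    {"utterance": "drop one person",
     "slot": "num_guests", "value": "one", "intent": "decrease_by"},
    {"utterance": "Joe can't make it",
     "slot": "people_names", "value": "Joe", "intent": "remove"},
]
INTENTS = ["append", "remove", "replace", "increase_by", "decrease_by"]
```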
0:06:29 | So the model here is a sequence labeling model with a bidirectional LSTM. The user utterance is a sequence of tokens, including the slot-value tokens. The labels are basically the update intents for the tokens that correspond to slot values; the other tokens get a generic token label. And they also study the effect of delexicalization of slot values. |
0:06:58 | So this is what it looks like. On the bottom we have the input, "okay forget Sunnyvale try Gilroy instead". For this sequence it is delexicalized: when we have a slot value, we basically replace it with the slot name, which has been shown in previous work to generalize better with limited training data, since the slot values themselves may be out of vocabulary in the training data. |
0:07:22 | And then we have the embedding layer, then basically a typical bidirectional LSTM, and finally a softmax layer where we predict the targets. So you can see in this example that "okay" and "forget" are generic tokens, the city names are delexicalized to the slot name "location", and the intent label predicted for them is replace. |
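A minimal sketch of that architecture, in PyTorch purely for illustration (the paper doesn't specify a framework, and the dimensions here are assumed):

```python
import torch
import torch.nn as nn

class UpdateIntentTagger(nn.Module):
    """Embedding -> bidirectional LSTM -> per-token softmax over labels."""
    def __init__(self, vocab_size, num_labels, emb_dim=300, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # init from GloVe in practice
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))     # (batch, seq_len, 2*hidden)
        return self.out(h)                        # per-token label logits

# Labels: the five update intents plus a generic "token" class.
model = UpdateIntentTagger(vocab_size=10000, num_labels=6)
```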
0:07:50 | So delexicalization to slot names is helpful when generalizing to slot values not seen in the training data, but only really within a single domain. If you go cross-domain, the slot names may be different: you may see slot names in the target domain that didn't exist in the source domain. |
0:08:16 | However, if we can group slot names into types, different domains should share the same types of slots. As an example, the restaurant reservation and online shopping domains have number-of-guests and number-of-grocery-items, which are both numeric types. So if we can delexicalize to the slot type instead, we may be able to generalize. |
0:08:44 | So the solution is to delexicalize to slot types. These are the three slot types into which the slot names are grouped. First, numeric slots, which can be increased and decreased. |
0:08:58 | And there are two types of multi-valued slots. Disjunctive slots can take multiple values in a disjunction, an "or"; an example would be the location of a restaurant. And conjunctive slots can take multiple values in conjunction, such as the names of the people coming to dinner or the items in a shopping list. |
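As a sketch of the two delexicalization levels (my own illustration; the slot-to-type mapping is inferred from the examples in the talk):

```python
# Replace slot-value tokens either with the slot name (in-domain) or with
# the slot type (for cross-domain transfer). Mappings are illustrative.
SLOT_TYPES = {
    "num_guests": "numeric", "quantity": "numeric",
    "location": "disjunctive", "cuisine": "disjunctive",
    "people_names": "conjunctive", "grocery_items": "conjunctive",
}

def delexicalize(tokens, spans, level="name"):
    """spans: list of (start, end, slot_name) token spans of slot values."""
    out = list(tokens)
    for start, end, slot in sorted(spans, reverse=True):  # right-to-left
        tag = slot if level == "name" else SLOT_TYPES[slot]
        out[start:end] = [tag]   # collapse the value span to one placeholder
    return out

tokens = "okay forget sunnyvale try gilroy instead".split()
print(delexicalize(tokens, [(2, 3, "location"), (4, 5, "location")], "type"))
# -> ['okay', 'forget', 'disjunctive', 'try', 'disjunctive', 'instead']
```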
0:09:21 | Okay, so to evaluate this, what they required from a dataset was dialogues containing numeric updates and multi-valued slots in the domain ontology, along with annotations for the proposed update intents. And basically no existing dataset had all of these. |
0:09:44 | So the authors created their own dataset. Basically, they chose two domains, restaurants and online shopping, and had eight different professional editors generate conversations in these domains. |
0:09:58 | The editors were basically asked to create conversations corresponding to a task, where the task would be: search for a restaurant, make a dinner booking, buy groceries, or buy clothes. They were told to assume appropriate system responses, so this did not require building an end-to-end system. And then the utterances that were generated were annotated with slot names and the update intents. |
0:10:27 | Just as a reminder, this is what a data point essentially looks like: you have the utterance, annotated with the slot name and the slot value, which would be the input to the system, and the update intent, which is to be detected. |
0:10:44 | And this is, for the restaurant and shopping domains, the list of the slot names and their types. We have participant names, number of guests, menu items, cuisine, and location for restaurants, and grocery items, quantity of grocery items, apparel items, colour, and size for shopping. |
0:11:07 | And you can see that although the slot names of the two domains are disjoint, they still share the same slot types. |
0:11:17 | Okay, so after the data was created and annotated, this is what the distribution looks like. We have similar distributions for shopping and restaurants, with about the same number of conversations each and around thirteen hundred utterances, and you can see that on average there is more than one slot value mentioned in each utterance. |
0:11:39 | And then in terms of the actual update intents themselves, this is the distribution. You can see that in both domains append is the most common update, followed by replace. And for shopping, increase-by is noticeably more frequent compared to the restaurant domain: about twelve percent in shopping versus four percent in restaurants. |
0:12:08 | Okay, so then in terms of the experiments: they implemented the bidirectional LSTM and trained it with a batch size of sixty-four and a cross-entropy loss. The embedding layer was initialized with pre-trained GloVe embeddings from the Common Crawl dataset, and missing words were initialized randomly. |
0:12:37 | And basically the evaluation was leave-one-out cross-validation. Because the data was created by eight individual editors, they didn't want intra-editor consistency to inflate the evaluation, since the same editor may always express the same thing the same way. |
0:12:55 | So for a given fold they would always train on seven editors and test on the remaining editor's data. Every number reported is the average over all the folds, and they only did parameter tuning on the learning rate. |
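A sketch of that protocol, with hypothetical data fields:

```python
# Leave-one-editor-out cross-validation: train on 7 editors, test on the 8th.
from statistics import mean

def cross_validate(examples, editors, train_fn, score_fn):
    scores = []
    for held_out in editors:
        train = [ex for ex in examples if ex["editor"] != held_out]
        test = [ex for ex in examples if ex["editor"] == held_out]
        model = train_fn(train)
        scores.append(score_fn(model, test))  # e.g. per-intent F1
    return mean(scores)                        # average over all folds
```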
0:13:12 | And then they also had some baselines. The first is a simple n-gram baseline, based on a word window around the slot values as context, fed into a logistic regression classifier. |
0:13:26 | But because there may be multiple slot values in an utterance, they have to decide which of these slot values a given word or n-gram belongs to. I won't go into the details, but they basically had two approaches to this. One was hard segmentation, which is a rule-based approach to deciding which slot value a word should belong to. |
0:13:49 | The other was soft segmentation, which basically creates a feature for every word according to whether it occurs to the left of a slot value, to the right, or between two slot values, so you basically increase the size of the feature representation. |
0:14:12 | And the other baseline was the full model without delexicalization. |
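To make the n-gram baseline concrete, here is a rough sketch (my own; the window size and feature naming are assumptions, and the segmentation step is omitted):

```python
# Window-of-words features around a slot value, fed to logistic regression.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def window_features(tokens, value_pos, window=3):
    feats = {}
    for off in range(-window, window + 1):
        i = value_pos + off
        if off != 0 and 0 <= i < len(tokens):
            feats[f"w[{off}]={tokens[i]}"] = 1.0  # position-tagged word
    return feats

X = [window_features("drop one person".split(), 1)]  # value "one" at index 1
y = ["decrease_by"]
clf = make_pipeline(DictVectorizer(), LogisticRegression())
# clf.fit(X, y)  # needs examples of more than one class; shown for shape only
```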
0:14:17 | These are the classification results for the full model. I guess the key point here is that it is pretty accurate: they get an F1 score over ninety in both domains, and for quite a few of the intents they get over ninety percent F1. |
0:14:35 | For both domains the most difficult intent, for some reason, is remove. It could be the case that there isn't enough training data for it, although increase-by and decrease-by actually have less. |
0:14:54 | And then, compared to the baselines, perhaps unsurprisingly, the model does much better than the n-gram baselines. We can also see that the delexicalization helps a lot: for restaurants, the F1 improves from about eighty percent to ninety percent, and for shopping from eighty-four to ninety. |
0:15:24 | Okay, and then in terms of the cross-domain generalization, first some terminology: in the paper they use in-domain versus out-of-domain. And they basically had two settings. One was just combined training, where you simply train on the combination of the in-domain and out-of-domain data. |
0:15:44 | The other was pre-training with fine-tuning, where they pre-train on the out-of-domain data and then fine-tune only on the in-domain data. For both settings they vary the percentage of in-domain data used for training. |
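A schematic of the two transfer settings (my own sketch; the training loop itself is elided):

```python
# Two cross-domain settings: combined training vs. pre-train + fine-tune.
def train(model, data, epochs=5):
    ...  # standard supervised training loop (omitted)

def combined_training(model, in_domain, out_domain, pct):
    subset = in_domain[: int(len(in_domain) * pct)]
    train(model, out_domain + subset)              # one pass over the mix
    return model

def pretrain_finetune(model, in_domain, out_domain, pct):
    train(model, out_domain)                       # pre-train out-of-domain
    train(model, in_domain[: int(len(in_domain) * pct)])  # fine-tune in-domain
    return model
```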
0:16:04 | So here are the results when restaurant was the out-of-domain source and shopping was the target domain. The green curve is what happens if you only train on in-domain data, and the other two curves are the pre-training approach and the combined training. |
0:16:21 | You can see that even with zero in-domain data the transfer already does pretty well, not far below the mid-nineties optimum, and you can get pretty good, close-to-optimal results with only twenty percent of the in-domain data. |
0:16:40 | And when they go the opposite way, the results are still pretty encouraging, but not quite as good: with zero in-domain data the F1 is only around seventy percent. So it seems, to me at least, that this suggests the restaurant data may be richer and more varied, so training on the simpler case just does not transfer as well. |
0:17:09 | Okay, so conclusions. Basically, they propose a new type of slot-specific user intents, the update intents, which address user intents concerning implicit denials, numerical updates, and preferences for slot values. |
0:17:26 | They present a sequence labeling model for classifying the update intents, and also propose a method for transfer learning across domains. They showed strong classification performance on this task and promising domain-independent results. In future work, they plan to incorporate update intents into real dialogue state tracking. |
0:17:52 | So, I'm not an author, but I can try to answer some questions, especially if they're clarification-type questions, because I have a lot of questions about this myself. If not, I can always forward anything to the authors; their email addresses are on the first slide. |
0:18:53 | [Audience question, partly inaudible] I'm not sure ... I don't see how you could just replace the NLU with this, because ... you could use, like, the numbers from the NLU there. |
0:19:28 | Sure, I mean, that only makes sense added to the model. ... That's a good question, but to me it seems more and more difficult to frame. |
0:19:39 | [Inaudible audience question] |
0:19:59 | I actually have a question myself, but I only thought of it last night, so it was too late to ask the authors whether they have it available. But it's something that occurred to me as well: it would be interesting to see what exactly is being confused. |
0:20:44 | I don't... I mean, I guess I'm not sure I can answer the question. I guess this process has two steps of annotation: one is creating the dialogues, and the other is actually annotating the utterances with the slot names, values, and intents. So I guess for the second part you could get inter-annotator agreement, but I don't believe they report inter-annotator agreement. |
0:21:09 | I mean, the fact that they can get ninety percent F1 suggests that the labels can't be too noisy, because if they were very noisy it would be hard to be that accurate. But of course that's not the same as explicitly measuring it. |