Alright, so today I'll be presenting this paper. I should say that I'm not one of the authors and didn't work on this myself, although it is about dialogue state tracking, which is something we do work on.
So, to give the outline: I'll first give some background and the motivation, then talk about what is meant by update intents. Then I'll introduce the problem statement and the model, and how they deal with cross-domain generalization. Then I'll introduce the data and the experiments, and finish with the conclusions and future work.
So, in terms of the background: dialogue state tracking is critical to the successful completion of tasks, and to usability, in task-oriented dialogue systems. The dialogue state expresses a probability distribution over user goals, which are represented as slot-value pairs. Typically, state tracking approaches use dialogue acts to infer the user's intentions towards the slot values that have been detected; typical dialogue acts would be inform, deny, request, negate. So for an example utterance like "find me french restaurants in boston", the SLU output would be inform(cuisine=french), inform(city=boston).
So, basically, the idea, or the motivation, of this work is that dialogue acts do not always adequately capture the user's intents towards slot values. One example is implicit denial. In the example here, the user invites John and Joe for dinner, and then says "Joe can't make it". I think the point is that the denial is implicit, because the utterance doesn't map onto a deny dialogue act. So here we have the user's utterances on the left and the expected SLU output of a typical system on the right.
Another limitation is with expressed preferences for slot values, and this is specific, I think, to multi-valued slots. In this example, the user asks "find me french restaurants in Los Gatos". The second utterance says "find me some in San Jose too", and the third says "find me some in Gilroy instead". Current SLU, expressed as dialogue acts, wouldn't distinguish between the second and third utterances, whereas the intents are actually quite different: the third utterance basically expresses a preference for Gilroy, which would imply replacing whatever value is currently in the state, while in the second instance you just want to add a value.
And then another limitation is that it doesn't deal well with numerical updates, that is, with incrementality commands. In this example, you ask for a table for four, and then you might say "add four more seats" or "two more seats", and it's unclear how typical SLU systems would deal with that.
So the solution that the authors propose is update intents, which is basically a new semantic class of intents tied directly to the update of the state that the user intends.

Here is the list of update intents. The first one is append: the user specifies a value, or multiple values, to be added, for multi-valued slots; so it's basically the equivalent of intending to add values to a multi-valued slot. Remove is basically the complement of that: remove a value from a multi-valued slot. Replace expresses a preference for a slot value: it expresses a value that is to be preferred over the previous values. Set_to means replace the existing value. And then there are increase_by and decrease_by, which are specific to numeric slot types.
And here are some examples. So what we have here is an utterance, the conventional SLU output, and then the same annotated with update intents. For example, the one we had earlier, "Joe can't make it", would become inform_remove with the value Joe. For restaurant search we see the examples from before: "find me some in San Jose too" would become inform_append, whereas "find me some in Gilroy instead" would become inform_replace. And then for the numerical examples, "add four more seats" would become inform_increase_by with the value four, and "can you remove two seats" would become inform_decrease_by with the value two.
Okay, so how is the problem formulated? Basically, given a user utterance, we identify the update intents for all the slot values mentioned in it. So the input is a user utterance that is already tagged with slots and values, and the output is the update intent for each slot value. Here we see two examples. For "drop one person from the reservation", number_of_guests is the slot name, one is the slot value, and the update intent is decrease_by. In the other example, "Joe can't make it", person_names is the slot name, Joe is the slot value, and the update intent here is remove. So this is formulated as a multiclass classification of slot values into the update-intent classes.
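Just to make the semantics concrete, here's a minimal sketch, my own and not from the paper, of what each update intent would do to a dialogue state that maps slot names to lists of values:

```python
# Illustrative sketch (mine, not the authors') of the effect of each
# update intent on a dialogue state of slot name -> list of values.
def apply_update(state, slot, intent, value):
    values = state.setdefault(slot, [])
    if intent == "append":                     # add to a multi-valued slot
        values.append(value)
    elif intent == "remove":                   # complement of append
        values.remove(value)
    elif intent in ("replace", "set_to"):      # prefer/overwrite previous values
        state[slot] = [value]
    elif intent == "increase_by":              # numeric slots only
        state[slot] = [values[0] + value]
    elif intent == "decrease_by":
        state[slot] = [values[0] - value]
    return state

state = {"number_of_guests": [4]}
apply_update(state, "number_of_guests", "increase_by", 4)  # -> {'number_of_guests': [8]}
```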
So the modeling here treats this as a sequence labeling problem with a bidirectional LSTM. The user utterance is a sequence of tokens, and the labels are the update intents on the words that span the corresponding slot values; all other words get a generic token label. They also study the effect of delexicalization of the slot values.
So this is what it looks like. On the bottom we have the input, for example "ok forget sunnyvale, try cupertino instead". The first step is to delexicalize: wherever we have a slot value, we replace it with its slot name, which has been shown in previous work to generalize better with limited training data, since the slot values themselves may be out of vocabulary in the training data.
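As a rough illustration of what delexicalization does (my own sketch; the token-level slot spans are assumed to come from the upstream SLU):

```python
# Rough illustration of delexicalization: slot values in the utterance are
# replaced by their slot names before being fed to the tagger.
def delexicalize(tokens, slot_spans):
    """slot_spans: list of (start, end, slot_name) token index ranges."""
    out = list(tokens)
    for start, end, slot_name in slot_spans:
        out[start:end] = ["<" + slot_name + ">"] * (end - start)
    return out

tokens = "ok forget sunnyvale try cupertino instead".split()
print(delexicalize(tokens, [(2, 3, "location"), (4, 5, "location")]))
# ['ok', 'forget', '<location>', 'try', '<location>', 'instead']
```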
Then we have the embedding layer, then a typical bidirectional LSTM, and finally a softmax layer where we predict the targets. You can see in this example that "ok", "forget", "try", and "instead" are labeled as generic tokens, while the location token for sunnyvale gets the label remove and the one for cupertino gets the label replace.
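For concreteness, here's a minimal sketch of such a tagger in PyTorch; the paper doesn't spell out layer sizes here, so the dimensions are assumptions:

```python
import torch.nn as nn

# Minimal BiLSTM sequence tagger sketch; hidden sizes are assumptions.
class UpdateIntentTagger(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=300, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # GloVe-initialized in the paper
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_labels)  # update intents + generic token label

    def forward(self, token_ids):                     # (batch, seq_len)
        hidden_states, _ = self.lstm(self.emb(token_ids))
        return self.out(hidden_states)                # per-token label logits
```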
So, the delexicalization was shown to be helpful in generalizing to slot values not seen in the training data, but really only within a single domain. If we go cross-domain, the slot names may be different: you may see slot names in the target domain that didn't exist in the source domain. However, if we can group slot names into types, different domains should share the same types of slots. As an example, the restaurant reservation and online shopping domains both have numeric slots: number of guests and quantity of grocery items are both numeric slots. So if we can delexicalize to slot types like this instead, we may be able to generalize.
So the solution is to delexicalize to slot types, and these are the three slot types they define. First there are numeric slots, which can be increased and decreased. Then there are two types of multi-valued slots: disjunctive slots, which can take multiple values in a disjunction; an example would be location, like Los Gatos or San Jose. And conjunctive slots, which can take multiple values in conjunction, such as the names of the people coming to dinner, or the items on a shopping list.
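So, as a hypothetical sketch, the mapping might look like this; the slot names follow the paper's two domains, but the exact type assignments are my own reading:

```python
# Hypothetical slot-name -> slot-type mapping used for type-level
# delexicalization across domains; type assignments are my reading.
SLOT_TYPES = {
    # restaurant domain
    "participant_names": "conjunctive",
    "number_of_guests": "numeric",
    "location": "disjunctive",
    "cuisine": "disjunctive",
    # shopping domain
    "grocery_items": "conjunctive",
    "item_quantity": "numeric",
}

def delexicalize_token(slot_name):
    # "<numeric>", "<conjunctive>", or "<disjunctive>" instead of the slot name
    return "<" + SLOT_TYPES[slot_name] + ">"
```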
Okay, so, to evaluate this, what was required was a dataset whose dialogues contain numerical update commands and multi-valued slots in the domain ontology, along with annotations for the proposed update intents. Existing datasets didn't have all of these, so the authors created their own dataset. It covers two domains, restaurants and online shopping, and they had eight different professional editors generate conversations in these domains. The editors were asked to create conversations corresponding to a task, where the tasks were: search for a restaurant, make a dinner booking, buy groceries, buy clothes. And they were told to assume appropriate system responses, so this did not require building an end-to-end system. The dialogues generated this way were then annotated with the slot names and the update intents.
Just as a reminder, this is what the data essentially looks like: you have the utterance, annotated with the slot name and the slot value, which would be the input to the system, and the update intent, which is what is to be detected.
And this, for the restaurant and shopping domains, is the list of slot names and their types. We have participant names, number of guests, menu items, cuisine, and location for restaurants; and grocery items, quantity of grocery items, colour, and size for shopping. You can see that although the slot names of the two domains are disjoint, they still share the same slot types.
Okay, so after the data was created and annotated, this is what the distribution looks like. The shopping and restaurant portions have similar distributions, with a comparable number of conversations each and around thirteen hundred utterances, and you can see that on average there is more than one slot value mentioned in each utterance. Then, in terms of the actual update intents themselves, this is the distribution, and you can see that in both domains append is the most common update, followed by replace. For shopping, increase_by is noticeably more frequent than in the restaurant domain: it's about twelve percent in shopping versus four percent in restaurants.
Okay, so then in terms of the experiments. They implemented the bidirectional LSTM and optimized it with the Adam optimizer, batch size sixty-four, and a cross-entropy loss. The embedding layer was initialized with pre-trained GloVe embeddings from the Common Crawl dataset, and missing words were initialized randomly.
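As a sketch of that initialization, assuming the standard GloVe Common Crawl release (the file name is the usual one for that release, not taken from the paper):

```python
import numpy as np

# Sketch of the embedding initialization described: pre-trained GloVe vectors
# where available, random vectors for out-of-vocabulary words.
def glove_embedding_matrix(vocab, dim=300, path="glove.840B.300d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in vectors:
            matrix[i] = vectors[word]   # keep random init for missing words
    return matrix
```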
And basically the evaluation was leave-one-out cross-validation. Because the data was created by eight individual editors, and conversations by the same editor may express the same intents in similar ways, they didn't want intra-editor similarity to bias the evaluation. So, for a given fold, they would always train on seven editors' data and test on the remaining editor's data, and everything reported is the average over all eight folds. They only did parameter tuning on the learning rate.
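In sketch form, the protocol is something like this (my own illustration, assuming each example records which editor wrote its conversation):

```python
# Leave-one-editor-out evaluation sketch; assumes each example records
# which of the eight editors wrote its conversation.
def leave_one_editor_out(examples, editors, train_and_eval):
    scores = []
    for held_out in editors:
        train = [ex for ex in examples if ex["editor"] != held_out]
        test = [ex for ex in examples if ex["editor"] == held_out]
        scores.append(train_and_eval(train, test))  # e.g. returns F1
    return sum(scores) / len(scores)                # average over all folds
```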
Then they also have some baselines. One is a simple n-gram baseline, based on a word window around the slot values: the context words are features for a logistic regression classifier. But of course there may be multiple slot values in an utterance, so they have to decide which slot value a given word or n-gram belongs to. I won't go into the details, but they basically had two approaches to this. One was hard segmentation, which is a rule-based approach to deciding which slot value a word should belong to. The other was soft segmentation, which basically creates a feature for every position a word could occur in: to the left of a slot value, to the right, or between two slot values; so you basically increase the size of the feature representation. The other baseline was the full model without delexicalization.
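As a rough illustration of that n-gram baseline; the window size and feature naming here are my assumptions:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Rough sketch of the n-gram baseline: words in a window around the slot
# value become features for a logistic regression classifier.
def window_features(tokens, slot_start, slot_end, window=3):
    feats = {}
    for i in range(max(0, slot_start - window), slot_start):
        feats["left=" + tokens[i]] = 1.0
    for i in range(slot_end, min(len(tokens), slot_end + window)):
        feats["right=" + tokens[i]] = 1.0
    return feats

# Given feature dicts X and update-intent labels y:
#   clf = LogisticRegression(max_iter=1000)
#   clf.fit(DictVectorizer().fit_transform(X), y)
```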
Okay, so these are the classification results for the full model. I guess the key point here is that it's pretty accurate: an F1 score over ninety in both domains overall, and over ninety percent F1 for quite a few of the individual intents. For both domains, the most difficult one, for some reason, is remove. It could be the case that there isn't enough training data for it, although increase_by and decrease_by actually have less. Then, compared to the baselines, perhaps unsurprisingly, the model does much better than the n-gram baseline, and you can also see that the delexicalization helps a lot: for restaurants, F1 improves from eighty percent to ninety percent, and for shopping from eighty-four to ninety.
Okay, and then in terms of the cross-domain generalization. Just some terminology first: in the paper they talk about in-domain versus out-of-domain data. They basically tried two settings. One was combined training, where you just train on the combination of the in-domain and out-of-domain data. The other was pre-training with fine-tuning: they pre-train on the out-of-domain data and then fine-tune only on the in-domain data (or possibly, unless that's a typo in the paper, on the union with the in-domain data). For both settings, they vary the percentage of in-domain data that is used.
So here are the results when restaurant was the out-of-domain data and shopping was the target domain. The green curve is what happens if you train only on in-domain data, and the other two curves are the pre-training approach and the combined training. You can see that even with zero in-domain data they already do pretty well, at around eighty percent, versus the mid-nineties being the optimum, and you can get close to optimal results with only twenty percent of the in-domain data. Going the opposite way, the results are still pretty encouraging, though not quite as good: with zero in-domain data the F1 is only seventy percent.
So it seems to me, at least, that this suggests the restaurant data may be richer and more varied, and that training on the simpler shopping data just doesn't transfer as well.
Okay, so, conclusions. Basically, they propose a new type of slot-specific user intents; these update intents address user intents involving implicit denials, numerical updates, and preferences for slot values. They present a sequence labeling model for classifying the update intents, and also propose a method for transfer learning across domains. They showed strong classification performance on this task, with promising domain-independent results. In the future, they plan to incorporate update intents into real dialogue state tracking.
And, so, the authors are not around, but I can try to answer some questions, especially if they're clarification-type questions, because I have a lot of questions about this myself. I'm not sure I can answer everything.
The authors' email addresses are on the paper, though.
I'm not sure about the numerical ones especially; I don't see how you could just replace the NLU there, because for something like "four more people" you could just use the number from the NLU.

Sure, I mean, that's the only way it makes sense to add it to the model.
So, a question: do we know which intents were the more difficult ones, and what they were being confused with?
I had that question myself, but I only thought of it last night, so it was too late to ask the authors whether they have that available. It occurred to me as well; it would be interesting to see what exactly is being confused.
I'm not sure I can fully answer the question. I guess there are two steps to the annotation: one is creating the dialogues, and the other is actually annotating the utterances with the slot names, values, and intents. So I guess for the second part you could get inter-annotator agreement, but I don't believe they report inter-annotator agreement.
I mean, the fact that they can get ninety percent F1 suggests that the labels can't be too noisy, because if they were very noisy it would be hard to be that accurate. But of course that's not the same as explicitly measuring it.