Alright, so today I'll be presenting this paper. I should say that I'm not one of the authors and didn't work on this myself, although it is about dialogue state tracking, which is something we do work on.
So, to give the outline: I'll first give some background and the motivation, then talk about what is meant by update intents. Then I'll introduce the problem statement and the model, and how they deal with cross-domain generalization. Then I'll introduce the data and the experiments, and finish with the conclusions and future work.
So, in terms of the background: dialogue state tracking is critical to the successful completion of tasks, and to usability, in task-oriented dialogue systems. The dialogue state expresses a probability distribution over user goals, which are represented as slot-value pairs. Typically, state tracking approaches use dialogue acts to infer the user's intentions towards the slot values that have been detected; typical dialogue acts would be inform, deny, request, negate. So for an example utterance like "find me french restaurants in boston", the SLU output would be inform(cuisine=french), inform(city=boston).
So, basically, the idea, or the motivation, of this work is that dialogue acts do not always adequately capture the user's intents towards slot values. One example is implicit denial. In the example here, the user invites John and Joe for dinner, and then says "Joe can't make it". I think the point is that the denial is implicit, because the utterance doesn't map onto a deny dialogue act. So here we have the user's utterances on the left and the expected SLU output of a typical system on the right.
Another limitation is with expressed preferences for slot values, and this is specific, I think, to multi-valued slots. In this example, the user asks "find me french restaurants in Los Gatos". The second utterance says "find me some in San Jose too", and the third says "find me some in Gilroy instead". Current SLU, expressed as dialogue acts, wouldn't distinguish between the second and third utterances, whereas the intents are actually quite different: the third utterance basically expresses a preference for Gilroy, which would imply replacing whatever value is currently in the state, while in the second instance you just want to add a value.
And then another limitation is that it doesn't deal well with numerical updates, that is, with incrementality commands. In this example, you ask for a table for four, and then you might say "add four more seats" or "two more seats", and it's unclear how typical SLU systems would deal with that.
So the solution that the authors propose is update intents, which is basically a new semantic class of intents tied directly to the update of the state that the user intends.

Here is the list of update intents. The first one is append: the user specifies a value, or multiple values, to be added, for multi-valued slots; so it's basically the equivalent of intending to add values to a multi-valued slot. Remove is basically the complement of that: remove a value from a multi-valued slot. Replace expresses a preference for a slot value: it expresses a value that is to be preferred over the previous values. Set_to means replace the existing value. And then there are increase_by and decrease_by, which are specific to numeric slot types.
And here are some examples. So what we have here is an utterance, the conventional SLU output, and then the same annotated with update intents. For example, the one we had earlier, "Joe can't make it", would become inform_remove with the value Joe. For restaurant search we see the examples from before: "find me some in San Jose too" would become inform_append, whereas "find me some in Gilroy instead" would become inform_replace. And then for the numerical examples, "add four more seats" would become inform_increase_by with the value four, and "can you remove two seats" would become inform_decrease_by with the value two.
Okay, so how is the problem formulated? Basically, given a user utterance, we identify the update intents for all the slot values mentioned in it. So the input is a user utterance that is already tagged with slots and values, and the output is the update intent for each slot value. Here we see two examples. For "drop one person from the reservation", number_of_guests is the slot name, one is the slot value, and the update intent is decrease_by. In the other example, "Joe can't make it", person_names is the slot name, Joe is the slot value, and the update intent here is remove. So this is formulated as a multiclass classification of slot values into the update-intent classes.
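Just to make the semantics concrete, here's a minimal sketch, my own and not from the paper, of what each update intent would do to a dialogue state that maps slot names to lists of values:

```python
# Illustrative sketch (mine, not the authors') of the effect of each
# update intent on a dialogue state of slot name -> list of values.
def apply_update(state, slot, intent, value):
    values = state.setdefault(slot, [])
    if intent == "append":                     # add to a multi-valued slot
        values.append(value)
    elif intent == "remove":                   # complement of append
        values.remove(value)
    elif intent in ("replace", "set_to"):      # prefer/overwrite previous values
        state[slot] = [value]
    elif intent == "increase_by":              # numeric slots only
        state[slot] = [values[0] + value]
    elif intent == "decrease_by":
        state[slot] = [values[0] - value]
    return state

state = {"number_of_guests": [4]}
apply_update(state, "number_of_guests", "increase_by", 4)  # -> {'number_of_guests': [8]}
```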
So the modeling here treats this as a sequence labeling problem with a bidirectional LSTM. The user utterance is a sequence of tokens, and the labels are the update intents on the words that span the corresponding slot values; all other words get a generic token label. They also study the effect of delexicalization of the slot values.
So this is what it looks like. On the bottom we have the input, for example "ok forget sunnyvale, try cupertino instead". The first step is to delexicalize: wherever we have a slot value, we replace it with its slot name, which has been shown in previous work to generalize better with limited training data, since the slot values themselves may be out of vocabulary in the training data.
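As a rough illustration of what delexicalization does (my own sketch; the token-level slot spans are assumed to come from the upstream SLU):

```python
# Rough illustration of delexicalization: slot values in the utterance are
# replaced by their slot names before being fed to the tagger.
def delexicalize(tokens, slot_spans):
    """slot_spans: list of (start, end, slot_name) token index ranges."""
    out = list(tokens)
    for start, end, slot_name in slot_spans:
        out[start:end] = ["<" + slot_name + ">"] * (end - start)
    return out

tokens = "ok forget sunnyvale try cupertino instead".split()
print(delexicalize(tokens, [(2, 3, "location"), (4, 5, "location")]))
# ['ok', 'forget', '<location>', 'try', '<location>', 'instead']
```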
Then we have the embedding layer, then a typical bidirectional LSTM, and finally a softmax layer where we predict the targets. You can see in this example that "ok", "forget", "try", and "instead" are labeled as generic tokens, while the location token for sunnyvale gets the label remove and the one for cupertino gets the label replace.
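For concreteness, here's a minimal sketch of such a tagger in PyTorch; the paper doesn't spell out layer sizes here, so the dimensions are assumptions:

```python
import torch.nn as nn

# Minimal BiLSTM sequence tagger sketch; hidden sizes are assumptions.
class UpdateIntentTagger(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=300, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # GloVe-initialized in the paper
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_labels)  # update intents + generic token label

    def forward(self, token_ids):                     # (batch, seq_len)
        hidden_states, _ = self.lstm(self.emb(token_ids))
        return self.out(hidden_states)                # per-token label logits
```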
So, the delexicalization was shown to be helpful in generalizing to slot values not seen in the training data, but really only within a single domain. If we go cross-domain, the slot names may be different: you may see slot names in the target domain that didn't exist in the source domain. However, if we can group slot names into types, different domains should share the same types of slots. As an example, the restaurant reservation and online shopping domains both have numeric slots: number of guests and quantity of grocery items are both numeric slots. So if we can delexicalize to slot types like this instead, we may be able to generalize.
So the solution is to delexicalize to slot types, and these are the three slot types they define. First there are numeric slots, which can be increased and decreased. Then there are two types of multi-valued slots: disjunctive slots, which can take multiple values in a disjunction; an example would be location, like Los Gatos or San Jose. And conjunctive slots, which can take multiple values in conjunction, such as the names of the people coming to dinner, or the items on a shopping list.
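So, as a hypothetical sketch, the mapping might look like this; the slot names follow the paper's two domains, but the exact type assignments are my own reading:

```python
# Hypothetical slot-name -> slot-type mapping used for type-level
# delexicalization across domains; type assignments are my reading.
SLOT_TYPES = {
    # restaurant domain
    "participant_names": "conjunctive",
    "number_of_guests": "numeric",
    "location": "disjunctive",
    "cuisine": "disjunctive",
    # shopping domain
    "grocery_items": "conjunctive",
    "item_quantity": "numeric",
}

def delexicalize_token(slot_name):
    # "<numeric>", "<conjunctive>", or "<disjunctive>" instead of the slot name
    return "<" + SLOT_TYPES[slot_name] + ">"
```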
Okay, so, to evaluate this, what was required was a dataset whose dialogues contain numerical update commands and multi-valued slots in the domain ontology, along with annotations for the proposed update intents. Existing datasets didn't have all of these, so the authors created their own dataset. It covers two domains, restaurants and online shopping, and they had eight different professional editors generate conversations in these domains. The editors were asked to create conversations corresponding to a task, where the tasks were: search for a restaurant, make a dinner booking, buy groceries, buy clothes. And they were told to assume appropriate system responses, so this did not require building an end-to-end system. The dialogues generated this way were then annotated with the slot names and the update intents.
Just as a reminder, this is what the data essentially looks like: you have the utterance, annotated with the slot name and the slot value, which would be the input to the system, and the update intent, which is what is to be detected.
And this, for the restaurant and shopping domains, is the list of slot names and their types. We have participant names, number of guests, menu items, cuisine, and location for restaurants; and grocery items, quantity of grocery items, colour, and size for shopping. You can see that although the slot names of the two domains are disjoint, they still share the same slot types.
Okay, so after the data was created and annotated, this is what the distribution looks like. The shopping and restaurant portions have similar distributions, with a comparable number of conversations each and around thirteen hundred utterances, and you can see that on average there is more than one slot value mentioned in each utterance. Then, in terms of the actual update intents themselves, this is the distribution, and you can see that in both domains append is the most common update, followed by replace. For shopping, increase_by is noticeably more frequent than in the restaurant domain: it's about twelve percent in shopping versus four percent in restaurants.
Okay, so then in terms of the experiments. They implemented the bidirectional LSTM and optimized it with the Adam optimizer, batch size sixty-four, and a cross-entropy loss. The embedding layer was initialized with pre-trained GloVe embeddings from the Common Crawl dataset, and missing words were initialized randomly.
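As a sketch of that initialization, assuming the standard GloVe Common Crawl release (the file name is the usual one for that release, not taken from the paper):

```python
import numpy as np

# Sketch of the embedding initialization described: pre-trained GloVe vectors
# where available, random vectors for out-of-vocabulary words.
def glove_embedding_matrix(vocab, dim=300, path="glove.840B.300d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in vectors:
            matrix[i] = vectors[word]   # keep random init for missing words
    return matrix
```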
And basically the evaluation was leave-one-out cross-validation. Because the data was created by eight individual editors, and conversations by the same editor may express the same intents in similar ways, they didn't want intra-editor similarity to bias the evaluation. So, for a given fold, they would always train on seven editors' data and test on the remaining editor's data, and everything reported is the average over all eight folds. They only did parameter tuning on the learning rate.
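In sketch form, the protocol is something like this (my own illustration, assuming each example records which editor wrote its conversation):

```python
# Leave-one-editor-out evaluation sketch; assumes each example records
# which of the eight editors wrote its conversation.
def leave_one_editor_out(examples, editors, train_and_eval):
    scores = []
    for held_out in editors:
        train = [ex for ex in examples if ex["editor"] != held_out]
        test = [ex for ex in examples if ex["editor"] == held_out]
        scores.append(train_and_eval(train, test))  # e.g. returns F1
    return sum(scores) / len(scores)                # average over all folds
```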
Then they also have some baselines. One is a simple n-gram baseline, based on a word window around the slot values: the context words are features for a logistic regression classifier. But of course there may be multiple slot values in an utterance, so they have to decide which slot value a given word or n-gram belongs to. I won't go into the details, but they basically had two approaches to this. One was hard segmentation, which is a rule-based approach to deciding which slot value a word should belong to. The other was soft segmentation, which basically creates a feature for every position a word could occur in: to the left of a slot value, to the right, or between two slot values; so you basically increase the size of the feature representation. The other baseline was the full model without delexicalization.
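As a rough illustration of that n-gram baseline; the window size and feature naming here are my assumptions:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Rough sketch of the n-gram baseline: words in a window around the slot
# value become features for a logistic regression classifier.
def window_features(tokens, slot_start, slot_end, window=3):
    feats = {}
    for i in range(max(0, slot_start - window), slot_start):
        feats["left=" + tokens[i]] = 1.0
    for i in range(slot_end, min(len(tokens), slot_end + window)):
        feats["right=" + tokens[i]] = 1.0
    return feats

# Given feature dicts X and update-intent labels y:
#   clf = LogisticRegression(max_iter=1000)
#   clf.fit(DictVectorizer().fit_transform(X), y)
```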
Okay, so these are the classification results for the full model. I guess the key point here is that it's pretty accurate: an F1 score over ninety in both domains overall, and over ninety percent F1 for quite a few of the individual intents. For both domains, the most difficult one, for some reason, is remove. It could be the case that there isn't enough training data for it, although increase_by and decrease_by actually have less. Then, compared to the baselines, perhaps unsurprisingly, the model does much better than the n-gram baseline, and you can also see that the delexicalization helps a lot: for restaurants, F1 improves from eighty percent to ninety percent, and for shopping from eighty-four to ninety.
Okay, and then in terms of the cross-domain generalization. Just some terminology first: in the paper they talk about in-domain versus out-of-domain data. They basically tried two settings. One was combined training, where you just train on the combination of the in-domain and out-of-domain data. The other was pre-training with fine-tuning: they pre-train on the out-of-domain data and then fine-tune only on the in-domain data (or possibly, unless that's a typo in the paper, on the union with the in-domain data). For both settings, they vary the percentage of in-domain data that is used.
So here are the results when restaurant was the out-of-domain data and shopping was the target domain. The green curve is what happens if you train only on in-domain data, and the other two curves are the pre-training approach and the combined training. You can see that even with zero in-domain data they already do pretty well, at around eighty percent, versus the mid-nineties being the optimum, and you can get close to optimal results with only twenty percent of the in-domain data. Going the opposite way, the results are still pretty encouraging, though not quite as good: with zero in-domain data the F1 is only seventy percent.
So it seems to me, at least, that this suggests the restaurant data may be richer and more varied, and that training on the simpler shopping data just doesn't transfer as well.
Okay, so, conclusions. Basically, they propose a new type of slot-specific user intents; these update intents address user intents involving implicit denials, numerical updates, and preferences for slot values. They present a sequence labeling model for classifying the update intents, and also propose a method for transfer learning across domains. They showed strong classification performance on this task, with promising domain-independent results. In the future, they plan to incorporate update intents into real dialogue state tracking.
And, so, the authors are not around, but I can try to answer some questions, especially if they're clarification-type questions, because I have a lot of questions about this myself. I'm not sure I can answer everything.
The authors' email addresses are on the paper, though.
I'm not sure about the numerical ones especially; I don't see how you could just replace the NLU there, because for something like "four more people" you could just use the number from the NLU.

Sure, I mean, that's the only way it makes sense to add it to the model.
So, a question: do we know which intents were the more difficult ones, and what they were being confused with?
I had that question myself, but I only thought of it last night, so it was too late to ask the authors whether they have that available. It occurred to me as well; it would be interesting to see what exactly is being confused.
I'm not sure I can fully answer the question. I guess there are two steps to the annotation: one is creating the dialogues, and the other is actually annotating the utterances with the slot names, values, and intents. So I guess for the second part you could get inter-annotator agreement, but I don't believe they report inter-annotator agreement.
I mean, the fact that they can get ninety percent F1 suggests that the labels can't be too noisy, because if they were very noisy it would be hard to be that accurate. But of course that's not the same as explicitly measuring it.