Okay, so the last speaker in this session is Lei Shu, and she's going to present a flexibly-structured model for task-oriented dialogues, so another end-to-end dialogue model. Go ahead, please.

Hi, everyone. I am Lei Shu from the University of Illinois at Chicago. I will present our work, flexibly-structured task-oriented dialogue modeling, FSDM for short. This work is joint with Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng, and Gokhan Tur.

Let us quickly recap modularized and end-to-end dialogue systems.

A traditional modularized dialogue system is a pipeline of natural language understanding, dialogue state tracking, knowledge base query, dialogue policy engine, and natural language generation. An end-to-end system connects all these modules and chains them together, with text in and text out. The advantage of the end-to-end fashion is that it can reduce error propagation.

Dialogue state tracking is the key module, which understands user intentions, tracks the dialogue history, and updates the dialogue state at every turn. The updated dialogue state is used for querying the knowledge base, for the policy engine, and for response generation.

There are two popular approaches; we call them the fully-structured approach and the free-form approach. The fully-structured approach uses the full structure of the knowledge base, both its schema and its values. It assumes that the set of informable slot values and the set of requestable slots are fixed. The network does multi-class classification. The advantage is that values and slots are well aligned. The disadvantage is that it cannot adapt to a dynamic knowledge base or detect out-of-vocabulary values appearing in the user's utterance.

The free-form approach does not exploit any information about the knowledge base in the model architecture. It treats the dialogue state as a sequence of informable values and requestable slots. For example, in the picture, in the restaurant domain, the dialogue state is 'italian ; cheap ; address ; phone'. The network is a sequence-to-sequence model. The pros are that it can adapt to new domains and to changes in the content of the knowledge base, and it solves the out-of-vocabulary problem. The disadvantage is that values and slots are not aligned. For example, in a travel booking system, given the dialogue state 'chicago ; seattle', can you tell which one is the departure city and which one is the arrival city? Also, the free-form approach models an unwanted order of the requestable slots, and it can produce invalid states that may contain generated non-requestable-slot words.
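To make the comparison concrete, here is a small illustrative sketch (my own example, not from the talk; the slot names and city values are hypothetical) of how the same belief state looks under the three representations being discussed:

```python
# Fully-structured: classification over a FIXED set of slot values, so
# out-of-vocabulary values cannot be represented at all.
fully_structured = {"food": "italian", "pricerange": "cheap"}

# Free-form: one flat token sequence; slots and values are NOT aligned.
# In a travel domain, "chicago ; seattle" is ambiguous: which city is
# the departure and which is the arrival?
free_form = ["chicago", ";", "seattle", ";", "address", ";", "phone"]

# Flexibly-structured: one generated value sequence per informable slot
# (so out-of-vocabulary values are fine) plus a binary flag per
# requestable slot -- alignment is explicit by construction.
flexibly_structured = {
    "informable": {"departure": ["chicago"], "arrival": ["seattle"]},
    "requestable": {"address": True, "phone": True, "price": False},
}

# The flexibly-structured form answers the question the free-form cannot:
assert flexibly_structured["informable"]["departure"] == ["chicago"]
```

The free-form list carries the same tokens but loses the slot assignment, which is exactly the alignment problem the talk raises.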

So our proposal is the flexibly-structured dialogue model. It contains five components. The green part is the input encoder module; the yellow and orange parts are the dialogue state tracking; the purple part is the knowledge base query; the red part is a new module we propose, called the response slot decoder; and the grey and blue parts together form the response generation.

So we propose a flexibly-structured dialogue state tracking approach, which uses only the information in the schema of the knowledge base, but not the information about the values. The architecture we propose contains two parts: the informable slot value decoder, the yellow part in this picture, and the requestable slot decoder, the orange part. The informable slot value decoder has a separate decoder for each informable slot. For example, in this picture, for the food slot, given the start-of-sentence token 'food', the decoder generates 'italian' followed by the end-of-food token. For the requestable slot decoder, we adopt a multi-label classifier over the requestable slots, or you can think of it as binary classification given a requestable slot.

You can see that the flexibly-structured approach has a lot of advantages. First, slots and values are aligned. It also solves the out-of-vocabulary problem, and it easily adapts to new domains and to changes in the content of the knowledge base, because we are using a generation method for the informable value decoder. We also remove the unwanted order of the requestable slots and the chance of generating invalid states. A nice property of the flexibly-structured dialogue state tracking is that it can explicitly assign values to slots, like the fully-structured approach, while also preserving the capability of dealing with out-of-vocabulary values, like the free-form approach.

Meanwhile, it brings challenges in response generation. The first challenge is: is it possible to improve the response generation quality based on the flexibly-structured DST? The second challenge is: how do we incorporate the output of the flexibly-structured DST for response generation?

So, regarding the first challenge, how to improve the response generation, we propose a novel module called the response slot decoder, the red part in the picture. The response slots are the slot names, or slot tokens, that appear in the delexicalized response. For example, the user requests the address, and the system replies 'the address of <name_slot> is <address_slot>'. For the response slot decoder, we also adopt a multi-label classifier.

Regarding the second challenge, how to incorporate the flexibly-structured DST for response generation, we propose the word copy distribution. It increases the chance of the words in the informable slot values, the requestable slots, and the response slots appearing in the agent response. For example, for 'the address of <name_slot> is <address_slot>', we are trying to increase the chances of 'address', '<name_slot>', and '<address_slot>' appearing in the response.

From now on I am going to go into detail on how we link these modules together. First is the input encoder. The input encoder takes three kinds of input: the first is the agent reply in the past turn, the second is the dialogue state, and the third is the current user utterance. The output is the last hidden state of the encoder, which is used as the initial hidden state for both the dialogue state tracker and the response generation.

The informable slot value decoder is one part of our flexibly-structured DST. It has two kinds of input: the last hidden state from the encoder, and a unique start-of-sentence symbol for each slot. For example, for the food slot the starting word is 'food'. The output is, for each slot, a sequence of words forming the slot value. For example, the value generated for the food slot here is 'italian' followed by the end-of-food token. The intuition here is that the unique start-of-sentence symbol ensures the slot-value alignment, and the copy-augmented sequence-to-sequence decoder allows copying values directly from the encoder input.
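As a toy sketch of this per-slot decoding idea (my own mock, not the authors' network: `decode_step` stands in for a copy-augmented seq2seq step, and the transition table is invented), each slot gets a unique start token and decoding runs until that slot's end token, so alignment holds by construction:

```python
MOCK_MODEL = {  # hypothetical (previous token) -> (next token) table
    "<food>": "italian", "italian": "</food>",
    "<area>": "north", "north": "</area>",
}

def decode_step(prev_token):
    """Stub for one decoder step; a real model would also condition on
    the encoder's last hidden state."""
    return MOCK_MODEL[prev_token]

def decode_slot(slot, max_len=10):
    """Greedy-decode the value sequence for one informable slot, using
    that slot's unique start-of-sentence and end tokens."""
    token, value = f"<{slot}>", []
    for _ in range(max_len):
        token = decode_step(token)
        if token == f"</{slot}>":  # slot-specific end token
            break
        value.append(token)
    return value

state = {slot: decode_slot(slot) for slot in ["food", "area"]}
# state == {"food": ["italian"], "area": ["north"]}
```

Because each value sequence is keyed by the slot whose start token produced it, the slot/value ambiguity of the free-form belief span cannot arise.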

The requestable slot binary classifier is the other part of our flexibly-structured DST. The inputs are the last hidden state of the encoder and a unique start-of-sentence symbol for each slot; for example, for the address slot the starting word is 'address'. The output is, for each slot, a binary prediction, true or false, regarding whether the slot is requested by the user or not. Note that the GRU here takes only one step; it may be replaced with any classification architecture you want. We use a GRU because we want to use its hidden state as the initial state for our response slot binary classifier.

The knowledge base query takes the generated informable slot values and the knowledge base, and outputs a one-hot vector representing the number of records matched.
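A minimal sketch of this query step (the records and the count bucketing are my own assumptions for illustration): the generated informable values filter the KB, and the match count is encoded as a small one-hot vector for the downstream decoder.

```python
KB = [  # toy knowledge base
    {"food": "italian", "area": "north", "name": "roma"},
    {"food": "italian", "area": "south", "name": "napoli"},
    {"food": "thai",    "area": "north", "name": "bangkok"},
]

def kb_query(constraints, num_buckets=4):
    """Return a one-hot vector over match counts 0, 1, 2, >=3."""
    matches = [r for r in KB
               if all(r.get(slot) == value
                      for slot, value in constraints.items())]
    bucket = min(len(matches), num_buckets - 1)
    return [1 if i == bucket else 0 for i in range(num_buckets)]

print(kb_query({"food": "italian"}))                # 2 matches -> [0, 0, 1, 0]
print(kb_query({"food": "thai", "area": "north"}))  # 1 match  -> [0, 1, 0, 0]
```

Encoding only the count, rather than the records themselves, keeps the signal schema-independent, which matches the talk's point about not using KB values in the architecture.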

Here is our response slot binary classifier. Its inputs are the knowledge base query result and the hidden state from the requestable slot binary classifier. The output is, for each response slot, a binary prediction, true or false, regarding whether this response slot appears in the agent response or not. The motivation is that it incorporates all the relevant information about the retrieved entities and the requested slots into the response, so that our word copy distribution can use them.

The motivation here is that the canonical copy mechanism only takes a sequence of words as input, but does not accept the multi-Bernoulli distributions we obtain from the binary classifiers. So we take the predictions from the informable slot value decoders, from the requestable slot binary classifier, and from the response slot binary classifier, and output a word distribution. If a word is a requestable slot or a response slot, its probability is the binary classifier's output. If a word appears in the generated informable slot values, its probability is equal to one. For all other words, the probability is zero.
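This copy rule can be sketched in a few lines (my own reconstruction with made-up probabilities, not the authors' code): requestable and response slot words get their classifier probability, words in the generated informable values get probability 1, and everything else gets 0.

```python
def word_copy_distribution(informable_values, requestable_probs,
                           response_slot_probs):
    """Merge the three sources into one word -> copy-probability map."""
    dist = {}
    dist.update(requestable_probs)    # requestable slot words: classifier prob
    dist.update(response_slot_probs)  # response slot tokens: classifier prob
    for words in informable_values.values():
        for w in words:               # generated value words copy with p = 1
            dist[w] = 1.0
    return dist                       # any word absent from dist has p = 0

dist = word_copy_distribution(
    informable_values={"food": ["italian"]},
    requestable_probs={"address": 0.9},
    response_slot_probs={"<name_slot>": 0.8, "<address_slot>": 0.7},
)
assert dist["italian"] == 1.0
assert dist.get("expensive", 0.0) == 0.0  # any other word: probability 0
```

In the real model this map would be combined with the generator's vocabulary distribution; the sketch only shows how the three prediction sources are merged.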

The agent response decoder takes the last hidden state of the encoder, the knowledge base query result, and the word copy distribution, and outputs the delexicalized agent response.

The overall loss for the whole network is the sum of the informable slot value loss, the requestable slot loss, the response slot loss, and the agent response generation loss.

Experimental settings: we use two datasets, the Cambridge restaurant dataset and the Stanford in-car assistant dataset. For the evaluation metrics, for dialogue state tracking we report the precision, recall, and F1 score for informable slot values and requestable slots. For task completion we use the entity match rate and the success F1 score. And BLEU is applied to the generated agent response for evaluating the language quality.

We compare our method to these baselines. NDM and LIDM and their variants use the fully-structured approach for dialogue state tracking. KVRN, from the Stanford dataset, does not do dialogue state tracking. TSCP and TSCP without RL are free-form approaches; they use a two-stage CopyNet sequence-to-sequence, which consists of one encoder and two CopyNet decoders, decoding the belief state first and then generating the response. TSCP additionally tunes the response slots by reinforcement learning.

Here are the turn-level dialogue state tracking results. You will notice that our proposed method, FSDM, performs much better than the free-form approach TSCP, especially on the requestable slots. The reason is that the free-form approach models the unwanted order of the requestable slots; that is why our F1 score is better than theirs.

This is our dialogue-level task completion result. You will also notice that FSDM performs better than the baseline models on most of the metrics, except BLEU on the KVRET dataset.

Here is an example of the generated dialogue state and response from the free-form approach and our approach, in the calendar domain. The ground-truth belief state here is: for the informable slot, the event is equal to 'meeting', and for the requestable slots, the user is trying to request the date, time, and party. The free-form approach generates 'meeting ; date ; party', while FSDM generates event equal to 'meeting', date is true, time is true, and party is true. You will notice that the free-form approach cannot generate 'time'. The reason is that in the training dataset a lot of examples contain date and party, so the free-form approach, which models this kind of order, memorizes date and party together. So during testing, if the user requests the date, time, and party, it cannot predict 'time'.

Also, for the responses: the ground truth contains the party slot, the date slot, and the time slot. TSCP generates 'your next meeting is at <time_slot> on <date_slot>', missing the party slot, while our FSDM can generate the date slot and the time slot together with the party slot, as in the ground truth.

The conclusion here is that we propose an end-to-end architecture with a flexibly-structured model for task-oriented dialogues. The experiments suggest that the architecture is competitive with the state-of-the-art models, while our model is applicable to real-world scenarios. Our code will be available in the next few weeks at this link.

And here is another of our new works, on modeling multi-action policy for task-oriented dialogues; it will appear at EMNLP. The preprint and the code are publicly accessible at this link, or you can scan the QR code. The traditional policy engine predicts one action per turn, which limits its expressive power and introduces unwanted turns of interaction.

So we propose to generate multiple actions per turn by generating a sequence of tuples; each tuple contains continue, act, and slots. The 'continue' here decides whether we are going to stop generating tuples or continue generating them; the 'act' is the dialogue act; and the 'slots' are the slot names of the dialogue act, which do not carry values, so nothing like a movie name. We propose a novel recurrent cell called the gated Continue-Act-Slots cell, gCAS, which contains three units, a continue unit, an act unit, and a slots unit, sequentially connected inside the recurrent cell. So the whole decoder works in a recurrent-of-recurrents fashion.
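The tuple-generation loop just described can be sketched as follows (a mock for illustration only: the transition table is invented, and `gcas_step` stands in for the learned gCAS cell). Decoding continues until the continue unit predicts "stop", yielding several actions in a single turn:

```python
MOCK_STEPS = iter([  # hypothetical per-step predictions
    ("continue", "inform",  ["name", "address"]),
    ("continue", "request", ["date"]),
    ("stop",     None,      []),
])

def gcas_step(_state):
    """Stub for one recurrent CAS step (continue unit -> act unit ->
    slots unit, sequentially connected)."""
    return next(MOCK_STEPS)

def decode_actions(state=None, max_steps=10):
    """Generate (act, slots) tuples until 'continue' predicts stop."""
    actions = []
    for _ in range(max_steps):
        cont, act, slots = gcas_step(state)
        if cont == "stop":
            break
        actions.append((act, slots))
    return actions

actions = decode_actions()
print(actions)  # [('inform', ['name', 'address']), ('request', ['date'])]
```

The point of the sketch is the control flow: one turn yields a variable-length list of acts with their slots, instead of the single action per turn of a traditional policy engine.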

We would like to deliver a special thanks to our colleagues and the SIGDIAL reviewers. Thank you.

Thank you very much for the talk. So, are there any questions? Okay, there in the back.

Hi, thank you very much, that was very interesting. What would the system do if somebody didn't respond with a slot name or a slot value? You know, you ask what restaurant they want, and the answer is that it is the closest one to the Forum Theatre.

Excuse me, could you repeat the question again?

Your system prompts somebody for a restaurant where they want to eat; you learn that they want some Italian food; the system says 'what restaurant would you like to eat at?', and the user says 'the closest Italian restaurant to the Forum Theatre'. So I'm not giving you a slot value, I'm giving you a constraint on the slot value. What would this kind of architecture do with something like that as a response?

Okay, thank you. It would generate a response based on what is provided by the user; most of the values would be detected. With the informable slot value decoder we are trying to catch and use this information from the user side, and when we generate the response we also use the copy mechanism, so we are trying to increase the chance of these words appearing in the response generation. For example, something like '<name> is an Italian restaurant, do you want to book a place?'.

I understand how you do that, but the question is how you would get the act, what the internal representation would be, so that we get 'the closest', the superlative, in the result. How would it compute the closest, if all you're doing is attending to values? You have to compute some function, like distance.

Actually, that is a very good question. I think what you are trying to ask is whether, if the informable slot values from the user do not exactly match something that appears in the knowledge base...

That's not what I'm trying to ask. I'm saying the user doesn't know what's in the knowledge base; they're just saying 'whatever is the closest one, you tell me'.

Okay, 'the closest one', for example, would be something like a value for the area slot. Actually, this kind of situation our current model cannot handle, and the past work cannot handle it either, because it does not actually appear in the datasets we are using.

Right, thank you.

Any other questions? Okay, in that case I'd like to ask a question.

I noticed that you were evaluating your model on two datasets, the Cambridge restaurant and the KVRET, and I was wondering whether it would be possible, or how difficult it would be, to extend the model to work on the MultiWOZ dataset, which is, you know, bigger than those two and has more domains.

Actually, that is a very good question. For the MultiWOZ dataset, at the latest ACL conference the TRADE network is trying to do that. They showed that their system uses a similar kind of technique, using different start-of-sentence symbols to generate the values for different slots. So I think our work kind of proves that the flexibly-structured approach with per-slot start-of-sentence symbols can be applied to the MultiWOZ part. And for the response generation part, we believe that our proposed word copy mechanism can also work.

Okay, so basically you think that just retraining should be sufficient?

I think so.

Okay, thanks. Okay, any other questions?

Then I guess I have one more. When you were showing the response slot model, or response slot decoder, you said that you have just one GRU cell; what does that exactly mean? Is there like one GRU cell that is...

Yes, we are kind of using the GRU cell, but we are not using it in a recurrent manner.

Right, and is the output like a one-hot encoding of the slots to be inserted in the response, or is it some kind of embedding?

For each slot, you can think of the output as a Bernoulli distribution from the cell. That is why our word copy distribution uses these kinds of zero-to-one values, the probabilities, to decide whether to increase the chance of these words appearing in the agent response.

Right, okay, thank you very much.

Thank you.

Alright, so let's thank the speaker again.