Okay, so the last speaker in this session is Lei Shu, and she's going to present a flexibly structured model for task-oriented dialogues, so another end-to-end dialogue model. So, go ahead, please.
Hello everyone, I am Lei Shu from the University of Illinois at Chicago. I will present our work, Flexibly-Structured Task-Oriented Dialogue Modeling, FSDM for short. This is joint work with Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng, and Gokhan Tur.
Let us quickly recap modularized and end-to-end dialogue systems. A traditional modularized dialogue system is a pipeline of natural language understanding, dialogue state tracking, knowledge base query, a dialogue policy engine, and natural language generation. An end-to-end system connects all these modules and chains them together, with text in and text out. The advantage of the end-to-end fashion is that it can reduce error propagation.
Dialogue state tracking is the key module: it understands user intentions, tracks the dialogue history, and updates the dialogue state at every turn. The updated dialogue state is used for querying the knowledge base, for the policy engine, and for response generation.
There are two popular approaches; we call them the fully structured approach and the free-form approach. The fully structured approach uses the full structure of the knowledge base, both its schema and its values. It assumes that the set of informable slot values and the requestable slots are fixed, and the network does multi-class classification. The advantage is that values and slots are well aligned. The disadvantage is that it cannot adapt to a dynamic knowledge base or detect out-of-vocabulary values appearing in the user's utterance.
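To make the fixed-vocabulary limitation concrete, here is a minimal sketch of the fully structured approach; it is a hypothetical illustration, not any baseline's actual implementation, and all names and sizes are illustrative: one classifier per informable slot over a fixed value set, so an out-of-vocabulary value can never be predicted.

```python
import torch
import torch.nn as nn

# Fixed at training time: the classifier can only ever output these values.
FOOD_VALUES = ["italian", "chinese", "indian", "dontcare"]

class FullyStructuredSlotClassifier(nn.Module):
    def __init__(self, hidden_size, num_values):
        super().__init__()
        self.out = nn.Linear(hidden_size, num_values)

    def forward(self, dialogue_encoding):
        # Multi-class classification over the fixed value set.
        return self.out(dialogue_encoding).softmax(dim=-1)

clf = FullyStructuredSlotClassifier(hidden_size=64, num_values=len(FOOD_VALUES))
probs = clf(torch.randn(1, 64))  # stand-in for a dialogue encoding
print(FOOD_VALUES[probs.argmax(dim=-1).item()])
```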
The free-form approach does not exploit any information about the knowledge base in the model architecture. It treats the dialogue state as a sequence of informable values and requestable slots. For example, in the picture, in the restaurant domain, the dialogue state is 'italian', semicolon, 'cheap', semicolon, 'address', semicolon, EOS. The network is a sequence-to-sequence model. The pros are that it can adapt to new domains and to changes in the content of the knowledge base, and it solves the out-of-vocabulary problem.
The disadvantage is that values and slots are not aligned. For example, in a travel booking system, given a dialogue state of 'chicago' and 'seattle', can you tell which is the departure city and which is the arrival city? Also, the free-form approach models an unwanted order of the requestable slots, and it can produce invalid states that may contain generated non-requestable-slot words.
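A small illustration of the alignment problem just described, with hypothetical state strings: the free-form representation is just a flat token sequence, so nothing in it ties a value to a slot.

```python
# Free-form state: one flat token sequence, slot-value alignment is implicit.
restaurant_state = ["italian", ";", "cheap", ";", "address", ";", "<EOS>"]

# The travel-booking ambiguity from the talk: two city values, no slot tags,
# so the representation cannot say which is departure and which is arrival.
travel_state = ["chicago", ";", "seattle", ";", "<EOS>"]
print(" ".join(travel_state))
```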
So our proposal is the flexibly structured dialogue model. It contains five components. The first is the green part, which is our encoder, the input encoder module. The yellow and orange parts are our dialogue state tracking. The purple part is the knowledge base query. The red part is a new module we propose, called the response slot decoder. And the green part, together with the blue part, forms the response generation.
So we propose the flexibly structured dialogue state tracking approach, which uses only the information in the schema of the knowledge base, but does not use the information about the values. The architecture we propose contains two parts: the informable slot value decoder, the yellow part in this picture, and the requestable slot decoder, the orange part. The informable slot value decoder has a separate decoder for each informable slot. For example, in this picture, for the food slot, given the start-of-sentence token for food, the decoder generates 'italian' and the end token for food. For the requestable slot decoder, we adopt a multi-label classifier over the requestable slots; or you can think of it as binary classification given a requestable slot.
You can see that the flexibly structured approach has a lot of advantages. First, slots and values are aligned. It also solves the out-of-vocabulary problem. And it can easily adapt to new domains and to changes in the content of the knowledge base, because we are using a generation method for the informable value decoder. We also remove the unwanted order of the requestable slots and the chance to generate invalid states. In summary, the flexibly structured dialogue state tracking can explicitly assign values to slots like the fully structured approach, while preserving the capability of dealing with out-of-vocabulary words like the free-form approach.
Meanwhile, it brings challenges in response generation. The first challenge: is it possible to improve the response generation quality based on the flexibly structured DST? The second challenge: how do we incorporate the output of the flexibly structured DST into response generation?
Regarding the first challenge, how to improve the response generation, we propose a novel module called the response slot decoder, the red part in the picture. The response slots are the slot names, or slot tokens, that appear in the delexicalized response. For example, the user requests the address, and the system replies 'the address of <name_slot> is <address_slot>'. For the response slot decoder we also adopt a multi-label classifier.
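As a rough illustration of where the multi-label targets could come from, here is a sketch that reads response-slot labels off a delexicalized response; the slot-token format and the helper are assumptions for illustration, not the paper's preprocessing code.

```python
RESPONSE_SLOTS = ["<name_slot>", "<address_slot>", "<phone_slot>"]

def response_slot_labels(delexicalized_response):
    tokens = delexicalized_response.split()
    # Multi-label target: 1 if the slot token occurs in the response, else 0.
    return [1 if slot in tokens else 0 for slot in RESPONSE_SLOTS]

print(response_slot_labels("the address of <name_slot> is <address_slot>"))
# -> [1, 1, 0]
```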
Regarding the second challenge, how to incorporate the flexibly structured DST into response generation, we propose a word copy distribution. It increases the chance of the words in the informable slot values, the requestable slots, and the response slots appearing in the agent response. For example, for 'the address of <name_slot> is <address_slot>', we are trying to increase the chances of 'address', '<name_slot>', and '<address_slot>' appearing in the response.
From now on, I am going to go into detail on how we link these modules together.
First is the input encoder. The input encoder takes three kinds of input: the first is the agent response in the past turn, the second is the dialogue state, and the third is the current user utterance. The output is the last hidden state of the encoder, which serves as the initial hidden state for both the dialogue state tracker and the response generation.
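A minimal sketch of such an encoder, assuming the three inputs are concatenated into one token-id sequence; the embedding and hidden sizes are illustrative, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn

class InputEncoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_size=32, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size, batch_first=True)

    def forward(self, token_ids):
        # token_ids: previous agent response + dialogue state + user utterance.
        outputs, last_hidden = self.gru(self.embed(token_ids))
        # last_hidden initializes both the state tracker and response decoder.
        return outputs, last_hidden

enc = InputEncoder()
outputs, h = enc(torch.randint(0, 1000, (1, 12)))  # one 12-token turn
print(h.shape)  # torch.Size([1, 1, 64])
```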
The informable slot value decoder is one part of our flexibly structured DST. It has two kinds of input: the last hidden state from the encoder, and a unique start-of-sentence symbol for each slot. For example, for the food slot the starting word is the start symbol for food. The output is, for each slot, a generated sequence of words for the slot value. For example, the value generated for the food slot here is 'italian' followed by the end symbol for food. The intuition here is that the unique start-of-sentence symbol ensures the slot and value alignment, and the copy-mechanism-augmented sequence-to-sequence model allows copying values directly from the encoder input.
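A minimal sketch of one per-slot value decoder under those inputs; the copy mechanism is omitted for brevity, and all ids and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class SlotValueDecoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_size=32, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRUCell(emb_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def decode(self, sos_id, eos_id, h, max_len=5):
        # sos_id is the slot-specific start symbol, e.g. the one for "food";
        # decoding stops at that slot's end symbol, keeping slots aligned.
        token, generated = sos_id, []
        for _ in range(max_len):
            h = self.gru(self.embed(torch.tensor([token])), h)
            token = self.out(h).argmax(dim=-1).item()  # greedy decoding
            if token == eos_id:
                break
            generated.append(token)
        return generated

dec = SlotValueDecoder()
h0 = torch.zeros(1, 64)  # would be the encoder's last hidden state
print(dec.decode(sos_id=1, eos_id=2, h=h0))
```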
The requestable slot binary classifier is the other part of our flexibly structured DST. The input is the last hidden state of the encoder and the unique start-of-sentence symbol for each slot. For example, for the address slot the starting word is the start symbol for address. The output is, for each slot, a binary prediction, true or false, regarding whether the slot is requested by the user or not. Note that the GRU here takes only one step; you could replace it with any classification architecture you want. We use a GRU because we want to use its hidden state as the initial state for our response slot binary classifier.
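A minimal sketch of that single-step design, with hypothetical sizes: one GRU step from the encoder state with a slot-specific embedding, a sigmoid output, and the hidden state returned for reuse.

```python
import torch
import torch.nn as nn

class RequestableSlotClassifier(nn.Module):
    def __init__(self, num_slots=4, emb_size=32, hidden_size=64):
        super().__init__()
        self.slot_embed = nn.Embedding(num_slots, emb_size)  # start symbols
        self.gru = nn.GRUCell(emb_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, slot_id, encoder_hidden):
        h = self.gru(self.slot_embed(slot_id), encoder_hidden)  # one GRU step
        prob = torch.sigmoid(self.out(h))  # requested by the user or not
        return prob, h  # h also seeds the response slot classifier

clf = RequestableSlotClassifier()
prob, h = clf(torch.tensor([0]), torch.zeros(1, 64))
print(float(prob))
```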
The knowledge base query takes the generated informable slot values and the knowledge base, and outputs a one-hot vector representing the number of records matched.
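A minimal sketch of that query step; the bucketing of match counts into 0, 1, 2, or 3-plus is an assumption for illustration, not necessarily the paper's exact encoding.

```python
def kb_query(knowledge_base, constraints):
    # Keep the records that satisfy every generated informable slot value.
    matches = [r for r in knowledge_base
               if all(r.get(slot) == value for slot, value in constraints.items())]
    one_hot = [0, 0, 0, 0]
    one_hot[min(len(matches), 3)] = 1  # 0, 1, 2, or 3+ matching records
    return matches, one_hot

kb = [{"food": "italian", "price": "cheap", "name": "la margherita"},
      {"food": "italian", "price": "expensive", "name": "caffe uno"}]
records, vec = kb_query(kb, {"food": "italian", "price": "cheap"})
print(vec)  # -> [0, 1, 0, 0]: exactly one record matched
```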
Here is our response slot binary classifier. The input is the knowledge base query result and the hidden state from the requestable slot binary classifier. The output is, for each response slot, a binary prediction, true or false, regarding whether the response slot appears in the agent response or not.
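A minimal sketch of this classifier under those two inputs, again with illustrative sizes only.

```python
import torch
import torch.nn as nn

class ResponseSlotClassifier(nn.Module):
    def __init__(self, hidden_size=64, kb_size=4):
        super().__init__()
        self.out = nn.Linear(hidden_size + kb_size, 1)

    def forward(self, requestable_hidden, kb_vector):
        # Condition on the requestable-slot hidden state and the KB result.
        logits = self.out(torch.cat([requestable_hidden, kb_vector], dim=-1))
        return torch.sigmoid(logits)  # does this slot appear in the response?

clf = ResponseSlotClassifier()
print(float(clf(torch.zeros(1, 64), torch.tensor([[0., 1., 0., 0.]]))))
```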
The motivation is to incorporate all the relevant information about the retrieved entities and the requested slots into the response; our word copy distribution can then make use of them.
The motivation here is that the canonical copy mechanism only takes a sequence of words in text as input, and does not accept the multinomial distributions we obtain from the binary classifiers. So we take the predictions from the informable slot value decoders, from the requestable slot binary classifier, and from the response slot binary classifier, and output a word distribution. If a word is a requestable slot or a response slot, its probability is the binary classifier output; if a word appears in the generated informable slot values, its probability is equal to one; and it is zero for all other words.
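Those three rules translate almost directly into code; here is a minimal sketch with a toy vocabulary (the names are hypothetical).

```python
def word_copy_distribution(vocab, informable_values, slot_probs):
    dist = {}
    for word in vocab:
        if word in slot_probs:             # requestable or response slot token
            dist[word] = slot_probs[word]  # its binary classifier output
        elif word in informable_values:    # generated informable slot value
            dist[word] = 1.0
        else:                              # any other word
            dist[word] = 0.0
    return dist

vocab = ["italian", "address", "<name_slot>", "<address_slot>", "the"]
print(word_copy_distribution(
    vocab,
    informable_values={"italian"},
    slot_probs={"address": 0.9, "<name_slot>": 0.8, "<address_slot>": 0.95},
))
```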
The agent response decoder takes the last hidden state of the encoder, the knowledge base query result, and the word copy distribution, and its output is a delexicalized agent response.
The overall loss for the whole network includes the informable slot value loss, the requestable slot loss, the response slot loss, and the agent response generation loss.
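A minimal sketch of the joint objective, assuming the four component losses are simply summed with equal weights; the weighting is an assumption, since the talk only says the losses are combined.

```python
import torch

# Placeholder scalars standing in for the four component losses.
informable_value_loss = torch.tensor(0.7, requires_grad=True)
requestable_slot_loss = torch.tensor(0.2, requires_grad=True)
response_slot_loss = torch.tensor(0.3, requires_grad=True)
response_generation_loss = torch.tensor(1.1, requires_grad=True)

total_loss = (informable_value_loss + requestable_slot_loss
              + response_slot_loss + response_generation_loss)
total_loss.backward()  # the whole network is trained end to end
print(float(total_loss))
```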
Experimental settings: we use two datasets, the Cambridge Restaurant dataset and the Stanford In-Car Assistant dataset. For the evaluation metrics: for dialogue state tracking we report the precision, recall, and F1 score for informable slot values and requestable slots; for task completion we use the entity match rate and the success F1 score; and BLEU is applied to the generated agent response to evaluate the language quality.
We compare our method to these baselines. NDM and LIDM and their variants use the fully structured approach for dialogue state tracking. KVRN, from the Stanford dataset paper, does not do dialogue state tracking. TSCP and TSCP without RL are the free-form approaches: they use a two-stage CopyNet sequence-to-sequence model, which contains one encoder and two copy-mechanism-augmented decoders, to decode the belief state first and then generate the response; TSCP additionally tunes the response slots with reinforcement learning.
Here are the turn-level dialogue state tracking results. You will notice that our proposed method, FSDM, performs much better than the free-form approach TSCP, especially on the requestable slots. The reason is that the free-form approach models the unwanted order of the requestable slots; that is why our FSDM can perform better than them.
These are our results on task completion and language quality. You will also notice that FSDM performs better than the baseline models on most of the metrics, except BLEU on the KVRET dataset.
Here is an example of the generated dialogue state and response from the free-form approach and from our approach, in the calendar domain. The target belief state here is: for the informable slot, the event is equal to 'meeting', and for the requestable slots, the user tries to request the date, time, and party. The free-form approach generates 'meeting; date; party', while our FSDM generates event equal to 'meeting', date equal to true, time equal to true, and party equal to true. You will notice that the free-form approach cannot generate 'time'. The reason is that in the training dataset, a lot of examples contain 'date' and 'party' together, and the free-form approach, which models this kind of order, memorizes 'date' and 'party' together. So during testing, if the user requests date, time, and party, it cannot predict them properly.
Also, for the generated response: the ground truth response contains the <party_slot>, the <date_slot>, and the <time_slot>. TSCP generates 'the next meeting is at <time_slot> on <date_slot> and <time_slot>', repeating the time slot, while our FSDM can generate the response with the <party_slot>, <date_slot>, and <time_slot> correctly.
In conclusion, we propose an end-to-end architecture with a flexibly structured model for task-oriented dialogues. The experiments suggest that the architecture is competitive with the state-of-the-art models, while our model can be applicable in real-world scenarios. Our code will be available in the next few weeks at this link.
And here is another new work of ours on modeling multi-action policy for task-oriented dialogues; it will appear at EMNLP soon. The paper and the code are publicly accessible at this link, or you can scan the QR code.
The traditional policy engine predicts one action per turn, which limits its expressive power and introduces unwanted turns of interaction. So we propose to generate multiple actions per turn by generating a sequence of tuples; the tuple is (continue, act, slot). 'Continue' here means whether we are going to stop generating tuples or continue to generate tuples; the act is the dialogue act; and the slot here means the slot name, which does not carry the value; it's not, like, a movie name.
We propose a novel recurrent cell called the gated continue-act-slot cell, gCAS, which contains three units: the continue unit, the act unit, and the slot unit, sequentially connected in this recurrent cell. So the whole decoder is in a recurrent-of-recurrents fashion.
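A minimal sketch of the (continue, act, slot) generation loop; the random choices below are stand-ins for the trained gCAS units, included only to show the control flow.

```python
import random

ACTS = ["inform", "request"]
SLOTS = ["date", "time", "party"]

def generate_actions(max_tuples=5):
    actions = []
    for _ in range(max_tuples):
        keep_going = random.random() < 0.6  # continue unit: stop or go on?
        if not keep_going:
            break
        act = random.choice(ACTS)    # act unit: the dialogue act
        slot = random.choice(SLOTS)  # slot unit: slot name, no value
        actions.append((act, slot))
    return actions

random.seed(1)
print(generate_actions())  # a variable-length list of (act, slot) actions
```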
We would like to deliver special thanks to Alex, Janice, and the SIGDIAL reviewers. Thank you.
Thank you very much for the talk. So, are there any questions? Okay, there in the back.
Hi, thank you very much, that was very interesting. What would the system do if somebody didn't respond with a slot name or a slot value? You know: 'what time do you want', 'what restaurant do you want', and the user says it is the closest one to the Forum Theatre.
Excuse me, could you repeat the question again?
Your system prompts somebody for a restaurant where they want to eat; you mentioned some Italian food. The system says 'what restaurant would you like to eat at?', and the user says 'the closest Italian restaurant to the Forum Theatre'. So I'm not giving you a slot value; I'm giving you a constraint on the slot value. What would this kind of architecture do with something like that as a response?
Okay, thank you. In that case the user does not use the values provided in the knowledge base, while we are mostly working on cases where the values can be detected. When we generate with the informable slot value decoder, we are trying to catch and use this information from the user side, and when we are trying to generate these kinds of things we are also using the copy mechanism: we are trying to increase the chance of these words appearing in the response generation. For example, 'the Titanium is the Italian restaurant you want', or something like that.
I understand how you do that, but the question is how would you get the act; what would the internal representation be so that we get 'the closest', the superlative? In the result, how would it compute 'the closest'? All you're doing is attending to values; you would have to compute some function, like distance.
Actually, that's a very good question. I think what you are trying to ask is whether, if we have informable slot values from the user that do not exactly match something that appears in the knowledge base...
That's not quite it. I'm saying the user doesn't know what's in the knowledge base; they're just saying, 'whatever is the closest one, you tell me'.
Okay, 'the closest one': for example, it would be something like an area slot value. Actually, this kind of situation our current model cannot handle, and our past work cannot handle either, because it does not actually appear in the datasets we are using.
Right, thank you.
Any other questions? Okay, in that case I'd like to ask a question myself.
I noticed that you were evaluating your model on two datasets, the Cambridge Restaurant and the KVRET, and I was wondering whether it would be possible, or how difficult it would be, to extend the model to work on the MultiWOZ dataset, which is, you know, bigger than those two and has more domains.
Actually, that's a very good question. In the latest ACL conference, the TRADE network is trying to do that. They showed that their system uses a similar kind of technique, using different start-of-sentence symbols to generate the values. So I think our work kind of proves that the flexibly structured DST can be applied to MultiWOZ. And for the response generation part, we believe that our proposed word copy mechanism can also work.
Okay, so basically you think that just retraining should be sufficient?
I think so.
Okay, thanks. Okay, any other question?
Then I guess I have one more. Basically, when you were showing the response slot model, the response slot decoder, you said that you have, like, a one-step GRU. What does that exactly mean? Is there, like, one GRU cell that is...
Yes, we are kind of using the GRU cell, but we do not use it in a recurrent way.
Right, and the output is, like, a one-hot encoding of the slots to be inserted in the response, or is it some kind of embedding?
It depends on how you look at the output: for each slot we get a value from zero to one, so you can think of it as a distribution. That's why our word copy distribution uses these zero-to-one values as the probabilities to decide whether to increase these words' chances of appearing in the agent response.
Right, okay, thank you very much.
Thank you.
Alright, so let's thank the speaker again.