i'm not like a
and my dog adviser a woman devilish and that he picked him
and i want to talk about user adaptation
in dialogue systems
so most state-of-the-art
dialogue systems and most production dialogue systems
are adopting
a generic strategy
so they have the same behavior
for any user
and what we are going to do is to learn one strategy
for each of these users
the problem with learning a strategy from scratch
is that one has to do some exploration
and exploration leads to
very bad
performance in the first interactions
so we want to design
a framework
which is
very good during the cold start phase
and it must also be good during the
asymptotic phase
so we propose
a process for user adaptation
which is composed of several phases
and it goes this way
so let's say we have a bunch of robots which represent dialogue systems
and each of these robots
has learned a strategy versus a specific user
and they also gathered
all the dialogues done with this user
so all the knowledge of this robot
is represented
by the dialogues
so we want to elect
some representatives
of the database
and for example these ones
and let's say now we have a target user
and we don't have a system
to dialogue with this target user so we want to design a system for them
and what we are going to do is to transfer the knowledge of one of the
representatives to the system
so first we want to select the best representative to dialogue with our
target user
and we will try each of the representatives one by one
and at the end
we select the best dialogue system which is the blue one here
so now we transfer all the knowledge
to the new system
so let's say we have
a scratch system
and we are going to learn its strategy thanks to the knowledge transfer and also
all the dialogues done during the source selection phase
so we are going to use this new system
to dialogue with this user
and we collect more dialogues
and then we can learn a new system more and more specialised
to this target user
and we repeat this process until we reach
a very specialised
system for the target user
so in the end we add the new
system to the sources
so i will detail each of these phases
so the sources are dialogue managers
so dialogue managers are components of dialogue systems
and this manager takes as input a representation of the user utterance
for example i would like to book a flight to london
and the dialogue manager returns an action
for example a good field or a good nine
and the usual way to design a dialogue manager
is to cast it as a reinforcement learning problem
so reinforcement learning problems
deal with an agent
in interaction with an environment
so for example our agent is the dialogue manager
and the environment will be the target user
so the agent can take an action
and the environment will react
and we can observe its reaction
so o prime is an observation and we can also observe a reward
r prime
and given this observation and also the action taken
the agent can update
its internal state
so here we go from s to s prime
so we can deduce that
all the knowledge of the environment is contained
in the tuple s a
s prime and
r prime
so this is
the setting in reinforcement learning
so we have knowledge of the environment
taking the form of samples
and we want to design a good strategy for the dialogue manager
and we call this strategy a policy so this is a function mapping
states to actions
and we want to find the optimal policy
so the optimal policy
is a policy which maximizes
the cumulative reward
during the interaction
between the dialogue manager and the target user
so now
there is an equivalence between a dialogue manager a
robot and a policy
so we want to find the best
policies to represent the whole database
so this is the selection of the representatives
and this is the main contribution of the paper
we introduce the policy-driven distance
so this is a metric
which computes
the behaviour differences between policies
we sample some states and we look at which action is taken
in each of these states
and for example one can see that the third one
is very close to populate one
and the yellow is very different to the to the little
so one can see a policy in this policy-driven distance
as a binary vector
where the ones
represent the action taken in a given state
so for example
the policy will take these actions
and the binary vector will look like this
and if we compare these binary vectors
with the euclidean norm
we have an equivalence
with the policy-driven
distance
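The binary-vector view described above can be sketched as follows. This is a minimal illustration, not the paper's exact definition: the one-hot layout, the toy policies, and the names `policy_vector`, `euclidean` and `disagreement` are assumptions, and the talk does not state any normalisation. With this layout, the squared euclidean distance is twice the number of sampled states on which two policies disagree.

```python
import math

def policy_vector(policy, states, actions):
    """Concatenate one one-hot block per sampled state (1 = action taken)."""
    vec = []
    for s in states:
        chosen = policy(s)
        vec.extend(1 if a == chosen else 0 for a in actions)
    return vec

def euclidean(u, v):
    """Plain euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def disagreement(p, q, states):
    """Number of sampled states where the two policies pick different actions."""
    return sum(1 for s in states if p(s) != q(s))

# Hypothetical toy setup: integer states and three actions.
actions = ["propose", "accept", "repeat"]
states = list(range(6))

def pi_a(s):
    return "propose" if s < 3 else "accept"

def pi_b(s):
    return "propose" if s < 4 else "accept"

def pi_c(s):
    return "repeat"

va = policy_vector(pi_a, states, actions)
vb = policy_vector(pi_b, states, actions)
vc = policy_vector(pi_c, states, actions)

# pi_a and pi_b differ on one state, pi_a and pi_c on all six.
print(euclidean(va, vb), euclidean(va, vc))
```

Because the vectors live in an ordinary euclidean space, standard clustering tools can be applied to them directly.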
so this allows us to use a clustering algorithm called k-means
so k-means will give us all the dialogue managers
as clusters
and since we want representatives
we will have to learn one policy per cluster
so we gather all the knowledge of each cluster and we learn a policy with that
but we can also use another algorithm
called k-medoids
and k-medoids thanks to the policy-driven distance
gives us directly the representatives
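A rough sketch of how k-medoids can return representatives directly from a pairwise policy distance. The alternating assign/update variant, the first-k initialisation, the `hamming` stand-in for the policy-driven distance, and the toy policies are all assumptions made for illustration; the talk does not say which k-medoids variant was used.

```python
def hamming(p, q):
    """Stand-in policy distance: number of sampled states with different actions."""
    return sum(1 for a, b in zip(p, q) if a != b)

def k_medoids(points, k, dist, max_iter=100):
    """Simple alternating k-medoids; returns the sorted indices of the medoids."""
    medoids = list(range(k))  # naive deterministic initialisation (assumption)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance inside its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(dist(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)

# Hypothetical policies: each tuple lists the action index taken in six sampled states.
policies = [
    (0, 0, 0, 1, 1, 1),
    (0, 0, 0, 0, 1, 1),
    (0, 0, 1, 1, 1, 1),
    (2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 1),
]

representatives = k_medoids(policies, k=2, dist=hamming)
print(representatives)
```

Unlike k-means, every medoid is an existing policy, so no extra learning step per cluster is needed to obtain a representative.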
okay so now we want to select the best
policy to dialogue with the target user
so this is the source selection phase
so for that we can use a bandit algorithm
called ucb1
so usually one will test
each of the representatives one at a time
so you dialogue with the first one and you get a score
and then with the next one
and now the next system that the user will dialogue with
is the system which maximizes the ucb value so
now we will dialogue with the blue one
and the blue one is still the best
so we keep dialoguing with the blue one
until we get a very bad score
and at this point
the red system has the better value so we switch robots
and we repeat this process until we reach a maximum time limit
for example one hundred time steps
and in the end we keep the system maximizing the mean
so the point of using ucb1 is that it takes
into account the high variability
of the dialogues
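The selection loop just described can be sketched with the standard UCB1 rule (empirical mean plus sqrt(2 ln t / n)). The `play` function, the simulated mean scores, the noise level and the system names are hypothetical stand-ins for running one dialogue with the target user; only the overall procedure (try each representative once, follow the highest UCB value, return the best mean after a fixed horizon) comes from the talk.

```python
import math
import random

def ucb1_select(systems, play, horizon):
    """Run UCB1 for `horizon` dialogues; return the best-mean system and the pull counts."""
    counts = [0] * len(systems)
    sums = [0.0] * len(systems)
    for t in range(1, horizon + 1):
        if t <= len(systems):
            arm = t - 1  # first, try every representative once
        else:
            arm = max(
                range(len(systems)),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        score = play(systems[arm])  # one dialogue with the target user
        counts[arm] += 1
        sums[arm] += score
    means = [s / c for s, c in zip(sums, counts)]
    best = max(range(len(systems)), key=lambda i: means[i])
    return systems[best], counts

# Hypothetical simulation: each representative has a hidden mean score in [0, 1].
rng = random.Random(0)
true_means = {"blue": 0.8, "red": 0.5, "yellow": 0.2}

def play(system):
    return true_means[system] + rng.uniform(-0.1, 0.1)

best, counts = ucb1_select(["red", "blue", "yellow"], play, horizon=100)
print(best, counts)
```

The exploration bonus shrinks as a system is tried more often, so a single unlucky dialogue does not permanently rule out a good representative.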
okay so now we transfer the knowledge of this source to a new system
so this is the transfer phase
so let's say we have two batches of samples the source batch and the
target batch
and we want to remove
all the samples from the source batch
already present in the target batch
so for that we use a density-based method
so this is a filtering algorithm
it will consider each sample of the source batch
so let's say we start with this one
and it will look at the target samples with the same action
so these two
and since its state is very different from the red states of these two
we can add the source sample
to the final batch
now we consider the other sample
and we can see that the light red state is very close to the red one
so we don't add this sample to the batch
and we continue this for each sample of the source batch
and in the end we add the target batch
and we will use the resulting batch
for learning a new policy
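The filtering step described above could look roughly like this. The sample layout `(state, action, reward, next_state)`, the scalar states, the absolute-difference distance and the `threshold` value are assumptions for illustration; the talk only says that a source sample is dropped when a target sample with the same action has a very close state.

```python
def filter_source(source, target, threshold, dist):
    """Keep source samples far from every same-action target sample, then add the target batch."""
    kept = []
    for sample in source:
        state, action = sample[0], sample[1]
        same_action = [t for t in target if t[1] == action]
        # Keep the source sample only if no same-action target sample is close to it.
        if all(dist(state, t[0]) > threshold for t in same_action):
            kept.append(sample)
    return kept + list(target)

def abs_dist(s1, s2):
    """Hypothetical state distance for scalar states."""
    return abs(s1 - s2)

# Hypothetical batches of (state, action, reward, next_state) samples.
source_batch = [
    (0.10, "accept", 0.5, 0.2),   # far from the target "accept" sample: kept
    (0.52, "accept", 0.8, 0.6),   # very close to the target "accept" sample: dropped
    (0.90, "propose", 0.0, 0.3),  # no target sample with this action: kept
]
target_batch = [
    (0.50, "accept", 1.0, 0.6),
    (0.90, "repeat", 0.0, 0.9),
]

final_batch = filter_source(source_batch, target_batch, threshold=0.1, dist=abs_dist)
print(len(final_batch))
```

As the target batch grows, more source samples fall within the threshold of a target sample, so the final batch is gradually dominated by the target user's own data.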
so the learning
is done thanks to fitted-q
so fitted-q is a reinforcement learning algorithm which takes as input
a batch of samples
and it computes the optimal policy for these samples
so fitted-q is
an algorithm coming from fitted value iteration and this algorithm itself comes from
value iteration
and value iteration is a very famous algorithm to solve markov decision processes
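A minimal sketch of the fitted-q idea on a batch of samples. The regressor used in the actual work is not stated in the talk, so this sketch swaps in a tabular average per (state, action) pair; the discount `gamma`, the iteration count, the toy batch and the convention that `None` marks the end of a dialogue are also assumptions.

```python
from collections import defaultdict

def fitted_q(batch, actions, gamma=0.9, iterations=50):
    """Repeatedly regress Q(s, a) on r + gamma * max_b Q(s', b) over a fixed batch."""
    q = defaultdict(float)  # unseen pairs default to 0.0
    for _ in range(iterations):
        targets = defaultdict(list)
        for s, a, r, s_next in batch:
            # None marks a terminal transition (end of the dialogue).
            best_next = 0.0 if s_next is None else max(q[(s_next, b)] for b in actions)
            targets[(s, a)].append(r + gamma * best_next)
        # "Regression" step: here simply the mean target for each (state, action).
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q

def greedy_policy(q, actions):
    """Policy that picks the action with the highest learned Q-value."""
    return lambda s: max(actions, key=lambda a: q[(s, a)])

# Hypothetical batch of (state, action, reward, next_state) samples.
actions = ["propose", "accept", "end"]
batch = [
    ("start", "propose", 0.0, "offer"),
    ("start", "end", 0.0, None),
    ("offer", "accept", 1.0, None),
    ("offer", "end", 0.0, None),
]

q_values = fitted_q(batch, actions)
policy = greedy_policy(q_values, actions)
print(policy("start"), policy("offer"))
```

With a function approximator in place of the table, the same loop generalises to states that never appear in the batch.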
so if we combine the filtering and the learning
one can see that we learn a
a system
which is a mix between the source and the target user
so we are going to use this new
this new system
to dialogue now
with the target user
so we add new dialogues to the target batch
and you can see that the samples added to the batch are very similar
to the samples of the source batch
so in the end
there remain almost only samples from the target batch
so when we learn on the new batch
we will learn a very specialised system for this target user
so this is the overall adaptation process
for users
and we want to test
our framework on some experiments
so we are going to use the negotiation dialogue game
so we focused on negotiation because
we have two actors
having different behaviours
so we want to adapt to these behaviours
so in the negotiation dialogue game you have two players
and they are given some time slots
and preferences
for each time slot
and at each round
each agent
will propose a slot
for example kenny proposes this time slot
and the other player refuses and proposes another time slot
so since the negotiation game is an abstraction of a real
dialogue we introduced noise
in the communication channel
in the form of switching some time slots so for example we replace the previous time slot
with the yellow one
and kenny will receive a new piece of information
in the form of an automatic speech recognition score
and given this information he can continue the dialogue
or he can ask the other agent to repeat the proposition
or he can end the dialogue
so for example he asks to repeat
and the other player repeats
and at some point
kenny can accept the proposition
or he can also deny and end the dialogue
at the end of the dialogue the users are rewarded
with a score
and this score is a function
of
the reward of the time slot agreed on
so i forgot to say that the point of the game
is to find an agreement
between the players
so can you really ugly well the less buttons here the all but see so
that estimates is
is smaller
so now we want to test this game
we need users interacting with the system so
we designed simulated users
with very different profiles
and so we have for example the deterministic user
who will
propose its slots in decreasing order
and we have also this one
taking random actions
this one always proposes its best slot
and this one accepts as soon as possible and finally
this one ends the dialogue as soon as possible so these are very different
behaviours and we want to adapt to these behaviours
we also designed human models
so each human model is
a model of a human learned thanks to
one hundred dialogues by human so for each human
we model his behaviour
so we used these dialogues
with a k-nearest neighbour algorithm
and you can see in the table
the distribution of actions for each of the real humans
so you can lead to that we'll and at x are very similar
and you go and no one are pretty difference
so now we want to design the system
which will dialogue directly with these users
so it will have the same actions as the users to simplify the
design so the set of actions is restricted
and we learn as we saw previously this system with fitted-q
and moreover we use an epsilon-greedy scheme to do some exploration
so the internal state of the dialogue system the dialogue manager
is actually a combination of the automatic speech
recognition score
and also the number of
turns during the game
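The epsilon-greedy exploration mentioned above can be sketched as below. The action names, the Q-values and the `epsilon_greedy` helper are hypothetical; the talk does not give the value of epsilon or how it evolves over time.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

# Hypothetical Q-values for one dialogue state.
q_example = {"propose": 0.2, "accept": 0.9, "repeat": 0.1}

print(epsilon_greedy(q_example, epsilon=0.0))  # always the greedy action
print(epsilon_greedy(q_example, epsilon=1.0))  # always a uniformly random action
```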
so before testing the
main framework we want to show that learning one system per user is a
good thing
so here we have a bunch of systems so vs pu one two three
etcetera and each of these systems learned a strategy
with one of these users so for example vs pu one learned
its strategy against pu one
and you can notice that the bold values
actually indicate that
the best system to dialogue with a given user is
the system which learned its strategy
with this user
so there is a real need for adaptation
we can see the same with human model users
but the differences are smaller
and actually it is the especially for is a screen and thus use alex
the bold values
one point seventy four and one point seventy three
a very close and you can do sources and the thing for the line we
now we can test the main framework for adaptation
so for that we introduce two new methods
one is
called scratch so scratch just learns
the system from scratch without
transferring any knowledge
and the other one is the generic
method
which learns one policy with all the knowledge of the database
so we generate two source system databases one for the simulated users and one for
the human model users
and each system is learned thanks to
one thousand two hundred dialogues
and each method is tested
with two hundred dialogues
so for simulated users
our two methods show significantly better results than generic
and scratch for the two metrics
the score and task completion
but on the other hand for human model users
our methods are better
but not that much and
the reason for that is that the negotiation game is too simple for humans
and actually most of the humans have the same behavior in the game
so there is no point in learning
an adapted strategy
since all the people have the same behavior
so to conclude we provide a framework for user adaptation
and we introduce the policy-driven distance which is a way to
compute behaviour differences
and we validate the framework on both
simulated user and human model user setups
and finally we show that the overall
dialogue quality is enhanced
based on two metrics the task completion and the score
so thank you
i wasn't sure what you squirt for your cross comparison
i we want to see this way
next table so what is numbers and what's good
well which
each row represents the score
of each system against a given user
so the system is
and the other thing we the each user
so for example vs pu two has a score of zero point forty four
with pu one
what is that score
the score is
the mean reward of the dialogues
i mean at the end of the dialogue there is a reward okay and
we do some you know g though
on the register maximum rate is the maximum score
yes actually it's
it's too
sorry the higher the better yes
the question could you give
more details about the reinforcement learning
i e c
the key
you want you are
speaker once again