So, I'm from Carnegie Mellon, and this is collaborative work with my collaborator and my advisors, Alan Black and Alex Rudnicky, over there. Today I'm going to talk about strategy and policy learning for non-task-oriented conversational systems.
As we know, non-task-oriented conversational systems are what people usually call chatbots, or social chat systems. So the task is what we call social chatting, and people always ask me: why do we need social chatting?
The motivation is actually simple. If we look at human conversations, we use a lot of social chatting. When you are meeting someone for a certain task, you usually do some social chatting to ease into the conversation, for example talking about your weekend before you get into the meeting agenda.
So social chatting is a certain type of conversation, mostly used to build social ties with your coworkers and your friends. Of course it also has other application fields, like education, where you want your tutor to be socially intelligent and able to interleave this kind of casual chatting with the task conversation, and also health care and language learning. In complex tasks like these, social chatting is essential.
So we want to design a system that is able to perform social chatting, and we have several goals in mind: we want it to be appropriate, we want the system to be able to go into depth with the conversation, and we want the system to provide a variety of answers suited to different users. The main goal is to make sure the system is coherent and appropriate at the turn level.
We define appropriateness as whether the response is coherent with the user utterance, and we have three labels: appropriate, interpretable, and inappropriate. Later we will use these labels to evaluate our systems.
First of all, we need a lot of data in order to evaluate the system, and at the same time we also want a fairly easy pipeline to actually do the evaluation. People who have been working on dialogue systems know that it's hard to get data, and user evaluation, where you need a real user to interact with the system, is also very expensive.
So here, in order to expedite the process, we built a text API so people can access the chatbot in a web browser. We can have multiple people talking to it at the same time; it's multi-threaded. We also automatically connect the user to a rating task after the conversation, where they can rate whether a certain response is appropriate or not, and we give them the whole dialogue history to review. We made both the data and the code open source, so you can get them from GitHub.
We also have demos running continuously, for example on Amazon Mechanical Turk, on a machine that runs twenty-four hours a day, seven days a week. If we go over it quickly: here is the screen, and you type something in; for example the system says "nice to meet you, how about we talk about music", you say "sure", and it asks "what do you want to talk about?". It can talk about almost anything, and the interaction is very easy, so it's a very nice way to motivate users to interact with the system. It's also a very easy way to collect evaluation data, so we sometimes post it on Mechanical Turk or on social networks to actually get more users.
Let's take a step back and look at previous work on task-oriented systems. We are all familiar with this architecture: once we get the user input, we do language understanding, we go to a dialogue manager which decides what to generate, and in the end we have the system output. A lot of work has looked at what happens when there is some non-understanding in the system, that is, the user says something that is not comprehensible to the system. Many people have designed conversational strategies to handle these errors, for example saying "can you say that again?"; we are very familiar with this kind of conversational strategy.
There has also been a lot of work, including at CMU, that uses POMDPs or MDPs to optimize the process of choosing which strategy to use, planning globally to optimize the task completion rate. So, given this previous work on task-oriented systems, can we do the same thing on non-task-oriented systems?
So the research questions are: can we develop conversational strategies to handle, for example, appropriateness, which we really care about, in a non-task-oriented system? And can we use this kind of globally planned policy to actually regulate the conversation?
Sorry, I apologise for the disturbance.
So what we are trying to ask is: can we design conversational strategies and conversational policies to help the non-task-oriented system's utterances be more appropriate?
Here we designed an architecture that is very similar to a task-oriented system. First, once we get the user input, we try to use some context tracking strategies that we developed, and then we generate a response. If the system has high confidence that the response is a good one, we just send that response back to the user. If the system is not confident that it's a good response, we go into some lexical-semantic strategies that we introduced recently to deal with the low confidence; if one of them applies, we use it to generate the output. If none of their conditions trigger, we go into our engagement strategies to actively generate a reply. In yesterday's poster we also talked about another system that is similar to this one, which also takes engagement into consideration in the whole dialogue process.
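As a rough sketch of this control flow (the helper functions, names, and confidence threshold below are illustrative placeholders, not the actual system code), the selection logic could look like this:

```python
# Minimal sketch of the pipeline described above; the helpers are stand-ins.
def generate_response(utterance, history):
    """Stand-in for the retrieval / neural generators; returns (text, confidence)."""
    return "i watch a lot of baseball games", 0.4

def lexical_semantic_strategy(utterance, history):
    """Stand-in for the grounding strategies; returns None if no condition triggers."""
    return None

def engagement_strategy(utterance, history):
    """Stand-in for the active-participation strategies (joke, switch topic, ...)."""
    return "how about we talk about music?"

def respond(utterance, history, threshold=0.5):
    response, confidence = generate_response(utterance, history)
    if confidence >= threshold:            # confident: send the generated response directly
        return response
    grounded = lexical_semantic_strategy(utterance, history)
    if grounded is not None:               # low confidence, but a grounding condition triggered
        return grounded
    return engagement_strategy(utterance, history)   # otherwise actively engage the user
```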
So we have three sets of strategies, which we will talk about later in detail, for making the system more appropriate, and also a policy that actually chooses between the different strategies to optimize the whole process globally.
We have two components: the response generation side and the conversational strategy selection side.
First, how do we track context? We start with anaphora resolution; we mainly do pronoun resolution, because we wanted a strategy that covers ninety percent of the cases. For example, the system asks "do you like Taylor Swift?", the user says "yes, I like her a lot", and we replace "her" with "Taylor Swift" before the next response generation.
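A minimal sketch of that pronoun substitution (a toy version with a hand-picked pronoun list, not a full anaphora resolver) could be:

```python
# Toy sketch of pronoun replacement with the most recent named entity.
PRONOUNS = {"her", "him", "she", "he", "it", "they", "them"}

def resolve_pronouns(user_utterance, last_entity):
    tokens = [last_entity if t.lower() in PRONOUNS else t
              for t in user_utterance.split()]
    return " ".join(tokens)

print(resolve_pronouns("yes i like her a lot", "Taylor Swift"))
# -> "yes i like Taylor Swift a lot"
```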
We also do response ranking with history similarity: basically we use word2vec to rank the similarity between the candidates and the previous utterance. For example, TickTock says "I watch a lot of baseball games" and the user asks "which team do you like most?". Here we have two candidates, one about Taylor Swift and one about a baseball team; if we do the word2vec similarity ranking, the second one is preferred because it is closer to the history in semantics.
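A sketch of that ranking step, assuming word vectors are available as a plain word-to-vector dictionary (the actual model and dimensionality may differ), could be:

```python
import numpy as np

def sentence_vector(text, embeddings, dim=300):
    """Average the word2vec vectors of in-vocabulary words."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def rank_candidates(history_utterance, candidates, embeddings):
    """Order candidate responses by cosine similarity to the previous utterance."""
    h = sentence_vector(history_utterance, embeddings)
    def score(candidate):
        v = sentence_vector(candidate, embeddings)
        denom = np.linalg.norm(h) * np.linalg.norm(v)
        return float(h @ v / denom) if denom else 0.0
    return sorted(candidates, key=score, reverse=True)
```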
Then we go to the response generation methods. After taking the context and history into account, we do the actual generation. We have two methods, and we select between them based on confidence. One is keyword retrieval: basically we find the keywords in the user's utterance, match them against a database, and return the corresponding response that has the highest aggregated weight. For the database we use existing interview transcripts, and we also collect some personal-story data using Mechanical Turk.
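A toy sketch of that keyword retrieval (the database format and weights here are assumptions for illustration):

```python
# Each entry pairs a response with weighted keywords; we return the response
# whose keywords, aggregated over the words in the user's utterance, weigh most.
DATABASE = [
    {"response": "i am a big fan of her albums",
     "keywords": {"taylor": 2.0, "music": 1.0}},
    {"response": "i watch a lot of baseball games",
     "keywords": {"baseball": 2.0, "sports": 1.0}},
]

def retrieve(user_utterance, database=DATABASE):
    words = set(user_utterance.lower().split())
    def aggregated_weight(entry):
        return sum(w for kw, w in entry["keywords"].items() if kw in words)
    best = max(database, key=aggregated_weight)
    return best["response"], aggregated_weight(best)   # response plus a confidence-like score
```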
The other method is a sequence-to-sequence neural network model: basically we use an encoder and a decoder to generate the response, following existing work on this approach. We then select between the two methods by taking the one with the highest confidence. If the confidence of the response generation model is high, we just send the response back to the user; if it is low, here is what we do.
Sorry, I apologise for the interruption. Okay.
So, when the generation confidence score is low, we first go over some lexical-semantic strategies, and then finally the other ones.
We designed a set of such strategies. For example, if the user repeats themselves, we say "you already said that"; if the user replies with a single word, we react to that by saying something like "you seem to be replying with an incomplete sentence". We also have grounding strategies. One is grounding on named entities: we detect the named entity, try to find it in a knowledge base, and use a template to form a response. For example, the user says "do you like Clinton?", and the system asks "which Clinton are you talking about, Bill Clinton the former US president, or Hillary Clinton the Democratic candidate?". We also have grounding on out-of-vocabulary words: when we detect an out-of-vocabulary word, we use a template to generate a sentence and at the same time update the vocabulary. For example, the user says "you are very confrontational", and the system asks "what do you mean by confrontational?".
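The out-of-vocabulary grounding can be sketched roughly like this (the vocabulary and template are illustrative):

```python
# Ask about an unknown word with a template, and add it to the vocabulary.
vocabulary = {"you", "are", "very", "i", "like", "music"}

def oov_grounding(user_utterance):
    for word in user_utterance.lower().split():
        if word not in vocabulary:
            vocabulary.add(word)            # update the vocabulary at the same time
            return f"what do you mean by {word}?"
    return None                             # no OOV word: the strategy does not trigger

print(oov_grounding("you are very confrontational"))
# -> "what do you mean by confrontational?"
```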
We then ran a lot of queries to evaluate how these strategies are doing, based on human annotation of appropriateness. We can see that mostly people think they are appropriate, but there are some problems. For example, if the named entity detection is wrong, the generated response will not be correct. Similarly for out-of-vocabulary words: if the user is actually just using a more casual spelling and the system tries to ground on that word, the user finds it inappropriate.
These strategies need certain conditions to hold in order to trigger, so if none of the conditions trigger, we go into our engagement strategies to actively try to bring the user into the conversation.
We looked into the previous literature, and basically we find that in the communication literature active participation is really important, as is positive feedback and encouragement. We mainly implemented a set of strategies that go with active participation.
Whenever we start a conversation, we usually pick a topic to initiate the chat with the user, and then we design each strategy with respect to the topic, so that we can either stay on the topic or change the topic. If we try to stay on the topic, we can tell jokes, like "did you know that people usually spend far more time watching sports than actually playing them?", initiate an activity, for example "do you want to watch a game together sometime?", or ask to talk more, "let's talk more about sports". We can also change the topic, for example "how about we talk about ...", end the topic with an open question, or share some interesting news from the internet.
We also evaluated the appropriateness of these strategies based on user ratings. Here we only used a random selection policy, which means that whenever the generation confidence is not good enough and none of the lexical-semantic strategies are triggered, we randomly select one of these engagement strategies. We find that some of them do pretty well, for example activity initiation and asking to talk more, but some of them actually do pretty badly, for example jokes, because without the context these strategies can go wrong very easily.
Here is one of the examples (apologies again, I will make up the time). In this failure case, TickTock says something like "do you like politics? let's talk about politics", and the user says "no, I don't like politics"; TickTock asks why, and the user says "I just don't like politics". Then the system goes into an activity initiation strategy, suggesting watching something together sometime, and the user replies "I told you I don't want to talk about politics".
Basically we find there is an inappropriateness problem whenever we select the strategy without taking the context into consideration. If we look closely at the semantic context, the user is expressing negative sentiment several times in a row, and at that point the right choice is to pick the switch-topic strategy, which can actually handle the situation where the user is not happy about the current topic. So we need to model the context in the strategy selection.
Basically, we want to avoid this kind of inappropriateness, so we use reinforcement learning to do the global planning. We take some state variables, which capture the uncertainty and some of the variables we mentioned before, for example the appropriateness confidence, the sentiment of the previous utterance and its confidence, the number of times each strategy has been executed, the turn position, and the most recently used strategy. We take all of these into consideration in training our reinforcement learning policy.
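To make the state concrete, here is a sketch of such a feature vector; the field names are my paraphrase of the list above, not the exact features used:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DialogueState:
    appropriateness_confidence: float   # confidence of the appropriateness predictor
    previous_sentiment: float           # sentiment (with confidence) of the previous user utterance
    turn_position: int                  # where we are in the conversation
    most_recent_strategy: str           # last strategy that was executed
    strategy_counts: Dict[str, int] = field(default_factory=dict)  # times each strategy was used
```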
We use another chatbot as a simulated user to train the conversational policy.
We have a reward function which is a combination of response appropriateness, conversational depth, and information gain. Appropriateness we already defined; we train a binary classifier on the human appropriateness labels, and this automatic predictor is used in the reinforcement learning training process. Conversational depth we define as the number of consecutive utterances in a row that keep on the same topic, and we also train an automatic predictor for it based on the human annotation. Finally, information gain accounts for the variety of the conversation; we simply count the number of unique words that both the user and the system have spoken.
In the end we have weights to combine the three terms of the reward function. We decided the weights empirically for now; later we may use machine learning methods to learn the weights.
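A minimal sketch of that reward, with placeholder weights standing in for the empirically chosen ones:

```python
# Weighted combination of appropriateness, conversational depth, and information gain.
W_APPROPRIATENESS, W_DEPTH, W_INFO_GAIN = 1.0, 1.0, 1.0   # placeholder weights

def reward(appropriateness_prob, same_topic_turns, unique_words_spoken):
    return (W_APPROPRIATENESS * appropriateness_prob   # binary appropriateness classifier output
            + W_DEPTH * same_topic_turns               # consecutive turns on the same topic
            + W_INFO_GAIN * unique_words_spoken)       # variety of the conversation
```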
We have another two policies that we compare our reinforcement learning policy against. The first is the random selection policy; the other is a local greedy policy, which decides the strategy based on the sentiment of the previous three utterances. For example, if the user is positive several times in a row, we can say "let's talk more about this topic"; if the user is negative, the policy switches the topic.
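The local greedy baseline can be sketched as follows, assuming a sentiment function that returns a polarity score in [-1, 1]:

```python
def local_greedy_policy(last_three_utterances, sentiment):
    scores = [sentiment(u) for u in last_three_utterances]
    if all(s > 0 for s in scores):        # user positive in a row: go deeper on the topic
        return "talk_more_about_topic"
    if all(s < 0 for s in scores):        # user negative in a row: move on
        return "switch_topic"
    return "end_topic_with_open_question" # mixed signals: neutral move
```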
In the end, we train the reinforcement learning policy with the simulator and test it with real humans interacting with the system. We decrease the inappropriateness, and we increase the conversational depth and the total information gain.
So, to conclude: we think the conversational strategies we designed, in particular the lexical-semantic strategies, are useful; considering the conversational history is useful; and integrating the uncertainty of the user and of the different upstream machine learning models into the reinforcement learning is useful. Any questions?
Okay. Yes, that's a good question. We do have different surface forms when designing these strategies, but this is actually our future work: we want to see how we can generate sentences with pragmatics inside of them. Right now it is based on templates; we tried to use different wordings, but it is still templates, not really very general.
That's a good question. The idea here is that we try to integrate as much of the uncertainty of the conversation as possible into the dialogue planning. Definitely, things like the word2vec similarity are extra information that could go into the strategy selection, or, for a spoken dialogue system, the ASR error. So I think if you can optimize while considering all these uncertainties inside the dialogue system, it would be better, but we haven't done that yet.
Too many states? Basically, yes; the state space expands exponentially if you add more variables.
Any other questions?
That's a good question. Basically we ask the user, with respect to the user's utterance, do you think the response is appropriate and coherent or not? So sometimes, if a topic change comes at the right time, people think it's appropriate; if not, they think it's inappropriate. We give them a pretty broad interpretation of what appropriate means, so a lot of people do take the context into consideration when they're rating.
Right, so that's why in the reward function we try to account for the variety as well in the optimization. Basically, appropriateness is just one aspect of making the system conversational, and others, such as being funny or being provocative or anything else, could be added on top of that. So I think those are different dimensions, and variety or personalization are things that could be considered.