0:00:15 so i'm from carnegie mellon. this is collaboration work with an intern
0:00:21 and my advisors alan black and alex rudnicky over there
0:00:25 today i'm gonna talk about strategy and policy learning for non-task-oriented conversational systems
0:00:30 so as we all know, non-task-oriented conversational systems are what people call chatbots
0:00:36 or social chatbots
0:00:38 so the task is, simply put, social chatting, and people always ask me
0:00:43 why do we need social chatting
0:00:47 so the motivation is simple actually
0:00:49 so if we look at human conversations, we actually use a lot of social chatting
0:00:54 in our conversations. when you're meeting someone for a certain task you actually try to do
0:01:00 some social chatting to ease into the conversation, like talking about your weekends before
0:01:05 you get into the meeting agenda
0:01:08 and some social chatting is there in certain types of conversations mostly to build social ties
0:01:14 with your coworkers or your friends. of course it has other application fields like
0:01:19 education,
0:01:20 where you want a tutoring agent to be socially intelligent, able to use
0:01:24 this kind of social chatting to interleave into the conversation,
0:01:28 and health care,
0:01:30 and language learning. we say that in complex tasks in these areas
0:01:36 social chatting is essential
0:01:39 so we want to design a system that is able to perform social chatting,
0:01:44 and we have some goals in mind: one is we want the system
0:01:48 to be appropriate,
0:01:50 we want the system to be able to go in depth in the conversation,
0:01:54 and we want the system to provide a variety of answers to keep users
0:02:00 there
0:02:03 the main goal is to make sure the system is coherent
0:02:06 and appropriate at the single-turn level
0:02:10 so we define this appropriateness metric: is the response coherent with the
0:02:14 user utterance. we have three labels: appropriate, interpretable, and inappropriate
0:02:21 so later we're gonna use these labels to evaluate our systems
0:02:25 so first of all we need a lot of data in order to evaluate the system, and at
0:02:30 the same time we also wanted to have a fairly easy pipeline to actually do
0:02:35 the evaluation
0:02:36 people who have been working on dialogue systems know that it's hard to get
0:02:40 data,
0:02:42 that's one problem,
0:02:44 and for user evaluation you have to have a user interact with the system, which is
0:02:49 also very expensive
0:02:51 so here, in order to expedite the process, we leverage a web chat text
0:02:56 api so people can access the chatbot in a web browser
0:03:02 we can have multiple people talk to it at the same time, it's multi-threaded
0:03:06 and we also automatically connect the user to a rating task after the
0:03:12 conversation so they can rate whether a certain response is appropriate or not. we give them
0:03:17 the whole dialogue history to review
0:03:20 and we make both the data and the code open source
0:03:25 so you can get it online
0:03:27 we also have demos that run on amazon mechanical turk and on our machines,
0:03:32 which run
0:03:35 twenty four hours a day, seven days a week. so since we have time, we're
0:03:40 just gonna demo it a little bit. so here is
0:03:44 the screen. you type in something, for example we say
0:03:48 'how about we talk about music'
0:03:50 and it says 'sure'
0:03:52 'what do you want'
0:03:54 'what do you want to talk about'
0:03:59 it can talk about almost everything. also the interaction is very easy; it's a
0:04:04 very nice way to motivate the user to interact with the system
0:04:10 and it is also a very easy way to evaluate, so we sometimes post it on
0:04:14 mechanical turk or social networks to actually get more users
0:04:25 so
0:04:26 let's take a step back and look at the previous work on task-oriented
0:04:31 systems
0:04:31 we are usually familiar with this architecture: once we get the user input
0:04:36 we do language understanding, then we go to a dialogue manager which decides what to
0:04:41 generate, and in the end we have the system output
0:04:45 a lot of work has been done on what happens if there is some non-understanding
0:04:48 in the system, so the user says something that is not
0:04:53 comprehensible for the system
0:04:55 a lot of people have designed conversational strategies to handle these errors, for
0:05:00 example we say 'can you say that again' or 'pardon me'; we are very familiar with
0:05:04 these conversational strategies
0:05:07 moreover,
0:05:09 there is a lot of work,
0:05:11 including by researchers here at cmu, that has been dealing with
0:05:16 using a pomdp or an mdp to optimize the process of
0:05:21 choosing which strategy to use, planning globally to optimize the
0:05:26 task completion rate
0:05:29 so given this previous work on task-oriented systems, can we do that
0:05:32 on non-task-oriented systems?
0:05:35 so the research questions
0:05:36 are: can we develop conversational strategies to handle
0:05:41 inappropriateness, which we really care about,
0:05:45 in non-task-oriented systems?
0:05:48 and can we actually use this kind of globally planned policy to regulate the
0:05:53 conversation?
0:06:33 i apologise for the technical problem
0:07:02 sorry again for the disturbance
0:07:05 so we are trying to ask: can we design
0:07:09 conversational strategies and conversational policies
0:07:12 to help the non-task-oriented system's utterances be more appropriate
0:07:16 and here we designed an architecture which is very similar to a task-oriented
0:07:20 system
0:07:21 so here, first, once we get the user input, we try
0:07:26 to use some context tracking strategies that we developed
0:07:29 and then we generate a response
0:07:32 and then if the system thinks the response is good,
0:07:37 that is, the system has high confidence that the response is a good one,
0:07:40 then we just
0:07:42 produce the system response back to the user
0:07:45 if the system is not confident that it's a good response,
0:07:49 then we go and find whether some lexical-semantic strategies that
0:07:54 we introduced recently can deal with the low confidence. if that works, we
0:08:00 just use those methods to generate the output. if
0:08:05 none of the conditions triggers these strategies, we go to our engagement and
0:08:09 appropriateness strategies to actively generate a reply
0:08:15 in yesterday's poster we also talked about another system which
0:08:20 is similar to this one, where we also take engagement into
0:08:24 consideration in the whole dialogue
0:08:26 process
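the tiered architecture described here can be sketched as a confidence-gated dispatcher. this is a toy illustration, not the actual system code; the threshold value, the stub generators, and the canned replies are all assumptions:

```python
def track_context(user_input, history):
    # Placeholder context tracking: just normalize the input.
    return user_input.lower()

def generate_response(utterance, history):
    # Toy generator: confident only for greetings, low confidence otherwise.
    if "hello" in utterance:
        return "hi there!", 0.9
    return "hmm", 0.2

def apply_lexical_semantic_strategies(utterance, history):
    # Toy lexical-semantic tier: react to single-word inputs.
    if len(utterance.split()) == 1:
        return "could you say that in a full sentence?"
    return None  # condition not triggered

def apply_engagement_strategy(utterance, history):
    # Toy engagement tier: actively steer the conversation.
    return "how about we talk about sports?"

def respond(user_input, history, threshold=0.5):
    """Fall through the strategy tiers based on generation confidence."""
    tracked = track_context(user_input, history)
    response, confidence = generate_response(tracked, history)
    if confidence >= threshold:
        return response  # confident: answer directly
    fallback = apply_lexical_semantic_strategies(tracked, history)
    if fallback is not None:
        return fallback  # low confidence, but a condition triggered
    return apply_engagement_strategy(tracked, history)
```

the real system's tiers are richer, but the control flow (generation, then lexical-semantic strategies, then engagement strategies) follows this shape.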
0:08:27 so we'll talk about
0:08:29 the three sets of strategies later in detail,
0:08:33 how we make the system more appropriate, and
0:08:37 also the policy that can
0:08:39 actually choose between different strategies to make the whole process better,
0:08:46 that is,
0:08:47 to optimize the whole process globally
0:08:50 so we have two components: we're gonna talk about the response generation
0:08:55 side and the conversational strategy selection side
0:08:58 first of all, how do we track context?
0:09:02 we first have anaphora resolution, where we mainly do
0:09:07 pronoun resolution,
0:09:09 because we wanted a strategy that works in ninety percent of the cases
0:09:15 so for example, 'do you like taylor swift?'
0:09:17 we detect 'taylor swift',
0:09:19 and
0:09:20 if the user says 'yes i like her a lot', we replace 'her' with 'taylor swift'
0:09:25 for the next response generation
0:09:29 we also do response ranking with history similarity:
0:09:32 basically we use word2vec to rank the similarity between the candidates and the
0:09:37 previous user utterance
0:09:41 for example ticktock says 'i watch a lot of
0:09:44 baseball games'
0:09:46 and the user says 'what do you like most?'
0:09:48 so here we have two candidates:
0:09:51 one is unrelated to sports,
0:09:52 the other is about sports. so here if we do
0:09:56 the word2vec similarity test, we will pick
0:10:00 the second one, which is preferred because it is closer to the context
0:10:05 semantically
0:10:06 then we go to our response generation methods
0:10:09 so after we consider the context and history, we do the
0:10:14 actual generation. we have two methods that we use
0:10:18 and select between based on the confidence
0:10:20 one is keyword retrieval with a weighted matching metric
0:10:23 basically
0:10:25 we find the keywords in the user's
0:10:29 response and match them in the database
0:10:32 and return the corresponding response that has the highest
0:10:36 aggregated weight
0:10:39 here we use data from existing interview transcript datasets
0:10:43 and we also collected a question-answer dataset using mturk
0:10:48 the other method is a sequence-to-sequence neural network
0:10:52 model
0:10:52 basically we use an encoder and a decoder to generate the response,
0:10:59 following existing work on this method
0:11:02 so basically we have two
0:11:03 methods, and we select the one with the highest confidence
0:11:11 here,
0:11:12 if the confidence of the response generation model is high, we just return the
0:11:16 response back to the user
0:11:18 if it is low,
0:11:19 what we are gonna do is...
0:11:53 i apologise for the unexpected interruption
0:13:05 so here, as we said, we go over some lexical-semantic strategies if the generation confidence
0:13:10 score is low,
0:13:12 and then finally we'll talk about the other set of strategies
0:13:14 so
0:13:22 we designed a set of strategies. for example, if the user repeats themselves,
0:13:27 we say 'you already said that'
0:13:29 and if the user is replying with a single word, we just react to
0:13:35 that, saying 'you seem to be saying something in an incomplete sentence'
0:13:38 we also have grounding strategies on
0:13:42 named entities
0:13:44 so basically we detect the named entity, try to find it in the
0:13:48 knowledge base, and try to use a template to clarify it
0:13:52 so for example, 'do you like clinton?' 'which clinton are you talking about, bill clinton, the
0:13:57 former united states president,
0:13:58 or hillary clinton,
0:14:00 the democratic candidate?'
0:14:02 we also have a strategy to handle out-of-vocabulary words. so for example, we detect the out-of-vocabulary
0:14:07 word, then use a template to generate the sentence, and at the same time we update
0:14:11 the vocabulary as well
0:14:13 so for example the user says
0:14:15 'you're very confrontational', and ticktock says
0:14:17 'what do you mean by confrontational?'
0:14:20 then we ran a lot of trials to try to evaluate how these strategies
0:14:24 are doing, based on human annotation of appropriateness
0:14:29 we can see that mostly people think they are appropriate, but there are some problems
0:14:33 for example, if the named entity detection is wrong,
0:14:36 then the
0:14:37 generated response will not be correct
0:14:40 we also have errors in the out-of-vocabulary strategy: if the user
0:14:44 is typing with a more casual way of spelling, the system treats it as out-of-vocabulary
0:14:50 and tries to confirm it, and then the user finds it inappropriate
0:15:02 these strategies need certain conditions to exist already in order to be triggered, so if
0:15:07 none of the conditions triggers, we actually go to our engagement and appropriateness strategies
0:15:12 to actively try to bring the user into the conversation
0:15:16 so we looked into the previous literature
0:15:19 basically we find that in the communication literature, active participation is really important,
0:15:28 as well as positive feedback and encouragement. we mainly implemented a set of strategies that
0:15:34 go with active participation
0:15:37 so whenever we start a conversation, we usually pick a topic to
0:15:42 initiate with the user
0:15:44 and then we designed the strategies with respect to the topic, so
0:15:50 we can either stay on the topic or change the topic
0:15:53 if we try to stay on the topic, we could tell jokes ('did you
0:15:57 know that people usually spend far more time watching sports than actually playing any'),
0:16:03 initiate an activity (for example, 'do you want to watch a
0:16:05 game together sometime?'),
0:16:07 or talk more ('let's talk more about sports')
0:16:11 we can also change the topic,
0:16:12 for example 'how about we talk about...',
0:16:15 or end the topic with an open question, like 'could you share with me some interesting
0:16:20 news on the internet?'
0:16:22 so we also evaluated the appropriateness of these strategies based on the
0:16:28 users' ratings. here we only used a random selection policy, which means that
0:16:32 whenever we find that
0:16:34 neither the retrieval generation nor the neural generation is doing well,
0:16:38 and none of the lexical-semantic strategies is triggered,
0:16:41 we randomly select one of these engagement strategies
0:16:47 and we do find some of them are doing pretty well,
0:16:50 for example initiation and 'tell me more',
0:16:54 but some of them are actually doing pretty badly, for example jokes,
0:16:59 because without the context these strategies can go wrong very easily
0:17:03 so here is one example
0:19:15 i apologise again for the interruption
0:19:17 sure, to make up the time, here is the failure case. we can see
0:19:24 ticktock says 'i really like politics, let's talk about politics', and the user says 'no i
0:19:29 don't like politics'. ticktock asks why, and the user says 'i just don't like politics'
0:19:33 and then ticktock goes into an initiation strategy,
0:19:38 'do you want to watch them together sometime?', and the user says 'i told you, i don't want to talk about
0:19:42 politics'
0:19:43 basically we find there is more inappropriateness happening
0:19:47 whenever we randomly select the strategy without taking the context
0:19:52 into consideration. if we look closely at the semantic context, we find that the user
0:19:57 is expressing negative sentiment in a row, and at this time
0:20:02 the correct thing is to
0:20:03 pick the switch-topic strategy,
0:20:06 which actually can
0:20:08 handle the situation where the user is not happy about the current topic
0:20:13 so we say that we need to model the context in the strategy selection
0:20:20 basically we have two lines of work: we want to avoid
0:20:23 inappropriateness,
0:20:25 and we use reinforcement learning to do the global planning. so we take some
0:20:28 state variables which capture the uncertainty, some of the variables we
0:20:34 mentioned before,
0:20:35 for example the system appropriateness confidence,
0:20:38 the previous utterance sentiment confidence, the number of times each strategy was executed, the turn
0:20:45 position, and the most recently used strategy. so we take all these into consideration in training our
0:20:51 reinforcement learning policy
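the state variables listed here could be encoded like this; the strategy names, the three-bin discretization, and the tuple layout are assumptions of the sketch, not the actual state design:

```python
# Illustrative strategy inventory (assumed names).
STRATEGIES = ["joke", "initiation", "talk_more", "switch_topic", "open_question"]

def encode_state(gen_confidence, sentiment_confidence, strategy_counts,
                 turn_position, last_strategy):
    """Discretize the continuous variables so states stay enumerable.

    State = (generation-confidence bin, sentiment-confidence bin,
             per-strategy execution counts, turn position, last strategy).
    """
    def bucket(x):
        # Three bins: low / mid / high over [0, 1].
        return min(int(x * 3), 2)
    return (bucket(gen_confidence),
            bucket(sentiment_confidence),
            tuple(strategy_counts.get(s, 0) for s in STRATEGIES),
            turn_position,
            last_strategy)
```

discretizing is one way to keep the tabular state space manageable; as the speaker notes later in the q&a, the space grows exponentially with more variables.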
0:20:54 we use another chatbot as a simulated user to train the conversation
0:21:00 policy
0:21:00 so we have a reward function
0:21:02 which is a combination of response appropriateness, conversational depth, and information gain
0:21:09 for the appropriateness, we already defined it,
0:21:11 and then we train a binary classifier based on the human-annotated labels
0:21:17 so this automatic predictor is used in the reinforcement learning training process
0:21:23 for the conversational depth, we define it as the number of consecutive utterances
0:21:28 in a row that keep on the same topic, and we also trained an
0:21:34 automatic predictor based on the human annotation
0:21:39 and finally we have the last one, which is the information gain, which
0:21:43 accounts for the variety of the conversation
0:21:45 so we just count the number of unique words that the user
0:21:49 and the system have spoken
0:21:52 so in the end we
0:21:54 empirically decided the weights of the three terms in the reward
0:21:58 function; later we are probably going to use machine learning
0:22:03 methods to train the weights
0:22:05 so
0:22:06 we have another two policies that we compare our reinforcement learning policy against: first
0:22:10 is the random selection policy,
0:22:12 the other is a local greedy policy, which is based on the sentiment of the previous three
0:22:17 utterances to decide the strategy
0:22:19 for example,
0:22:19 if the user is positive in a row, we say 'can we talk more about
0:22:23 this topic',
0:22:24 and if it's negative in a row, the policy is to switch the topic
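the local greedy baseline can be sketched like this; the word-list sentiment scorer and the fallback for mixed sentiment are stand-in assumptions (the talk only specifies the all-positive and all-negative cases):

```python
# Toy word-list sentiment scorer standing in for a real classifier.
POSITIVE = {"great", "love", "like", "fun", "yes"}
NEGATIVE = {"hate", "boring", "no", "dislike"}

def sentiment(utterance):
    tokens = utterance.lower().split()
    return (sum(t in POSITIVE for t in tokens)
            - sum(t in NEGATIVE for t in tokens))

def greedy_policy(last_three_user_turns):
    """Pick a strategy from the sentiment of the previous three utterances."""
    scores = [sentiment(u) for u in last_three_user_turns]
    if all(s > 0 for s in scores):
        return "talk_more"      # user positive in a row: stay on the topic
    if all(s < 0 for s in scores):
        return "switch_topic"   # user negative in a row: change the topic
    return "open_question"      # mixed signals: assumed fallback
```

unlike the reinforcement-learning policy, this baseline only looks at a local window and cannot plan for long-term reward.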
0:22:28 so in the end we find that when we use the
0:22:31 reinforcement-learning-trained policy and test it online
0:22:35 with real humans interacting with the system,
0:22:37 we decrease the inappropriateness
0:22:40 and we increase the conversational depth and the total information gain
0:22:45 so the conclusion: we think the designed conversational strategies,
0:22:50 the lexical-semantic strategies and the engagement strategies, are useful,
0:22:54 considering the conversational history is useful,
0:22:57 and integrating the uncertainty from different upstream machine learning models into the reinforcement
0:23:02 learning is useful
0:23:06 any questions
0:23:08 okay
0:23:31 yes, that's a good question. basically we do have
0:23:37 different surface forms in designing these
0:23:40 strategies
0:23:41 this is actually our future work: we want to see how we can
0:23:46 generate sentences with pragmatics inside
0:23:48 right now it is based on templates
0:23:52 so basically we try to use different wordings,
0:23:56 but it is still templates, not really very general
0:24:18 that's a good question. so here the idea is that
0:24:22 we are trying to integrate as much as possible of
0:24:25 the uncertainty of the conversation into the dialogue planning. definitely all of these, like the
0:24:30 word2vec similarity,
0:24:32 are also extra information that can go into the strategy selection,
0:24:36 or, for a spoken dialogue system, the asr error
0:24:41 so i think definitely if you can optimize considering all these uncertainties inside
0:24:46 the dialogue system, it would be better,
0:24:48 but we haven't done that yet
0:24:52 how many states?
0:24:55 basically
0:24:56 the state space will expand exponentially if you add more variables
0:25:08 any other questions
0:25:30 that's a good question. so basically we ask the user, we just
0:25:35 ask: with respect to the user's utterance, do you think
0:25:40 the response is appropriate and coherent or not
0:25:43 so sometimes people think, if the topic change is kind of right on time,
0:25:48 that it's appropriate,
0:25:50 and if it's not,
0:25:51 they would think it is inappropriate
0:25:54 so in total we give them a pretty broad interpretation of how appropriate it is,
0:25:59 so a lot of people do take the context into consideration when they're evaluating
0:26:08 true
0:26:09 true
0:26:25 right, right, so that's why in the reward function we
0:26:30 try to account for the variety as well,
0:26:34 in the optimisation function. so basically
0:26:38 appropriateness is just one aspect of making the system likable,
0:26:44 and others, like being funny or being provocative or anything else,
0:26:48 could be added on top of that
0:26:50 so i think it's like different building blocks,
0:26:52 and variety or personalisation are something that could be considered