0:00:15good afternoon everybody
0:00:19i am a student at cmu working with justine
0:00:22and i'm going to describe our work on
0:00:24automatic recognition of
0:00:26social conversational strategies
0:00:29which contribute to building maintaining and sometimes destroying a budding relationship specifically these
0:00:34conversational strategies are things like self disclosure shared experience praise and so on
0:00:43let's begin with the motivation of the talk
0:00:45speakers of course pursue multiple conversational goals in a dialogue and contributions to a conversation
0:00:51can often be divided into
0:00:53those that fulfill propositional functions those that fulfill interactional functions like
0:00:58turn taking and those that fulfill interpersonal functions
0:01:02which manage the relationship between the interlocutors over time
0:01:07in the category of those that fulfil these interpersonal functions are conversational
0:01:12strategies
0:01:13which are particular ways of talking
0:01:15and that have an impact on
0:01:18the relationship between the two individuals
0:01:21so in this work we propose a technique to model and automatically recognize these conversational
0:01:25strategies
0:01:27using multimodal information we use
0:01:30verbal visual and vocal modalities of the speaker as well as the interlocutor
0:01:34in the current and the previous turn
0:01:37and we believe that it's important
0:01:39as more natural conversations with dialogue systems become part of people's daily life
0:01:44we believe that it is important to focus on advancing the capability of dialogue systems to not
0:01:49only
0:01:50convey information and achieve smooth interaction
0:01:53but also manage long-term interactions by building intimacy and rapport
0:01:57not just for the sake of companionship
0:02:00but as a more intrinsic part of improving task performance
0:02:06clearly then only propositional content and interactional content does not suffice
0:02:12in our current work we work towards a computational model of social talk in
0:02:16task context
0:02:18and basically we have investigated one of the most important roles fulfilled
0:02:22by social talk and that is to build a bond between two people
0:02:26one that is strong enough
0:02:28to allow people to build trust with another person or in our case with
0:02:32a computer agent
0:02:34we refer to this bond as rapport
0:02:37or the feeling of connection and harmony with another
0:02:40and the central aim of this work is to develop a dialogue system which can facilitate
0:02:44this interpersonal closeness with users over interactions in the long term
0:02:51rapport has been shown to have a good effect in fields such as education and
0:02:55negotiation and in fact our prior work actually developed a dyadic
0:03:00computational model which explains how interlocutors manage rapport using specific conversational strategies
0:03:07which in turn serve the intermediate goals of rapport
0:03:13the foundational work by spencer-oatey actually conceptualizes interpersonal face as the desire
0:03:20to be approved of for one's positive traits and we use strategies like praise
0:03:27to help with face management
0:03:32our prior work also posits that interlocutors over time tend to increase coordination by
0:03:38adhering to behavior expectations
0:03:40which are guided by more sociocultural norms in the beginning stage of the
0:03:44interaction and when a dyad gets to know each other they are guided more
0:03:49by the interpersonal norms of the interaction
0:03:53so at a later stage general social norms may be purposely violated to accommodate
0:03:57the other person's behavior expectations
0:04:00on the other hand
0:04:02shared experience
0:04:03also allows
0:04:05to increase coordination between the two people
0:04:08because people index their common history when they refer to a shared experience
0:04:14thereby cementing the sense that people are part of the same group
0:04:20and finally to better learn about the other person mutual attentiveness plays an important role
0:04:25we observed in our own corpus that mutual attentiveness is fulfilled by the strategy of
0:04:31self disclosure
0:04:33as the relationship proceeds these disclosures become more intimate in nature
0:04:40the goals of this work the theoretical question is to understand the very nature of
0:04:44these conversational strategies by correlating them with multimodal cues and our methodological
0:04:50question is to leverage that understanding to automatically recognize these strategies
0:04:55so that it can be implemented in a dialogue system
0:05:02so our corpus
0:05:04is the reciprocal peer tutoring corpus which was collected from twelve american english speaking dyads
0:05:10who interacted over five weeks in a total of sixty sessions
0:05:14on an algebra topic
0:05:17and our
0:05:18prior work demonstrates that there's a tremendous amount of rapport building in this tutoring context
0:05:24and this is a context to study dyadic social interaction
0:05:29which also has a long term task component since they are trying to solve problems
0:05:32of algebra over five weeks
0:05:38let's move to the method
0:05:40prior work on detecting similar dialogue phenomena such as self disclosure
0:05:44and social norm violation has either looked at modalities in isolation
0:05:49or has focused on solely data driven approaches for instance one way to
0:05:54quantify a violation of a social norm is to see when the language is different from
0:06:00the rest of the language in the dialogue so for example use of a cross
0:06:04entropy value
0:06:05to quantify this
0:06:06in our work we capture
0:06:08a richer variety of the sub categories of these conversational strategies
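The cross-entropy idea mentioned for earlier data-driven approaches can be sketched as follows. This is only an illustration, not the method of any work cited in the talk: it assumes a unigram language model with add-one smoothing, and the names `train_unigram` and `cross_entropy` as well as the toy utterances are made up for the example. An utterance whose words are rare in the rest of the dialogue receives a higher cross-entropy.

```python
import math
from collections import Counter


def train_unigram(utterances):
    """Count word frequencies over the reference utterances."""
    counts = Counter(word for utt in utterances for word in utt.lower().split())
    return counts, sum(counts.values())


def cross_entropy(utterance, counts, total):
    """Average negative log2 probability per word, with add-one smoothing."""
    words = utterance.lower().split()
    if not words:
        return 0.0
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    log_prob = 0.0
    for word in words:
        p = (counts[word] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / len(words)


# Toy dialogue (hypothetical): the "rest of the language in the dialogue".
dialogue = ["what is x plus two", "x plus two is five", "so x is three"]
counts, total = train_unigram(dialogue)

typical = cross_entropy("x is three", counts, total)
unusual = cross_entropy("you are so slow", counts, total)
print(typical, unusual)  # the off-topic utterance scores higher
```

A single threshold on this score would flag "different" language, but it cannot tell apart the subcategories of a strategy, which is the limitation the talk points to.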
0:06:13and the way we do that is
0:06:15we construct a reliably annotated corpus we rely extensively
0:06:21on the literature in social psychology to understand what strategies contribute to interpersonal
0:06:29closeness and then we asked three to five human raters to annotate these
0:06:33and computed the reliability so self disclosure here
0:06:37in our work
0:06:38was defined as verbal expressions which are used by people to reveal aspects
0:06:42of themselves to the other so we categorized it into two types which
0:06:46are enduring states
0:06:48which reveal long term
0:06:50and intimate aspects of oneself which of course can be very important
0:06:54within the context of a conversation so that would be something like i love
0:06:58my pets
0:06:59but it could also be one's own transgressive actions which are socially unacceptable actions
0:07:05which are a way
0:07:07to make the other person feel better something like i did badly
0:07:10on the pretest with those negative numbers
0:07:13referring to shared experience
0:07:16is an important way of showing that the two people in a dyad have
0:07:19known each other
0:07:20and of indexing some commonality
0:07:22so we differentiated it into shared experiences outside the experiment and inside the experiment
0:07:29for praise we had both labeled praise and unlabeled praise so this is an
0:07:33example of labeled praise which is
0:07:35great job with those negative numbers
0:07:37but it could also be something like good job or perfect
0:07:40and finally
0:07:42social norm violations are basically behaviors which go against
0:07:46generally accepted
0:07:48or stereotypical behaviors
0:07:50and in the first pass we coded whether an utterance was a social norm violation and in the second pass
0:07:56we differentiated these categories which were breaking the rules which could be doing
0:08:01off task talk during tutoring face threatening acts like criticizing insulting
0:08:07or teasing
0:08:08and it could also be referring to
0:08:11your own or the other's social norm violation
0:08:14right now focusing on the need to work and so on
0:08:17social norm violations are actually signals that the dyad is coming closer and no longer feels obligated
0:08:23to adhere to the norms of the larger world
0:08:28this is an example of
0:08:33self disclosure in one of the dyads where p1 says what do you
0:08:38want to be when you grow up which is eliciting self disclosure and
0:08:43p2 says i don't know yet and then p1 says i want to be
0:08:46a chef
0:08:48and then the dialogue goes on and
0:08:51you say that a lot of like seven is larger than a book you wouldn't
0:08:53be in the middle and then be lost like actually me and never know thinking
0:08:57of making a youtube channel which is completely off
0:09:00from this idea of being a chef
0:09:04but however
0:09:05he goes on to explain that you know
0:09:08youtube channels make money
0:09:10and then a few done say to use as you know if anything you are
0:09:14making one
0:09:15i will reminder of which would be fine
0:09:17so there is a back and forth between elicitation of self disclosure and
0:09:21a self disclosure which is done by the other person
0:09:25here are some other examples of violation of social norms so the top one is
0:09:29a friend dyad which was perceived to be in high
0:09:34rapport
0:09:37so you want exactly that you're and beat with that ut in the top interaction
0:09:42and once as you can do that that's the whole point
0:09:45you say that hey you are probably never do that and then once said that's
0:09:48why are you doing you it might so that you're smiles and we just as
0:09:52you almost
0:09:53my gosh we never the that ever
0:09:57so basically this was the friend dyad and smiling is one very important pattern that
0:10:02we found across the dataset when friends do a social norm violation it is always preceded
0:10:07or
0:10:09followed by a smile
0:10:12which is a kind of hedging of these violations
0:10:16and the bottom example is actually strangers who were perceived to be in low rapport
0:10:23so here we use as a next problem is exactly the same is my any
0:10:28then that's was that you get what the problem and then they don't you have
0:10:33and then p two with that you know who overlap and says that serious exactly
0:10:38so this dyad was perceived to be in low rapport and for strangers
0:10:42doing a violation of social norms may not be the best idea
0:10:45when they are still forming a relationship
0:10:51we then coded for
0:10:53visual behaviors which are independent variables in this study so
0:10:58we have smiles and head nods
0:11:00and for gaze we have gaze at the partner
0:11:03gaze at the worksheet you were doing gaze at the worksheet the partner was doing
0:11:06and then gazing elsewhere in the room
0:11:12so the next step here is to
0:11:14understand
0:11:15what cues
0:11:18speakers use when they
0:11:20use these conversational strategies to that end we first undersampled the non annotated set
0:11:25of utterances for each of these conversational strategies to create a balanced dataset
0:11:29and the non annotated utterances were randomly selected
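The undersampling step described here can be sketched roughly as below. This is a minimal illustration under assumptions, not the authors' code: label `1` stands for "annotated with the strategy", label `0` for "not annotated", and the function name `undersample`, the fixed seed, and the toy utterance ids are hypothetical.

```python
import random


def undersample(utterances, labels, seed=0):
    """Keep every annotated utterance plus an equal-sized random sample of the rest."""
    positives = [u for u, y in zip(utterances, labels) if y == 1]
    negatives = [u for u, y in zip(utterances, labels) if y == 0]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    data = [(u, 1) for u in positives] + [(u, 0) for u in sampled]
    rng.shuffle(data)
    return data


# Toy corpus (hypothetical): 5 annotated utterances, 15 non-annotated ones.
utterances = [f"u{i}" for i in range(20)]
labels = [1] * 5 + [0] * 15
balanced = undersample(utterances, labels)
print(len(balanced))  # 10
```

With a balanced set, chance accuracy for the binary decision is 50 percent, which makes the later "accuracy over chance" numbers easier to read.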
0:11:32so the final corpus consists of
0:11:35a house an example the sentence larger i don't want to examine the fate experience
0:11:40one sixty seven examples of praise
0:11:42and around ten thousand five examples of violation of social norms
0:11:45then what that the bra sixty
0:11:47interaction sessions which is
0:11:49sixty one and how far interaction sessions
0:11:54in the next step we explored observable verbal and vocal behaviors of interest
0:12:00that were drawn from a quantitative analysis
0:12:03so we used liwc twenty fifteen for the verbal cues of interest
0:12:07and then used opensmile to extract some simple low-level descriptors
0:12:11related to pitch loudness and vocal quality and assessed whether the mean values of
0:12:16these features are significantly different
0:12:18in utterances
0:12:19that were annotated versus not annotated
0:12:23with a conversational strategy and used the effect size to assess generalisability
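The comparison of feature means between annotated and non-annotated utterances can be sketched as below. The talk does not name the exact test or effect-size measure, so this example assumes Welch's t statistic and Cohen's d with a pooled standard deviation; the function names and the toy loudness values are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy loudness values (hypothetical) for utterances with and without a strategy.
with_strategy = [2.0, 4.0, 6.0]
without_strategy = [1.0, 3.0, 5.0]
print(cohens_d(with_strategy, without_strategy))  # 0.5
print(welch_t(with_strategy, without_strategy))
```

The t statistic speaks to significance, while the effect size says how large the difference is regardless of sample size, which is why the two are reported together.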
0:12:29and finally for visual behaviors and nonverbal behaviors we explored whether they co-occur
0:12:34with these conversational strategies and we looked at the odds ratio
0:12:39d quadrants like people
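A co-occurrence check of this kind can be sketched with a 2x2 table. This assumes the measure is an odds ratio, which the transcript does not state clearly; the function name `odds_ratio`, the 0.5 cell correction, and the toy counts are choices made for the example.

```python
def odds_ratio(strategy_with, strategy_without, other_with, other_without):
    """Odds of the behavior inside strategy utterances divided by the odds outside them."""
    # Add 0.5 to every cell (Haldane-Anscombe correction) so empty cells do not divide by zero.
    a, b, c, d = (x + 0.5 for x in (strategy_with, strategy_without,
                                    other_with, other_without))
    return (a / b) / (c / d)


# Toy counts (hypothetical): a smile occurs in 30 of 40 strategy utterances
# and in 20 of 60 other utterances.
ratio = odds_ratio(30, 10, 20, 40)
print(ratio)  # above 1: the behavior is more likely alongside the strategy
```

A ratio above 1 means the behavior is more likely to co-occur with the strategy, and a ratio of 1 means no association.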
0:12:43based on the statistical analysis we selected which multimodal cues to
0:12:47include in a machine learning model so we have three sets of features the
0:12:51first feature set is basically the verbal visual and vocal
0:12:55cues of the speaker in the current turn
0:12:59and in addition to that to capture some context we also added
0:13:02some types of word features like bigrams
0:13:04part-of-speech bigrams and word part-of-speech pairs
0:13:07feature set two is the listener behavior basically
0:13:11so what is the visual behaviour of the listener when the speaker is using a conversational strategy
0:13:15that's feature set two
0:13:17and feature set three is used to capture more context around the use of a conversational strategy
0:13:21so feature set three is what the interlocutor did in the previous turn
0:13:24that is what was their verbal vocal and visual
0:13:27expression
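The word-level context features named above (bigrams, part-of-speech bigrams and word part-of-speech pairs) can be sketched as below. This is an illustration only: the part-of-speech tags are supplied by hand rather than by a tagger, and the function names and feature-name prefixes are hypothetical.

```python
from collections import Counter


def bigrams(items):
    """Adjacent pairs from a sequence."""
    return list(zip(items, items[1:]))


def context_features(tagged_utterance):
    """Count word bigrams, POS bigrams and word/POS pairs for one utterance."""
    words = [word.lower() for word, _ in tagged_utterance]
    tags = [tag for _, tag in tagged_utterance]
    features = Counter()
    for w1, w2 in bigrams(words):
        features[f"w:{w1}_{w2}"] += 1
    for t1, t2 in bigrams(tags):
        features[f"pos:{t1}_{t2}"] += 1
    for word, tag in zip(words, tags):
        features[f"wp:{word}/{tag}"] += 1
    return features


# Hand-tagged toy utterance (the tags are assumptions for the example).
feats = context_features([("great", "JJ"), ("job", "NN")])
print(sorted(feats))
```

In a full pipeline these counts would be concatenated with the visual and vocal features of the speaker and listener before training.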
0:13:33we used l2 regularized logistic regression as the training algorithm and
0:13:38we estimated performance using accuracy and accuracy over chance
0:13:42and then compared it to some standard very basic machine learning algorithms
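The classifier and the two metrics can be sketched in plain Python as below. This is a toy sketch, not the authors' setup: it trains with batch gradient descent, reads "accuracy over chance" as Cohen's kappa (an assumption), and uses hypothetical binary features and hyperparameters.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, l2=0.01, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 penalty shrinks the weights; the bias is left unpenalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what the label marginals give by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    p_chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
                   for c in set(y_true) | set(y_pred))
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical binary features per utterance: [speaker_smiles, gaze_at_partner]
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
y_pred = predict(X, w, b)
print(accuracy(y, y_pred), kappa(y, y_pred))
```

The learned weights double as a rough ranking of predictive features, which is how the talk later discusses the most predictive cues. In practice a library implementation with cross-validation would replace this hand-rolled loop.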
0:13:49okay so let's move on to the results
0:13:52the first theoretical goal
0:13:53of understanding the nature of conversational strategies
0:13:59here are the results of the statistical analysis of multimodal cues for self disclosure first
0:14:04so we found that students referred significantly
0:14:07more to their personal concerns during self disclosure and they tended to talk about their
0:14:12likes and dislikes
0:14:15the liwc categories of positive emotion and negative emotion also had a high
0:14:19effect size
0:14:22also we looked at the standardized liwc variable of authenticity
0:14:25which formalizes the intuition that when people reveal themselves in an authentic
0:14:29or honest way they are more
0:14:31humble and vulnerable
0:14:33we used this as a way to capture authenticity and it had a high effect size
0:14:36as well
0:14:37for acoustic features we found
0:14:40a moderate effect size for loudness
0:14:42in self disclosure utterances
0:14:45and this
0:14:45so
0:14:46on examination of the corpus we often found that
0:14:51speakers often got excited when they disclosed in the dialogue like when
0:14:55disclosing something fun or surprising about themselves
0:14:58and often spoke in a lower voice
0:15:00when they were talking about something negative about themselves
0:15:02so the variation in pitch was not significant only the loudness
0:15:09for visual cues we found that the four types of gaze and smile were
0:15:12significantly more likely to co-occur in utterances of self disclosure compared to non self disclosure
0:15:17utterances
0:15:19with gaze at partner
0:15:21having the highest effect size
0:15:24we did a similar analysis for the listener but you could look at those details in
0:15:28the paper
0:15:30for shared experience we looked at the liwc categories of affiliation drive and time orientation
0:15:36which are used by speakers to index commonality
0:15:41within a given time frame and to make some kind of
0:15:44affiliation with the conversation partner
0:15:46and it wasn't was to affect size for both of them
0:15:49and
0:15:50like first wasn't obviously
0:15:52had a high effective include rapid whatever of and cultivation about that
0:15:59so the nature of visual cues was similar to that of self disclosure
0:16:06next we looked at praise
0:16:07all systems brain one for
0:16:10well billboards vision that increases the interlocutor's hundred and perhaps that if a k c
0:16:15and having a positive tone of voice is very intuitive and there was
0:16:19a positive effect size for that
0:16:24we also looked at some of the acoustic features here and we had a negative
0:16:27effect size for loudness actually so people are less loud when they praise the partner
0:16:34and a moderate effect size for vocal quality features
0:16:41finally for social norm violation we looked at different categories of off task talk things belonging
0:16:47to social categories or
0:16:50personal concerns
0:16:52and the was present of a class about their
0:16:54we also
0:16:56wanted to capture the intuition that some signals in the language
0:17:00of social norm violation
0:17:02would stem from just putting one student in the tutor role because this is a reciprocal peer
0:17:06tutoring context where one is the tutor and one is the tutee
0:17:08and they change roles
0:17:10so
0:17:11we also looked at the liwc power drive there was a small effect it was
0:17:15significant but the effect size was small
0:17:19and finally prior work has found the use of
0:17:23first person plural
0:17:25to be an indicator of higher status
0:17:27and the use of first person singular
0:17:30to be a good predictor of lower status
0:17:32we thought that when students do a social norm violation
0:17:35they are more likely to make statements which involve others
0:17:38so for first person plural
0:17:41it was significantly higher in social norm violation utterances but the effect size was small
0:17:48for acoustic features
0:17:50we had a positive effect size for pitch
0:17:53loudness and the vocal quality features
0:18:03for the visual cues i would say that one additional thing that was significant
0:18:07for social norm violation was head nodding which we did not find for the previous conversational strategies
0:18:11speakers were
0:18:12more likely to head nod when they were doing a violation of a social
0:18:16norm
0:18:19so then using these features informed
0:18:21by this analysis we actually
0:18:24put them in the machine learning model and found
0:18:27logistic regression to outperform the other basic machine learning algorithms
0:18:30and
0:18:31the accuracy over chance
0:18:33ranged from sixty to eighty percent for detection of these four categories
0:18:39maybe we can quickly just go through the most predictive features which are more interesting than
0:18:44the
0:18:44accuracy numbers
0:18:46so
0:18:48so in feature set one
0:18:50which is
0:18:51the speaker's behavior in the current turn
0:18:53we found that when speakers self disclose they attend to their partner
0:18:57by gazing at them
0:18:58and head nodding perhaps to emphasize what they're saying
0:19:01and do not gaze at their own or the partner's worksheet
0:19:05and first person singular was strongly predictive however the effect that the machine learning model picks
0:19:10up for first person singular was much less than
0:19:12that of the nonverbal cues indicating the importance of nonverbals
0:19:17while using these conversational strategies
0:19:20listeners on the other hand respond
0:19:24during the current turn by head nodding to communicate their attention and gazing at the speaker
0:19:29but not at the worksheet
0:19:32and in the previous turn
0:19:33the partner is less likely to smile and nod and has
0:19:38lower loudness in voice
0:19:42for shared experience some of the most predictive features
0:19:46included gaze the speaker is less likely to gaze at
0:19:51their own worksheet or the interlocutor's worksheet and likely to have lower shimmer in voice
0:19:58however affiliation drive and time orientation were the only two liwc categories
0:20:02that were really predictive of shared experience
0:20:07listeners
0:20:08on the other hand exhibited behaviors like smiling perhaps to indicate appreciation
0:20:13of the content of the talk or to indicate coordination
0:20:17also the partner is more likely to gaze elsewhere or at the
0:20:20speaker while the speaker is doing a shared experience
0:20:23but is less likely to nod
0:20:24and gaze at their own worksheet
0:20:28and finally in the previous turn the partner is less likely to smile
0:20:32and gaze at their own worksheet
0:20:34and has a lower loudness in voice
0:20:37if the speaker in the next turn uses a shared experience
0:20:41for praise
0:20:43the most predictive features when doing a praise were gazing at the partner's worksheet
0:20:47which is perhaps indicative of directing attention to what the partner is doing while
0:20:51praising them
0:20:53head nodding with a positive tone of voice
0:20:55perhaps to emphasize the praise
0:20:57and smiling perhaps as an indication of general appreciation
0:21:03or to mitigate the potential embarrassment of praise
0:21:11also
0:21:12predictive features for the listeners
0:21:15included
0:21:17head nodding for backchanneling and acknowledgement
0:21:21and in the previous turn
0:21:23the partner
0:21:24was more likely to smile
0:21:29and finally for social norm violation we found that the most predictive features
0:21:34from the speaker's behaviour in the current turn were gazing at the partner and smiling
0:21:38and head nodding
0:21:40and prior work has actually found that
0:21:43smiling is not only mitigating it is also
0:21:47a display of appeasement
0:21:49and it signals the speaker's attitude towards the social norm violation
0:21:53which is more likely to
0:21:54provoke forgiveness from the interlocutor
0:21:57so
0:21:59i think in the interest of time i will just talk about one or two implications of our
0:22:02work
0:22:05then as well
0:22:07we identified some regularities in social interaction
0:22:12and which multimodal cues reflect these conversational strategies
0:22:17and these are applicable across a wide range of domains because this mapping
0:22:20you know more generally can apply to tutoring as well as
0:22:24what things like about of the for clinical decisions of words one
0:22:30and as some of you might have seen yesterday these findings have been integrated
0:22:33into a system called sara
0:22:37which takes input in real time
0:22:39to detect conversational strategies
0:22:42feeds them into the rapport estimator to estimate the current level of rapport
0:22:48reasons
0:22:48about the social intent
0:22:50and then generates behavioral all the form of a lot along with the interactions
0:23:00some limitations of the work are that we used a balanced dataset
0:23:04and we would like to work with a more natural distribution
0:23:08and deal with this using machine learning methods
0:23:13and the other one is that when we look at multimodal features instead of
0:23:16looking at them in isolation it is better to exploit the dependency or the correlation between different
0:23:21of each of the temporal contingency so it can look at it people for that
0:23:26i don't triangles of like these findings to build rapport align you
0:23:33finally in conclusion
0:23:35we learned the discriminative power and generalizability of these features
0:23:42speakers
0:23:42just to summarize the results in short
0:23:45speakers usually accompany the disclosure of personal
0:23:47information with head nods and gaze at the partner
0:23:51listeners nod but avert their gaze
0:23:56also in shared experience speakers are less likely to smile and more likely to avert
0:24:00the gaze
0:24:01meanwhile listeners smile to signal coordination
0:24:09the so that they were and happens to justine al and a and b
0:24:14and i think that
0:24:15and also the what it really could be that would you put this work
0:24:27we have time for one question
0:24:30basically
0:24:36i have a question about the term conversational strategy so i know we
0:24:40use it i've used it in my own papers too but i was listening to you
0:24:44speak about that and thinking gosh it sure implies some kind of conscious intentionality about how
0:24:49i'm gonna approach the dialogue and it's unlikely that that's really what's happening
0:24:54so i wonder if you and your colleagues have had discussions about what to call it
0:24:58and what the alternatives to conversational strategy are that you might have considered
0:25:03well i think one of the first things was
0:25:06like thinking in terms of like
0:25:08the notion of speech acts
0:25:10and
0:25:11so speech acts
0:25:12so the difference between conversational strategies and speech acts in my understanding is
0:25:15that
0:25:16conversational strategies can span more than one speech act
0:25:19and it's
0:25:21it's more about the illocutionary force of the utterances that is more pragmatic rather than the
0:25:24actual
0:25:25semantic or the linguistic content
0:25:28so that is one reason for not
0:25:29calling it a dialogue move or speech act but rather conversational strategy
0:25:38dialog category
0:25:40and conversational strategy is
0:25:43is perhaps
0:25:44using the more complicated within actually we are doing so it it's
0:25:47it's more it it's more like
0:25:49something which can be inferred
0:25:51rather than
0:25:53like and ready narrative clause level as we are doing what do not is like