good afternoon everybody
implement
i'm a student at cmu with justine
and i'm going to describe our work on
automatic recognition of
social conversational strategies
which contribute to building maintaining and sometimes destroying a budding relationship so specifically these
conversational strategies are things like self disclosure shared experiences praise and so on
let's begin with the motivation of the talk
a speaker of course pursues multiple conversational goals in a dialogue and contributions to a conversation
can often be divided into
those that fulfill propositional functions those that fulfill interactional functions like
turn taking and those that fulfill interpersonal functions
which manage the relationship between the interlocutors over time
in the category of those that fulfill these interpersonal functions are conversational
strategies
which are particular ways of talking
that have an impact on
the relationship between the two individuals
so in this work we propose a technique to model and automatically recognize these conversational
strategies
using multimodal information so we use
verbal visual and vocal modalities of the speaker as well as the interlocutor
in the current and the previous turn
and we believe that it's important
as more natural conversations with dialogue systems become part of our everyday life
we believe that it matters to focus on advancing the capability of dialogue systems not
only to
convey information and achieve smooth interaction
but also to manage long-term interactions by building intimacy and rapport
not just for the sake of companionship
but as an intrinsic part of improving task performance
clearly then only propositional content and interactional content does not suffice
and in our own work we are working towards a computational model of social talk in
task contexts
and basically we have investigated one of the most important roles and it's one fulfilled
by social talk and that is to build a bond between two people
one that is strong enough
to allow people to build trust with another person or in our case with
a computer agent
this bond is what we refer to as rapport
or the feeling of connection and harmony with another
and the central aim of this work is to develop a dialogue system which can facilitate
such interpersonal bonds with users over interactions across a long time
rapport has been shown to have a good effect in fields such as education and
negotiation and in fact our prior work actually developed a theoretical
computational model which suggests how interlocutors manage rapport using specific conversational strategies
which fulfill certain intermediate goals of rapport
the foundational work by spencer-oatey actually conceptualises the interpersonal face as the desire
to be approved of for one's positive traits and using strategies like praise
boosts face which is face management
our prior work posits that interlocutors over time tend to increase coordination by
adhering to behaviour expectations
which are guided more by sociocultural norms in the beginning stage of the
interaction and when a dyad gets to know each other it's more determined
by interpersonal norms of the interaction
so at that stage violating social norms may well
accommodate the other person's behaviour expectations
on the other hand
shared experience
also allows
to increase coordination between the two people
because people are indexing their common history when they refer to a shared experience
thereby cementing the sense that they are part of the same unified group
and finally to better learn about the other person mutual attentiveness plays an important role
we observed in our own corpus that mutual attentiveness is fulfilled by the strategy of
self disclosure
as the relationship proceeds these self disclosures become more intimate in nature
the goals of this work are twofold first to understand the very nature of
these conversational strategies by correlating them with multimodal cues and our more
practical question is to leverage that understanding to automatically recognize these strategies
so that it can be implemented in a dialogue system
so our corpus
is the reciprocal peer tutoring corpus which was collected from twelve american english speaking dyads
who interacted over five weeks in a total of sixty sessions
on an algebra topic
and our
prior work demonstrates that there's a tremendous amount of rapport building in this tutoring context
and this is a good context to study dyadic social interaction
which also has a well defined task outcome since they are trying to solve problems
of algebra over five weeks
let's move to the method
prior work on detecting similar dialogue phenomena such as self disclosure
and social norm violation has either looked at modalities in isolation
or has focused on solely data driven approaches so for instance one way to
quantify a violation of a social norm is to see when the language is different from
the rest of the language in the dialogue so for example use of a cross
entropy value
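The cross entropy idea mentioned here can be sketched as follows. This is a minimal illustration, not the method from the prior work the talk refers to: it assumes a unigram language model with add-alpha smoothing built from the rest of the dialogue, and the function name `unigram_cross_entropy` and the toy utterances are hypothetical.

```python
import math
from collections import Counter


def unigram_cross_entropy(utterance, context_utterances, alpha=1.0):
    """Average negative log2 probability per token of `utterance` under an
    add-alpha smoothed unigram model built from `context_utterances`.
    A higher value means the utterance looks less like the rest of the dialogue."""
    context_tokens = [tok for utt in context_utterances for tok in utt.lower().split()]
    target_tokens = utterance.lower().split()
    if not target_tokens:
        raise ValueError("utterance has no tokens")
    counts = Counter(context_tokens)
    # The vocabulary covers context and target, so unseen words get nonzero mass.
    vocab = set(context_tokens) | set(target_tokens)
    denom = len(context_tokens) + alpha * len(vocab)
    total = 0.0
    for tok in target_tokens:
        prob = (counts[tok] + alpha) / denom
        total -= math.log2(prob)
    return total / len(target_tokens)


# Hypothetical tutoring dialogue used only for illustration.
rest_of_dialogue = [
    "solve for x in this equation",
    "subtract five from both sides",
    "now divide both sides by two",
]
on_task = unigram_cross_entropy("divide both sides", rest_of_dialogue)
off_task = unigram_cross_entropy("you are so dumb", rest_of_dialogue)
print(on_task, off_task)
```

In this toy case the teasing utterance scores higher than the on-task one, which is the kind of signal a purely data driven approach would threshold on.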
so what we do differently is
in our work we capture
a richer variety of the sub categories of these conversational strategies
and the way we do that is
we construct a reliably annotated corpus and we rely extensively
on literature from social psychology to understand what strategies contribute to interpersonal
closeness and then we asked three to five human raters to annotate these
and computed the reliability so self disclosure here
in our work
was defined as verbal expressions which are used by people to reveal aspects
of themselves to the other so we differentiated it into two types which
is enduring states
which are long term
and intimate aspects of oneself which of course are very important
within the context of a conversation also that would be done in a couple of
mike that's
but it could also be one's own previous actions which are socially unacceptable actions
which are in a way
used to make other people feel better for example something like i didn't do well
on the pretest either with those negative numbers
referring to shared experience
is an important way of showing that the two people in a dyad have
known each other
and indexing some commonality
so we differentiated it into shared experiences outside the experiment and inside the experiment
for praise we had both labeled praise and unlabeled praise so this is an
example of labeled praise which is
great job with those negative numbers
but it could also be something like good job or perfect
and finally
social norm violations are basically behaviours which go against
generally accepted
or stereotypical behaviours
and in the first pass we coded whether an utterance was a social norm violation and in the second pass
we differentiated these categories which was breaking the experiment rules which could be doing
off task talk during tutoring face threatening acts like criticising insulting
or teasing
it could also be referring to
your own or the other's social norm violation
like focusing on the need to work and so on
social norm violations actually signal that the dyad is coming closer and no longer feels obligated
to adhere to the norms of the larger world
this is an example of
the back and forth of self disclosure in one of the dyads where p one says what do you
want to be when you grow up which is eliciting self disclosure and
p two says i don't know yet and then p one says i want to be
a chef
and then the data was on and
you say that a lot of like seven is larger than a book you wouldn't
be in the middle and then be lost like actually me and never know thinking
of making a youtube channel which is completely off
from this idea of being a chef
but however
he goes on to ask do you know
if youtube channels make money
and then a few done say to use as you know if anything you are
making one
i will reminder of which would be fine
so there's a back and forth between elicitation of self disclosure and
a self disclosure which is done by the other person
here are some other examples of violation of social norms so the top one is
a friend dyad which was perceived to be in high
rapport which is
so you want exactly that you're and beat with that ut in the top interaction
and once as you can do that that's the whole point
you say that hey you are probably never do that and then once said that's
why are you doing you it might so that you're smiles and we just as
you almost
my gosh we never the that ever
so basically this was the friend dyad and smiling is one very important pattern that
we found across the data that even when friends do a social norm violation it's always preceded
or
followed by a smile or the smile always co-occurs with the social norm violation
which is a kind of hedging of these violations
and the bottom example is actually strangers who were perceived to be in low rapport
so here we use as a next problem is exactly the same is my any
then that's was that you get what the problem and then they don't you have
and then p two with that you know who overlap and says that serious exactly
so this dyad was perceived to be in low rapport and for strangers
doing a violation of social norms may not be the best idea
when they are just forming a relationship
we also coded for
visual behaviours which are independent variables in this study so we have eye gaze
we have smiles and head nods
and for eye gaze we have gazing at the partner
gazing at the worksheet that you were working on gazing at the worksheet the partner was working on
and then gazing elsewhere in the room
so the next step here is to
understand
what cues
are used when people
use these conversational strategies to that end we first undersampled the non annotated set
of utterances for each of these four conversational strategies to create a balanced dataset
and the non annotated utterances were randomly sampled
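The undersampling step described above could look roughly like the sketch below. It assumes binary labels (1 for an utterance annotated with a given strategy, 0 otherwise) and a fixed random seed; the function name `undersample_balanced` and the toy utterances are hypothetical and are not taken from the corpus.

```python
import random


def undersample_balanced(utterances, labels, seed=0):
    """Keep every annotated (label 1) utterance and a random sample of
    non annotated (label 0) utterances of the same size."""
    rng = random.Random(seed)
    positives = [u for u, y in zip(utterances, labels) if y == 1]
    negatives = [u for u, y in zip(utterances, labels) if y == 0]
    if len(negatives) < len(positives):
        raise ValueError("not enough non annotated utterances to undersample")
    sampled_negatives = rng.sample(negatives, len(positives))
    balanced = [(u, 1) for u in positives] + [(u, 0) for u in sampled_negatives]
    rng.shuffle(balanced)  # avoid all positives appearing first
    return balanced


# Hypothetical toy data: two annotated utterances among eight.
toy_utterances = [f"utt{i}" for i in range(8)]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0]
balanced = undersample_balanced(toy_utterances, toy_labels)
print(balanced)
```

One such balanced set would be built per strategy, since each strategy has its own set of annotated utterances.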
so the final corpus consists of
a house an example the sentence larger i don't want to examine the fate experience
one sixty seven examples of praise
and around ten thousand five examples of violation of social norms
drawn from the sixty
interaction sessions which is
sixty one and a half hour interaction sessions
in the next step we explored observable verbal and vocal behaviours of interest
which were drawn from a quantitative analysis
so we used liwc to quantify verbal cues of interest
and then used opensmile to extract some simple low-level descriptors
related to pitch loudness and vocal quality and assessed whether the mean values of
these features are significantly different
in utterances
that were annotated versus utterances that were not annotated
with a conversational strategy and looked at the effect size to assess generalisability
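A comparison of means with an effect size can be sketched as below. The talk does not name the exact test or effect size measure, so Welch's t statistic and Cohen's d with a pooled standard deviation are assumptions here, and the loudness values are made up for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances); the p-value lookup is omitted."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical loudness values for annotated vs non annotated utterances.
annotated = [2.0, 4.0, 6.0]
not_annotated = [1.0, 3.0, 5.0]
d_value = cohens_d(annotated, not_annotated)  # 0.5 for these numbers
t_value = welch_t(annotated, not_annotated)
print(d_value, t_value)
```

Reporting the effect size next to the significance test matters on a corpus of this size, because even tiny mean differences can reach significance.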
and finally for visual behaviours that is nonverbal behaviours we explored whether they are co-occurring
with these conversational strategies and we looked at the odds ratio
to quantify that likelihood
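A co-occurrence analysis of this kind can be illustrated with an odds ratio over a 2x2 table. This is a sketch under assumptions: the measure is taken to be an odds ratio, the 0.5 added to each cell is the common Haldane-Anscombe correction (not something stated in the talk), and the per-utterance indicators are invented.

```python
def odds_ratio(behaviour_present, strategy_present, correction=0.5):
    """Odds ratio for a nonverbal behaviour co-occurring with a strategy.
    Both arguments are equal-length lists of 0/1 indicators, one per utterance.
    `correction` is added to every cell so that empty cells do not divide by zero."""
    both = behaviour_only = strategy_only = neither = 0
    for behaviour, strategy in zip(behaviour_present, strategy_present):
        if behaviour and strategy:
            both += 1
        elif behaviour:
            behaviour_only += 1
        elif strategy:
            strategy_only += 1
        else:
            neither += 1
    k = correction
    return ((both + k) * (neither + k)) / ((behaviour_only + k) * (strategy_only + k))


# Hypothetical indicators: did the speaker smile, was the utterance a self disclosure.
smile = [1, 1, 1, 0, 0, 0, 1, 0]
self_disclosure = [1, 1, 0, 0, 0, 0, 1, 1]
raw_ratio = odds_ratio(smile, self_disclosure, correction=0.0)  # 9.0 for this table
smoothed_ratio = odds_ratio(smile, self_disclosure)
print(raw_ratio, smoothed_ratio)
```

A value above 1 means the behaviour is more likely in utterances that carry the strategy than in those that do not.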
based on the statistical analysis we select which multimodal cues to
include in a machine learning model so we have three sets of features the
first feature set is basically verbal visual and vocal
cues of the speaker in the current turn
and in addition to that to capture some context we also added
some bag of words features like bigrams
part-of-speech bigrams and word part-of-speech pairs
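The three contextual feature types just listed can be sketched as simple counts. This assumes the utterance has already been tokenised and part-of-speech tagged by some external tagger; the feature name prefixes, the function `ngram_features`, and the Penn Treebank style tags in the example are illustrative choices, not the feature encoding used in the work.

```python
from collections import Counter


def ngram_features(tagged_tokens):
    """Count word bigrams, part-of-speech bigrams, and word/part-of-speech pairs
    for one utterance given as a list of (word, tag) tuples."""
    words = [word.lower() for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    features = Counter()
    for first, second in zip(words, words[1:]):
        features[f"bigram={first}_{second}"] += 1
    for first, second in zip(tags, tags[1:]):
        features[f"pos_bigram={first}_{second}"] += 1
    for word, tag in zip(words, tags):
        features[f"word_pos={word}/{tag}"] += 1
    return features


# Hypothetical pre-tagged utterance.
tagged = [("i", "PRP"), ("want", "VBP"), ("to", "TO"),
          ("be", "VB"), ("a", "DT"), ("chef", "NN")]
features = ngram_features(tagged)
print(features)
```

These sparse counts would sit alongside the dense cue features (gaze, smile, loudness and so on) in the same feature vector.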
feature set two is the listener behaviour basically
so what is the visual behaviour of the listener when the speaker is using a conversational strategy
that's feature set two
and feature set three is used to capture more context around the use of a conversational strategy
so feature set three is what the interlocutor did in the previous turn
that is what was his verbal vocal and visual
expression
we used l2 regularized logistic regression as the training algorithm and
estimated performance using accuracy and accuracy over chance
and then compared it to some standard very basic machine learning algorithms
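To make the classifier concrete, here is a minimal sketch of L2 regularized logistic regression trained with batch gradient descent. The optimiser, hyperparameters, feature names, and toy data are all assumptions made for illustration; the talk does not say which toolkit or settings were used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def train_l2_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """Minimise mean log loss plus (l2 / 2) * ||w||^2; the bias is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x_i, y_i in zip(X, y):
            error = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b) - y_i
            for j in range(d):
                grad_w[j] += error * x_i[j]
            grad_b += error
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical binary features: [gaze_at_partner, head_nod, first_person_singular].
X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_l2_logreg(X, y)
predictions = [predict(w, b, x) for x in X]
print(w, b, predictions)
```

The L2 penalty shrinks weights toward zero, which helps when many sparse n-gram features are mixed with a handful of dense multimodal cues, and the learned weights can be read off directly to see which cues are most predictive.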
okay so let's move on to the results
first the theoretical goal
of understanding the nature of conversational strategies
here are the results for the statistical analysis of multimodal cues for self disclosure first
so we found that students referred significantly
more to their personal concerns during self disclosure and they got to talk about their
likes and dislikes
the liwc categories of positive emotion and negative emotion also had a high
effect size
also we used the liwc variable of authenticity
which formalizes the intuition that when people reveal themselves in an authentic
or honest way they are more
humble and vulnerable
we used this variable to capture authenticity and it had a high effect size
as well
for acoustic features we found
a moderate effect size for loudness
in self disclosure utterances
and also
so
in our examination of the corpus we often found that
speakers often got excited when they disclosed in the dialogue like when
they disclosed something fun or surprising about themselves
and they often spoke in a lower voice
when they were talking about something negative about themselves
so the variation in pitch was not significant only the loudness
for visual cues we found that the four types of gaze and smile were
significantly more likely to co-occur in utterances of self disclosure compared to non self
disclosure utterances
with gazing at the partner
which had the highest effect size
we did a similar analysis for the listener but you could look at those details in
the paper
for shared experience we looked at affiliation drive and time orientation words from liwc
which are often used by the dyads to index commonality
within a given time frame or to make some kind of
affiliation with the conversation partner
and it wasn't was to affect size for both of them
and
like first person plural obviously
had a high effective include rapid whatever of and cultivation about that
so the results for visual cues were similar to that of self disclosure
next we looked at praise
all systems brain one for
verbal cues of emotion that increase the interlocutor's self esteem and perhaps self efficacy
having a positive tone of voice is very intuitive and there was
a positive effect size for that
we also looked at some of the acoustic features here and we had a negative
effect size for loudness actually so people spoke lower when they praised the partner
and a moderate effect size for the vocal quality features
finally for social norm violation we looked at different categories of off task talk that is things belonging
to social categories or
personal concerns
and the was present of a class about their
we also
tried to capture the intuition that some signals in the language
of a social norm violation
would stem from just putting one student in the tutor role
in a context where one is the tutor and one is the tutee
and then they change roles
so
we also looked at the power drive it was
significant but the effect size was small
and finally prior work has found use of
first person plural
to be an indicator of high status
and use of first person singular
to be a good predictor of lower status
we thought that when students do a social norm violation they
are more likely to make statements which involve others
so for first person plural
it was significantly higher in social norm violation utterances but the effect size was small
for acoustic features
we had a positive effect size for pitch
loudness and the vocal quality features
for the visual cues i would say that one additional thing that was significant
for social norm violation was head nodding which we did not find for the previous conversational strategies
speakers were
more likely to head nod when they were doing a violation of a social
norm
so then using these features informed
by the statistical analysis we actually
put them in the machine learning model and we found
logistic regression to outperform the other basic machine learning algorithms
and
the accuracy over chance
ranged from sixty to eighty percent for detection of these four categories
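One common way to express accuracy over chance is Cohen's kappa, which rescales observed accuracy by the agreement expected from the label distributions alone. Treating the reported figure as kappa is an assumption here, and the gold and predicted labels below are invented.

```python
from collections import Counter


def accuracy(gold, predicted):
    return sum(1 for g, p in zip(gold, predicted) if g == p) / len(gold)


def cohens_kappa(gold, predicted):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts, predicted_counts = Counter(gold), Counter(predicted)
    labels = set(gold_counts) | set(predicted_counts)
    chance = sum(gold_counts[label] * predicted_counts[label] for label in labels) / (n * n)
    if chance == 1.0:
        raise ValueError("kappa is undefined when only one label occurs")
    return (observed - chance) / (1.0 - chance)


# Hypothetical predictions on a balanced test set.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
observed_accuracy = accuracy(gold, predicted)  # 0.75
kappa = cohens_kappa(gold, predicted)          # 0.5
print(observed_accuracy, kappa)
```

On a balanced binary set chance agreement is 0.5, so an accuracy of 0.75 corresponds to a kappa of 0.5.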
maybe we can quickly just go to the most predictive features which are more interesting than
the
accuracy numbers
so
so in feature set one
which is
the speaker's behaviour in the current turn
we found that when speakers disclose they attend to their partner
by gazing at them
and head nodding perhaps to emphasize what they're saying
and they did not gaze at their own or the partner's worksheet
and first person singular was positively predictive however the weight that the machine learning model picks
up for first person singular was much less
compared to the nonverbal cues indicating the importance of nonverbal behaviours
while doing these conversational strategies
listeners on the other hand respond
during the current turn by head nodding to communicate their attention and gazing at the speaker
but not at the worksheet
and in the previous turn
the partner is less likely to smile and nod and has
lower loudness in voice
for shared experience some of the most predictive features
included gaze so the speaker was less likely to gaze at
their own worksheet or the partner's worksheet and likely to have lower shimmer in voice
however affiliation drive and time orientation were the only two liwc categories
that were really predictive of shared experience
listeners
on the other hand exhibited behaviours like smiling perhaps to indicate appreciation
of the content of the talk
also they were more likely to gaze elsewhere or at the
speaker while the speaker is doing a shared experience
but were less likely to nod
and gaze at their own worksheet
and finally in the previous turn the partner was less likely to smile
and gaze at their own worksheet
and have a lower loudness in voice
if the partner to the next most one which had experience
for praise
the most predictive features for speakers doing a praise were gazing at the partner's worksheet
which could be indicative of directing attention to what the partner is doing while
praising him
head nodding with a positive tone of voice
perhaps to emphasize the praise
smiling perhaps as an indication of general appreciation
or to mitigate the potential embarrassment of praise
also
predictive features for the listeners
included
head nodding for back channeling and acknowledgement
and in the previous turn
the partner
was more likely to smile
and finally for social norm violation we found that the most predictive features
from the speaker's behaviour in the current turn were gazing at the partner smiling and
head nodding
and prior work has actually found that
smiling not only mitigates embarrassment it's also
a display of appeasement
and it signals awareness of having done a social norm violation
which is more likely to
provoke forgiveness from the interlocutor
so
in the interest of time i'll just talk about one or two implications of our
work
in this work
we identified some regularities in social interaction
and which multimodal cues reflect these conversational strategies
and it is applicable across a wide range of domains because this mapping
more generally can also apply to peer tutoring as well as
what things like about of the for clinical decisions of words one
and as some of you might have seen yesterday these findings have been integrated
into a larger system called sara
which takes input in real time
to detect conversational strategies
feeds it into the rapport estimator to estimate the current level of rapport
reasons
about the social intent
and then generates behaviour in the form of language along with nonverbal behaviours
some limitations of the work are that we used a balanced dataset
and we would like to work with a more natural distribution of the corpus
and deal with this but you could machine learning method which you don't methods
and the other one is that when we look at multimodal features instead of
looking at them in isolation it's better to exploit the dependency or the correlation between different
of each of the temporal contingency so it can look at it people for that
i don't triangles of like these findings to build rapport align you
finally in conclusion
we learned the discriminative power and generalizability of these features
speakers
just are not you results in a shot
speakers usually accompany the disclosure of
information with head nods and gazing at the partner
listeners do not but the about their gaze
also in shared experience speakers are less likely to smile and more likely to avert
their gaze
meanwhile listeners smile to signal coordination
the so that they were and happens to justine al and a and b
and i think that
and also the what it really could be that would you put this work
we have time for one question
basically
i have a question about the term conversational strategy so i know we
use it i've used it in my own papers too but i was listening to you
speak and thinking gosh it sure implies some kind of conscious intentionality about how
i'm gonna approach the dialogue and it's unlikely that that's really what's happening
so i wonder if you and your co-authors have had discussions about what to call it
and what alternatives to conversational strategy you might have considered
well i think one of the things was
thinking in terms of
speech acts
and
so speech acts
so the difference from speech acts in my understanding is
that
conversational strategies can span more than one speech act
and it's
it's more about the illocutionary force of the utterances that's more pragmatic rather than the
actual
semantics or the linguistic content
so that's one reason for not
calling it a dialogue move or speech act but rather conversational strategy
but also actually i've seen some work where you have a taxonomy of
dialog category
and conversational strategy is
is perhaps
something more complicated than what we are actually doing so it's
it's more like
something which can be inferred
rather than
annotated at the clause level as we are doing