Speech Transcript - Automatic Recognition of Conversational Strategies in the Service of a Socially-Aware Dialog System

or a graph to everybody

implement

animal but student at cmu with justine

and i'm going to describe our work on

automatic recognition of

social conversational strategies

which contribute to building maintaining or sometimes destroying a budding relationship so specifically these

conversational strategies are things like self disclosure shared experience praise and so on

let's begin with the motivation of the talk

speakers of course pursue multiple conversational goals in a dialogue and contributions to a conversation

can often be divided into

those that fulfill propositional functions those that fulfill interactional functions like

turn taking and those that fulfill interpersonal functions

which manage the relationship between the interlocutors over time

in the category of those that fulfill these interpersonal functions are conversational

strategies

which are particular ways of talking

that have an impact on

the relationship between the two individuals

so in this work we propose a technique to model and automatically recognize these conversational

strategies

using multimodal information so we use

verbal visual and vocal modalities of the speaker as well as the interlocutor

in the current and the previous turn

and we believe that it's important

as more natural conversations with dialogue systems become part of everyday life

we believe that it's important to focus on advancing the capability of dialogue systems not

only to

convey information and achieve smooth interaction

but also manage long-term interactions by building intimacy and rapport

not just for the sake of companionship

but as a more intrinsic part of improving task performance

clearly then only propositional content and interactional content does not suffice

when a parent well we're what what's a computational model of so should all in

task context

and basically we have investigated one of the most important roles and it's one filled

by social talk and that is to build a bond between two people

one that is strong enough

to allow people to build trust with another person or in our case with

a computer agent

this is what we refer to as rapport or

the feeling of connection and harmony with another

and the central aim of this work is to develop a dialogue system which can facilitate

such interpersonal bonds with users over interactions in the long term

rapport has been shown to have a good effect in fields such as education and

negotiation and in fact our prior work actually developed a dyadic

computational model which suggests how interlocutors manage rapport using specific conversational strategies

which fulfill certain intermediate goals of rapport

the foundational work by spencer-oatey actually conceptualizes interpersonal face as the desire

to be approved of for one's positive traits and conversational strategies like praise

contribute to this face management

our prior work also posits that interlocutors over time tend to increase coordination by

adhering to behavior expectations

which are guided by more sociocultural norms in the beginning stage of the

interaction and when a dyad gets to know each other it's more the

interpersonal norms of the interaction

so at a later stage general social norms may be purposely violated to accommodate

the other person's behavior expectations

on the other hand

shared experience

also allows

to increase coordination between the two people

because people index their common history when they refer to shared experience

thereby cementing the sense that people are part of the same unifying group

and finally to better learn about the other person mutual attentiveness plays an important role

we observed in our own corpus that mutual attentiveness is often fulfilled by the strategy of

self disclosure

as the relationship proceeds these tend to become more intimate in nature

the goal of this work is twofold the first one is to understand the very nature of

these conversational strategies by correlating them with multimodal cues and our more

practical question is to leverage that understanding to automatically recognize these strategies

so that it can be implemented in a dialogue system

so our corpus

is the reciprocal peer tutoring corpus which was collected from twelve american english speaking kids

who interacted over five weeks in a total of sixty sessions

on an algebra topic

and our

prior work demonstrates that there's a tremendous amount of rapport building in this peer tutoring context

and this is a good context to study dyadic social interaction

which also has a strong task component since they are trying to solve problems

of algebra over five weeks

let's move to the method

prior work on detecting similar dialogue phenomena such as self disclosure

and social norm violation has either looked at modalities in isolation

or has focused on purely data driven approaches so for instance one way to

quantify a violation of a social norm is to see when the language is different from

the rest of the language in the dialogue so for example use of a cross

entropy value
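
A minimal sketch of the cross-entropy idea mentioned here, under stated assumptions: the talk does not say which language model the prior work used, so an add-alpha smoothed unigram model is swapped in, and the function name `unigram_cross_entropy` and the toy dialogue are hypothetical. An utterance whose wording differs from the rest of the dialogue gets a higher value.

```python
import math
from collections import Counter

def unigram_cross_entropy(utterance, background, alpha=1.0):
    """Average negative log2 probability of the utterance tokens under an
    add-alpha smoothed unigram model estimated from the background tokens.
    Assumes a non-empty utterance."""
    counts = Counter(background)
    vocab = set(background) | set(utterance)
    total = len(background) + alpha * len(vocab)
    log_prob = 0.0
    for tok in utterance:
        log_prob += math.log2((counts[tok] + alpha) / total)
    return -log_prob / len(utterance)

# Hypothetical toy dialogue (not from the corpus), already tokenised.
rest_of_dialogue = "so x plus two equals five then x equals three".split()
on_task = "x equals three".split()
off_task = "my dog ate pizza".split()

h_on = unigram_cross_entropy(on_task, rest_of_dialogue)
h_off = unigram_cross_entropy(off_task, rest_of_dialogue)
print(h_on < h_off)  # the off-task utterance is less predictable
```

A threshold on this value would then flag candidate violations; where to set it is a modelling choice the talk does not specify.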

what we try to do

in our work is capture

a richer variety of the sub categories of these conversational strategies

and the way we do that is

we construct a reliably annotated corpus we rely extensively

on literature from social psychology to understand what strategies contribute to interpersonal

closeness and then we asked three to five human raters to annotate these

and computed the reliability so self disclosure here

in our work

was defined as verbal expressions which are used by people to reveal aspects

of themselves to the other so we differentiated it into two types which

is enduring states

which are long term

and intimate aspects of oneself which the speaker considers to be very important

within the context of a conversation an example of that would be i love

my pets

but it could also be one's own transgressive actions which are socially unacceptable actions

which are a way

to make the other person feel better like i didn't do well

of the pretest a result with those negative numbers

referring to shared experience

is an important way of showing that the two people in a dyad have

known each other

and indexing some commonality

so we differentiated it into shared experiences outside the experiment and inside the experiment

for praise we had both labeled praise and unlabeled praise so this is an

example of a labeled praise which is

great job with those negative numbers

but it could also be something like good job or perfect

and finally

social norm violations are basically behaviors which go against

generally accepted

or stereotypical behaviors

in the first pass we coded whether an utterance was a social norm violation and in the second pass

we differentiated these categories which were breaking the experimental rules which could be doing

off task talk during tutoring or face threatening acts like criticising insulting

or teasing

it could also be referring to

your own or the other's social norm violation

like referring to the need to focus on work and so on

social norm violations actually signal that the dyad is coming closer and no longer feels obligated

to adhere to the norms of the larger world

this is an example of

impact of self disclosure in one of the dyads where p1 says what do you

want to be when you grow up which is eliciting self disclosure and

p2 says i don't know yet and then p1 says i want to be

a chef

and then the data was on and

you say that a lot of like seven is larger than a book you wouldn't

be in the middle and then be lost like actually me and never know thinking

of making a youtube channel which is completely off

from this idea of being a chef

but however

he goes on to say you know

youtube channels make money

and then a few done say to use as you know if anything you are

making one

i will reminder of which would be fine

so there's this back and forth between elicitation of self disclosure and

self disclosure which is done by the other person

here are some other examples of violation of social norms so the top one is

a friend dyad which was perceived to be in high

rapport which is

so you want exactly that you're and beat with that ut in the top interaction

and once as you can do that that's the whole point

you say that hey you are probably never do that and then once said that's

why are you doing you it might so that you're smiles and we just as

you almost

my gosh we never the that ever

so basically this was the friend dyad and smiling was one very important pattern that

we found across the data so even when friends do a social norm violation it's always preceded

with a smile or almost always a smile accompanies the social norm violation

which is one way of kind of hedging these violations

and the bottom example is actually strangers who were perceived to be in low rapport

so here we use as a next problem is exactly the same is my any

then that's was that you get what the problem and then they don't you have

and then p two with that you know who overlap and says that serious exactly

so this dyad was perceived to be in low rapport and for strangers

doing a violation of social norms may not be the best idea

when they are just forming a relationship

we also coded for

visual behaviors which are independent variables in this study so

we have smiles and head nods

and for gaze we have gaze at the partner

gaze at one's own worksheet gaze at the partner's worksheet

and then gaze elsewhere in the room

so the next step here is to

understand

like what cues

people use when they

use these conversational strategies to that end we first undersampled the non annotated set

for each of these four conversational strategies to create a balanced dataset

and the non annotated utterances were randomly selected
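
A minimal sketch of the undersampling step described here, assuming simple random selection without replacement; the function name `undersample`, the seed, and the toy utterances are hypothetical and not taken from the corpus.

```python
import random

def undersample(annotated, non_annotated, seed=0):
    """Build a balanced list of (utterance, label) pairs by randomly keeping
    as many non-annotated utterances as there are annotated ones."""
    rng = random.Random(seed)
    k = min(len(annotated), len(non_annotated))
    kept_positive = rng.sample(annotated, k)      # label 1: strategy present
    kept_negative = rng.sample(non_annotated, k)  # label 0: strategy absent
    data = [(u, 1) for u in kept_positive] + [(u, 0) for u in kept_negative]
    rng.shuffle(data)
    return data

# Hypothetical toy data: 2 praise utterances against 10 unlabeled ones.
praise = ["great job with those negative numbers", "good job"]
other = [f"utterance {i}" for i in range(10)]
balanced = undersample(praise, other)
print(len(balanced), sum(label for _, label in balanced))
```

Running this once per strategy would give one balanced binary dataset per strategy, which matches the per-strategy counts reported next.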

so the final corpus consists of

a thousand examples of self disclosure one eighty four examples of shared experience

one sixty seven examples of praise

and around seven thousand five hundred examples of violation of social norms

drawn from the sixty

interaction sessions which is

sixty one and a half hour interaction sessions

in the next step we explored observable verbal and vocal behaviors of interest

that were drawn from a quantitative analysis

so we used liwc to quantify verbal cues of interest

and then used opensmile to quantify some simple low-level descriptors

related to pitch loudness and vocal quality and assessed whether the mean value of

these features was significantly different

in utterances

that were annotated versus those that were not annotated

with a conversational strategy and looked at the effect size to assess generalisability
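
A minimal sketch of an effect-size computation for one feature, under stated assumptions: the talk does not name the measure at this point, so Cohen's d with a pooled standard deviation is assumed, and the loudness values are made up for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation.
    Each group needs at least two values."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical per-utterance loudness values (arbitrary units).
loudness_annotated = [0.62, 0.70, 0.66, 0.74]
loudness_not_annotated = [0.55, 0.60, 0.58, 0.63]

d = cohens_d(loudness_annotated, loudness_not_annotated)
print(d > 0)  # positive d: annotated utterances are louder on average
```

By Cohen's conventional reading, values near 0.2, 0.5 and 0.8 are small, moderate and large, which is how terms like moderate effect size in the results can be interpreted under this assumption.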

and finally for visual behaviors and nonverbal behaviors we explored whether they were co-occurring

with these conversational strategies and looked at the odds ratio

d quadrants like people

based on the statistical analysis we select which multimodal cues to

include in a machine learning model so we have three sets of features the

first feature set is basically verbal visual and vocal

cues of the speaker in the current turn

and in addition to that to capture some context we also added

some bag of words features like bigrams

part-of-speech bigrams and word part-of-speech pairs
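
A minimal sketch of the three contextual text features named here (word bigrams, part-of-speech bigrams, word and part-of-speech pairs). The function name `ngram_features`, the feature-name prefixes, and the hand-assigned tags are hypothetical; the talk does not say which tagger or tag set was used.

```python
from collections import Counter

def ngram_features(tagged_utterance):
    """tagged_utterance: list of (word, pos) pairs for one utterance.
    Returns a Counter of sparse features: word bigrams (w2), part-of-speech
    bigrams (p2) and word/part-of-speech pairs (wp)."""
    words = [w.lower() for w, _ in tagged_utterance]
    tags = [t for _, t in tagged_utterance]
    feats = Counter()
    for a, b in zip(words, words[1:]):
        feats[f"w2={a}_{b}"] += 1
    for a, b in zip(tags, tags[1:]):
        feats[f"p2={a}_{b}"] += 1
    for w, t in zip(words, tags):
        feats[f"wp={w}/{t}"] += 1
    return feats

# The labeled-praise example from the talk, with tags assigned by hand.
example = [("great", "ADJ"), ("job", "NOUN"), ("with", "ADP"),
           ("those", "DET"), ("negative", "ADJ"), ("numbers", "NOUN")]
feats = ngram_features(example)
print(feats["p2=ADJ_NOUN"])
```

A sparse count dictionary like this can be merged with the numeric verbal, visual and vocal cues before training.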

feature set two is the listener behavior basically

so what is the visual behaviour of the listener when the speaker is using a conversational strategy

that's feature set two

and feature set three we use to capture more context around the use of a conversational strategy

so feature set three is what the interlocutor did in the previous turn

so what was their verbal vocal and visual

expression

we used l2 regularized logistic regression as the training algorithm and

estimated performance using accuracy and accuracy over chance

and then compared it to some standard very basic machine learning algorithms
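
A minimal sketch of the training and evaluation step, under stated assumptions: this is a plain gradient-descent version of L2-regularized logistic regression, not the authors' implementation, and accuracy over chance is read here as Cohen's kappa, which the talk does not spell out. The feature names, hyperparameters and toy data are hypothetical, and scoring on the training examples only shows the mechanics.

```python
import math

def train_l2_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2.
    The bias term is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, xi):
    """Predict 1 when the linear score is positive, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0

def accuracy_and_kappa(y_true, y_pred):
    """Accuracy and Cohen's kappa for binary labels in {0, 1}.
    Undefined (division by zero) when expected chance agreement is 1."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true1, p_pred1 = sum(y_true) / n, sum(y_pred) / n
    chance = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return acc, (acc - chance) / (1 - chance)

# Hypothetical toy features per utterance: [gaze_at_partner, smile].
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
w, b = train_l2_logreg(X, y)
preds = [predict(w, b, xi) for xi in X]
acc, kappa = accuracy_and_kappa(y, preds)
print(acc, kappa)
```

On a balanced dataset chance agreement is close to 0.5, so kappa is roughly twice the margin by which accuracy exceeds 50 percent.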

okay so let's move on to the results

for our first theoretical goal

of understanding the nature of conversational strategies

here are the results of the statistical analysis of multimodal cues so self disclosure first

we found that students referred significantly

more to their personal concerns during self disclosure and they got to talk about their

likes and dislikes

the liwc categories of positive emotion and negative emotion also had a high

effect size

also we looked at the standardized liwc variable of authenticity

which formalizes the intuition that when people reveal themselves in an authentic

or honest way they are more

personal humble and vulnerable

we did that this way to report any city and it had a higher rate

as well

for acoustic features we found

a moderate effect size for loudness

in self disclosure utterances

and in

our examination of the corpus we often found that

speakers often got excited when they disclosed

something fun or surprising about themselves

they often spoke in a lower voice

when they were talking about something negative about themselves

so the variation in pitch was not significant only the loudness

for visual cues we found that the four types of gaze and smile were

significantly more likely to co-occur in utterances of self disclosure compared to non self

disclosure utterances

with gazing at the partner

which had the highest effect size

we did a similar analysis for the listener but you could look at those details in

the paper

for shared experience we looked at affiliation drive and time orientation words

which were commonly used by the dyads to index commonality

within a given time frame or to make some kind of

affiliation with the conversation partner

and there was a moderate effect size for both of them

and

like first person plural obviously

had a high effective include rapid whatever of and cultivation about that

so the results for visual cues were similar to those for self disclosure

next we looked at praise

all systems brain one for

well billboards vision that increases the interlocutor's hundred and perhaps that if a k c

that praise will have a positive tone of voice is very intuitive and there was

a positive effect size for that

we also looked at some of the acoustic features here and we had a negative

effect size for loudness actually so people speak lower when they praise the partner

and a moderate effect size for vocal quality features

finally for social norm violation we looked at different categories of words belonging

to social categories or

personal concerns

and the was present of a class about their

we also

it can capture the intuition that some signals in the language

a puzzle slow modulation

would stem from just putting one student in that you roll but address the problem

you in context where one of the cuban one of the beauty

and the change does

we also looked at the power drive it was

significant but the effect size was small

and finally prior work has found the use of

first person plural

to be an indicator of high status

and the use of first person singular

to be a good predictor of lower status

we bought and the twins ones do a lot so implementation then we just there

are more likely to make statements which involve others

so for first person plural

it was significantly higher in social norm violation utterances but the effect size was small

for acoustic features

we had a positive effect size for pitch

loudness and the vocal quality features

for the visual cues i would say that one additional thing that was significant

for social norm violation was head nodding which we did not find in the previous conversational strategies so

speakers were

more likely to head nod when they were doing a violation of a social

norm

so then using these features informed

by this statistical analysis we actually

put them in the machine learning model and we found

logistic regression to outperform the other basic machine learning algorithms

and

the accuracy over chance

ranged from sixty to eighty percent for detection of these four categories

maybe we can quickly just go to the most predictive features which are more interesting than

like accuracy numbers

so in feature set one

which is

the speaker's behavior in the current turn

we found that when speakers disclose they attend to their partner

by gazing at them

and head nodding perhaps to emphasize what they're saying

and did not gaze at their own or the partner's worksheet

and first person singular was positively predictive however the effect that the machine learning model picks

up for first person singular was much less

compared to the nonverbal behaviors indicating the importance of nonverbals

while doing these conversational strategies

listeners on the other hand respond

during the current turn by head nodding to communicate their attention and gazing at the speaker

but not at the worksheet

and in the previous turn

the partner is less likely to smile and nod and has

lower loudness in voice

for shared experience some of the most predictive features

included gaze so the speaker was less likely to gaze at

their own worksheet or the partner's worksheet and likely to have lower shimmer in voice

however affiliation drive and time orientation were the only two liwc categories

that were really predictive of shared experience

listeners

on the other hand exhibited behaviors like smiling perhaps to indicate appreciation

of the content of the tall or anything you could one

also listeners were more likely to gaze elsewhere or at the

speaker while the speaker is doing a shared experience

but were less likely to nod

and gaze at their own worksheet

and finally in the previous turn the partner was less likely to smile

and gaze at their own worksheet

and have a lower loudness in voice

if the partner to the next most one which had experience

for praise

the most predictive features of the speaker doing a praise were gazing at the partner's worksheet

which is perhaps indicative of directing attention to what the partner is doing well

while praising them

head nodding with a positive tone of voice

perhaps to emphasize the praise

smiling perhaps as an indication of general appreciation

or to mitigate the potential embarrassment of praise

also

predictive features for the listeners

included

head nodding for backchanneling and acknowledgement

and in the previous turn

the partner

was more likely to smile

and finally for social norm violation we found that the most predictive features

from the speaker's behaviour in the current turn were gazing at the partner smiling and

head nodding

and prior work has actually found that

smiling is not only hedging or mitigating it's also appeasement

a display of appeasement

and it signals awareness of the social norm violation

which is more likely to

provoke forgiveness from the interlocutor

in the interest of time i'll just talk about one or two implications of our

work

then as well

we identified some regularities in social interaction

and which multimodal cues reflect these conversational strategies

and that is applicable across a wide range of domains because this mapping

more generally can apply to peer tutoring as well as

what things like about of the for clinical decisions of words one

and as some of you might have seen yesterday these findings have been integrated

into a system called sara

which takes input in real time

to detect conversational strategies

feeds it into the rapport estimator to estimate the current level of rapport

reasons

about the social intent

and then generates behavioral all the form of a lot along with the interactions

some limitations of the work were that we used a balanced dataset

and we would like to work with a more natural distribution only on the contrary

and deal with this but you could machine learning method which you don't methods

and the other one is that when we look at multimodal features instead of

looking at them in isolation it's better to exploit the dependency or the correlation between different

of each of the temporal contingency so it can look at it people for that

i don't triangles of like these findings to build rapport align you

the finally in conclusion

we learn the discriminating power in general activity appears features

speakers

just are not you results in a shot

speaker is usually accompanied it is crucial

information we had not anything other partner

listeners do not but the about their gaze

also shared experience because the less likely just by and more likely to of or

the gaze

meanwhile listeners my signal in coordination

the so that they were and happens to justine al and a and b

and i think that

and also the what it really could be that would you put this work

we have time for one question

basically

i have a question about the term conversational strategy so i know we

use it i've used it in my own papers too but i was listening to you

speak and thinking gosh it sure implies some kind of conscious intentionality about how

i'm gonna approach the dialogue and it's unlikely that that's really what's happening

so i wonder if you and your colleagues have had discussions about what to call it

and what were the alternatives to conversational strategy that you might have considered

well i think one of the was things was

like thinking in terms of like

this the first part of speech acts

and

so speech acts

so the difference between conversational strategies and speech acts in my understanding is

that

conversational strategies can span more than one speech act

and it's

it's more about the illocutionary force of the utterances that's more like pragmatic rather than the

actual

semantic or the linguistic content

so that's one reason for not

calling it a dialogue move or speech act but rather conversational strategy

but what also actually i've seen some work including where you have a taxonomy of

dialog category

and conversational strategy is

is perhaps

using the more complicated within actually we are doing so it it's

it's more it it's more like

something which can be inferred

rather than

like and ready narrative clause level as we are doing what do not is like

Automatic Recognition of Conversational Strategies in the Service of a Socially-Aware Dialog System

Oral Session 6: Conversational phenomena and strategies

Ran Zhao, Tanmay Sinha, Alan Black and Justine Cassell