Hi everyone.

I am Nikos, from Saarland University in Germany.

I'm going to talk to you about a way to discover user groups for natural language generation in dialogue.

This is work I've done together with Christoph Teichmann and Alexander Koller.

Let's look at this example here. We have a navigation system that says to the user: "Turn right after Melbourne Central." User A succeeds in finding the way, but user B fails.

So why could that be? Well, there are different reasons why users react differently to such instructions. Most likely, here, user B is not from Melbourne, so they do not know what "Melbourne Central" means.

We can also imagine other reasons, such as the user's demographics, personality, or experience with navigation systems. However, such information is often difficult to obtain.

We could ask every user of the navigation system where they are from, but in an interactive setting a better approach is to collect observations and react to them. Ideally, after observing something like the above, the system would conclude that user A understands place names from Melbourne, but that for user B it should adapt and say something like: "At the roundabout, take the third exit."

People deal with this problem in different ways. One approach is of course to completely ignore it, which we don't want. Another approach is to use one model for every user; however, this requires lots of data for that user, and we might lose information that could help us from similar users.

Another approach would be to use predefined groups, for example a group for residents of Melbourne and another group for outsiders. But this is hard to annotate, and it is also hard to know in advance which categories would be relevant, and which categories we can actually find in the dataset.

So instead of doing these things, we assume that the users' behavior clusters into groups that we cannot observe. We use Bayesian reasoning to infer those groups from unannotated training data, and then at test time we dynamically assign users to those groups as the dialogue progresses.

Our starting point is a simple log-linear model of language use. In particular, the model is agnostic as to whether we are simulating comprehension or production. In general, we want to predict the behavior of the user in response to a stimulus coming from the system. If we are trying to simulate language production, the stimulus can be the communicative goal that the user is trying to achieve, and the behavior would be the utterance, or some other linguistic choice the user makes. If we want to predict what the user would understand, the stimulus is a system-produced utterance, and the behavior is the meaning that the user assigns to it.

This is how our basic model looks before we add user groups: it is a log-linear model with a real-valued parameter vector θ and a set of feature functions φ over behaviors and stimuli. The model can be trained on a dataset of behavior-stimulus pairs using normal gradient-descent-based methods. We have already used this kind of model in previous work, for referring expression resolution in dialogue.
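As a rough sketch, such a log-linear model fits in a few lines of Python. The feature function, the weight value, and the toy "cluttered scene" stimulus below are invented for illustration; they are not the features from the actual experiments:

```python
import math

def loglinear_prob(b, s, candidates, theta, phi):
    """P(b | s) proportional to exp(theta . phi(b, s)), normalized over candidate behaviors."""
    score = lambda c: math.exp(sum(t * f for t, f in zip(theta, phi(c, s))))
    z = sum(score(c) for c in candidates)
    return score(b) / z

# Toy setup: predict whether a speaker uses a spatial relation, with one binary feature.
phi = lambda b, s: [1.0 if (b == "relation" and s["scene_cluttered"]) else 0.0]
theta = [2.0]  # a single weight, set by hand here instead of learned

p = loglinear_prob("relation", {"scene_cluttered": True},
                   ["relation", "no_relation"], theta, phi)
# p = e^2 / (e^2 + 1), roughly 0.88
```

Training then amounts to maximizing the log-likelihood of the observed behavior-stimulus pairs with respect to θ by gradient ascent.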

so

Now, if we want to extend this model with user groups, we assume that there is a finite number of user groups in the data, and we give each group its own parameter vector. That is, we replace the single parameter vector from the basic model with group-specific parameter vectors. If we knew exactly which group a user belongs to, all we would have to do is use the corresponding parameters, and we would have a prediction model for that group in particular.

However, we still want to adapt to users that we haven't seen in the training data. So we assume that the training data was generated in the following way: we have a set of users, and each user is assigned to a group with a probability given by π, another parameter vector, which determines the prior probability of each group. Then, as we said, we have one parameter vector per group, so the behavior of the user depends not only on the stimulus but also on their group assignment, through the group-specific parameter vectors.

Now let's suppose that we have trained our system on the training data, and a new user starts talking to us. Since we don't know which group they actually belong to, we marginalize over all groups using the prior probabilities. This way we directly have an idea of what they would do, given the prior probabilities observed in the training data, and we can already use this model for interacting with them.
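A minimal sketch of this marginalization, with two made-up groups (one that favors spatial relations, one that avoids them) and a uniform prior; all numbers are toy values chosen for the example:

```python
import math

def loglinear_prob(b, s, candidates, theta, phi):
    score = lambda c: math.exp(sum(t * f for t, f in zip(theta, phi(c, s))))
    return score(b) / sum(score(c) for c in candidates)

phi = lambda b, s: [1.0 if b == "relation" else 0.0]
candidates = ["relation", "no_relation"]
thetas = {0: [3.0], 1: [-3.0]}   # group 0 likes relations, group 1 avoids them
prior = {0: 0.5, 1: 0.5}         # P(g), as estimated from training data

def predict_new_user(b, s):
    """P(b | s) = sum_g P(g) * P(b | s, theta_g) for a user we know nothing about."""
    return sum(prior[g] * loglinear_prob(b, s, candidates, thetas[g], phi)
               for g in prior)

p = predict_new_user("relation", {})
```

Because the two toy groups exactly mirror each other under a uniform prior, the marginal comes out at 0.5: before any observations the model hedges between the two behaviors.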

As the user keeps interacting with the system, we start collecting observations for them. Say we have a set D_u of observations for user u at a particular time step; we can use these observations to estimate which group u belongs to. We can do that because, as I said, we have a group-specific behavior prediction: we can calculate the probability of the user's observations given the group-specific parameters of each group, and we also have the prior membership probabilities. With Bayes' rule, we can then compute the probability that the user belongs to each group g, given the data.

If we plug this posterior group membership estimate into the previous behavior prediction model, we get a new prediction model that takes into account the data we have seen for this new user. As we collect more observations from the user, we hopefully get a more accurate group membership estimate, and with it a better behavior prediction.
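The Bayes update and the resulting user-adapted prediction can be sketched as follows, using the same invented two-group toy model as before (`math.prod` requires Python 3.8+):

```python
import math

def loglinear_prob(b, s, candidates, theta, phi):
    score = lambda c: math.exp(sum(t * f for t, f in zip(theta, phi(c, s))))
    return score(b) / sum(score(c) for c in candidates)

phi = lambda b, s: [1.0 if b == "relation" else 0.0]
candidates = ["relation", "no_relation"]
thetas = {0: [3.0], 1: [-3.0]}   # invented group parameters
prior = {0: 0.5, 1: 0.5}

def posterior(observations):
    """Bayes' rule: P(g | D_u) proportional to P(g) * prod_i P(b_i | s_i, theta_g)."""
    unnorm = {g: prior[g] * math.prod(loglinear_prob(b, s, candidates, thetas[g], phi)
                                      for b, s in observations)
              for g in prior}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}

def predict(b, s, observations):
    """P(b | s, D_u) = sum_g P(g | D_u) * P(b | s, theta_g)."""
    post = posterior(observations)
    return sum(post[g] * loglinear_prob(b, s, candidates, thetas[g], phi)
               for g in prior)

obs = [("relation", {})]          # we observed the user use one spatial relation
post = posterior(obs)             # group 0 becomes much more likely
p = predict("relation", {}, obs)  # prediction shifts toward group 0's behavior
```

After a single observed spatial relation, the posterior already puts most of the mass on group 0, and the prediction moves away from the uninformed 0.5 toward that group's behavior.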

Now, how do we train the system to find the best parameter setting? As I said, our model has as parameters π, the prior group membership probabilities, and, for each of the groups, one parameter vector over the features. We assume that we have a corpus of behaviors and stimuli, and for each behavior-stimulus pair we know which user it came from, but we don't know the groups of the users. So we try to maximize the likelihood of the data under the behavior probabilities I showed before.

However, it is not straightforward to use gradient descent as for the basic model, because we don't know the group assignments. So instead we use a method similar to expectation maximization. In the beginning, we initialize all parameters randomly from a normal distribution. Then, at each step, we compute the group membership probability estimates for each user given their data, using the parameter setting from the previous step. We use these probabilities as frequencies for the observations, so that we get a set of soft observations with observed group memberships. Now we can use normal gradient ascent to maximize the log-likelihood given these observations, which gives us a new parameter setting, and we go back to step one, until the likelihood doesn't improve by more than a threshold.
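The training loop can be sketched on synthetic data with two latent user groups built in by construction. Everything here is a simplification invented for the sketch: a single-feature model per group, a fixed symmetry-breaking initialization instead of the random one, closed-form prior updates, and hand-picked learning rate and iteration counts:

```python
import math

# Toy corpus: users 0-4 produce behavior 1 ninety percent of the time,
# users 5-9 only ten percent of the time (two latent groups by construction).
data = {u: [1] * 9 + [0] for u in range(5)}
data.update({u: [0] * 9 + [1] for u in range(5, 10)})

G = 2
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

# One weight per group (a log-linear model with a single indicator feature).
theta = [0.5, -0.5]  # fixed init here; the real model draws from a normal distribution
pi = [0.5, 0.5]      # prior group probabilities

def user_posterior(obs):
    """E-step: P(g | D_u) proportional to pi_g * prod_i P(b_i | theta_g)."""
    logs = [math.log(pi[g]) + sum(math.log(sigmoid(theta[g]) if b
                                           else 1.0 - sigmoid(theta[g]))
                                  for b in obs)
            for g in range(G)]
    m = max(logs)
    ws = [math.exp(l - m) for l in logs]
    return [w / sum(ws) for w in ws]

for step in range(50):
    posts = {u: user_posterior(obs) for u, obs in data.items()}
    # M-step: gradient ascent on the expected log-likelihood,
    # using the posteriors as soft counts for each group.
    for _ in range(20):
        grad = [sum(posts[u][g] * sum(b - sigmoid(theta[g]) for b in obs)
                    for u, obs in data.items())
                for g in range(G)]
        theta = [t + 0.02 * d for t, d in zip(theta, grad)]
    # closed-form update of the prior from the soft counts
    pi = [sum(posts[u][g] for u in data) / len(data) for g in range(G)]

# After training, group 0 models the behavior-1-happy users (probability near 0.9)
# and group 1 the others (near 0.1), recovering the two planted groups.
```

The outer loop alternates the E-step (group posteriors per user) with the M-step (gradient ascent on the soft-count log-likelihood), which is exactly the EM-like scheme described above, just on a deliberately tiny model.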

Now let's see if our method works, whether we can discover groups in natural data. Our model is very generic, so it can be used in any component of a dialogue system for which we need to predict the user's behavior; but for the purposes of this work, we evaluated it on two specific prediction tasks related to natural language generation.

The first task is taken from the referring expression generation tradition. In this case, the stimulus is a visual scene together with a target object, and we want to predict whether the speaker will use a spatial relation in describing that object; for example, whether in this scene they would say something like "the ball in front of the cube" or just "the small ball".

The dataset we use is GRE3D3, a commonly used dataset in referring expression generation. It contains scenes described by 63 users, and spatial relations are used in 35 percent of the descriptions.

It is difficult to predict from the scene alone whether the speaker will use a spatial relation or not, because some users don't use spatial relations at all, some use spatial relations all the time, and some are in between. We expect our model to capture that difference.

The way we evaluate is this: we do cross-validation, splitting the data in such a way that the users we see in testing have never been seen in training. We implement two baselines based on the state of the art for this dataset, which is work by Ferreira and Paraboni (2014).

We see that the version of our model with one group is actually equivalent to one of the baselines, the basic one. The second baseline also uses some demographic data, which turns out not to help improve the F-score on this prediction task. But as soon as we introduce more than one group, the performance goes up, because we are now able to actually distinguish between the different user behaviors.

This is what happens at test time, as we see more and more observations: already after seeing one observation, our model is better at predicting what the user will do next. The green line is the entropy of the group membership probability distribution, and it sinks throughout the testing phase. This means that our system becomes more and more certain about the actual group that the user belongs to.
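The quantity plotted in green is just the Shannon entropy of the posterior group membership distribution; as a quick illustration (the example distributions are made up):

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a group membership distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

h_uncertain = entropy([0.5, 0.5])    # maximal uncertainty between two groups: 1 bit
h_confident = entropy([0.99, 0.01])  # nearly certain membership: close to 0 bits
```

A sinking entropy over the dialogue is precisely the system becoming more certain which group the user belongs to.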

The second task is related to comprehension: given a stimulus s, which is a visual scene plus a referring expression, we want to predict the object that the user understood as the referent. Our baseline is based on our previous work from 2015, where we also used a log-linear model like the one I showed in the beginning.

For this experiment, as in that paper, we use the data from the GIVE-2.5 challenge for training and the GIVE-2 challenge for testing. On this dataset, however, we cannot achieve an accuracy improvement over the baseline, and we observe that our model cannot decide which group to assign the users to. Even as we tried different features, we could not detect this kind of variability in the data, so we assume that in this case the user behavior cannot actually be clustered into meaningful groups.

To test that hypothesis, we did a third experiment, where we use the same scenes but with 100 synthetic users, artificially introducing two completely different user behaviors into the dataset: half the users always select the most visually salient target, and the other half the least salient one. In this case, our model can indeed distinguish between those two groups.

Using more than two groups doesn't really improve the accuracy any further. And again, in the test phase we see the same picture as before: after a couple of observations, our model is fairly certain which of the groups the user belongs to.

To sum up: we have shown that we can cluster users into groups based on their behavior, using data for which we have no group annotations; that at test time we can dynamically assign unseen users to groups in the course of the dialogue; and that we can use these assignments to provide better and better predictions of their behavior.

In future work, we want to try different datasets, apply the same method to other dialogue-related prediction tasks, and also try slightly more sophisticated underlying models. And with this, thank you for your attention.

Yes, of course, it is very task-dependent: here we only wanted to predict how the users behave, and depending on that, the groups we find can differ.

Yes; as I said, or perhaps I didn't: we evaluated on recorded data only, so we didn't have live interaction, but that is of course a very good thing to do when you have an actual system.

Well, we expected to. This task, to be honest, is an easy one for the user. I don't know if you can read it; it says "press the button to the right of the lamp", and most users get it right. But there is around fifteen percent of errors, and we had hoped to find out why some users get it wrong, for example whether some users have difficulty with colors or with spatial relations.

Well, we didn't. Yes, it's probably that: for the production task, the literature says that there are basically two clearly distinguishable groups, and some people are in between. This might be why we see a slight improvement for six or seven groups: with six or seven groups we may get groups that happen to capture some particular user's behavior, but which have very low prior probability. We do find the two main groups, though, namely the people who always use spatial relations and those who don't.

You mean looking at particular feature weights? Yes, we did. I don't remember exactly what we found, but we did find that there are some particular features which have completely different weights across the groups; I don't remember which ones, though.