Okay, so hello. I was already introduced — I'm from Ulm University, thank you for having me. I'm going to talk about changing the level of directness in dialogue.
First, let's have a little motivation of why this could be useful.
If we look at human dialogue, one person could say, "Do you want to eat a pizza?", and for some reason the other person decides not to answer that question directly and says, "I'd prefer a warm meal." In human dialogue, we can easily tell that this means no.
Then the first person could choose to be more polite and not say directly, "You should really go on a diet," but just say, "That pizza has a lot of calories."
The other person is not offended and can say, "Okay, I'll take the salad."
If we look at the same conversation with a dialogue system that is not equipped to handle indirectness, we can run into a number of problems.
For example, if the system says, "Do you want to eat a pizza?", and the human says, "I'd rather have a warm meal," a system that just expects a direct answer won't understand that. It then has to repeat the question, and the user has to state the answer more directly. That is not so bad, but it could be handled better by a system that understood the indirect version of the answer.
Another problem lies in the output, because sometimes as humans we expect our conversation partner to not be too direct. If the system chooses to be direct and says, "You should not eat that," the human may well be offended.
So it would be better if the system could handle indirectness both on the input and on the output side, and that is why the goal of my work is changing the level of directness of an utterance.
Now I want to have a look at the algorithm with which I want to do that. First, I will give an overview of the overall algorithm, and then address some challenges specifically.
My algorithm works with three different types of input: the current utterance, the previous utterance in the dialogue, and a pool of utterances it can choose from to exchange the current utterance.
The next step is to evaluate the directness level of those utterances, which gives us the directness of the current utterance and the directness of every pool utterance. We need the previous utterance because the directness of course depends on what was said before — the same wording can have different levels of indirectness depending on the previous utterance.
The next step is to filter the utterances, so that the pool we can choose from only contains utterances with the opposite directness of the current utterance.
In the last step, we determine which of those utterances is the most similar to the current utterance in a functional manner, which then leaves us with the utterance we can exchange it for.
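The pipeline just described can be sketched in a few lines of Python. This is a toy illustration, not the actual system: `estimate_directness` and `functional_similarity` here are crude word-overlap stand-ins for the RNN classifier and the dialogue vector distance introduced later in the talk.

```python
def estimate_directness(utterance, previous):
    # Toy stand-in for the RNN classifier: word overlap with the previous
    # utterance serves as a crude proxy (1 = direct ... 3 = very indirect).
    shared = len(set(utterance.lower().split()) & set(previous.lower().split()))
    return 1 if shared >= 2 else 2 if shared == 1 else 3

def functional_similarity(a, b):
    # Toy stand-in for the dialogue vector distance: Jaccard word overlap.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def exchange(current, previous, pool):
    """Swap `current` for the functionally most similar pool utterance
    with the opposite directness (direct <-> indirect)."""
    want_direct = estimate_directness(current, previous) > 1
    candidates = [u for u in pool
                  if (estimate_directness(u, previous) == 1) == want_direct]
    if not candidates:
        return current  # no valid exchange found in the pool
    return max(candidates, key=lambda u: functional_similarity(u, current))
```

With the pizza example, the indirect answer "i would rather have a warm meal" would be exchanged for a direct pool utterance such as "no i do not want to eat pizza".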
So there are two challenges in this algorithm. One is the directness level: how can we estimate it? The other is: how do we decide which utterances are functionally similar?
Let's start with the latter: what is functionally similar? I define that as the degree to which two utterances can be used interchangeably in the dialogue — so they fulfil the same function in the dialogue.
As a measure of functional similarity, I decided to use dialogue act models. They are inspired by word vector models and follow the same principle: utterances are mapped into a vector space in such a manner that utterances appearing in the same context are mapped into close vicinity of each other. So if two utterances are used in the same context, it is very likely that they can be exchanged for one another. The distance in this vector space is then used as an approximation of the functional similarity.
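The idea of using vector distance as a similarity proxy can be shown with a minimal sketch. The embeddings below are invented toy values standing in for vectors produced by a trained dialogue vector model; only the distance computation itself is faithful to the idea.

```python
import math

def cosine_distance(u, v):
    # Standard cosine distance: 0 for identical directions, up to 2 for opposite.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Utterances used in similar contexts should receive nearby vectors
# (made-up values for illustration):
vectors = {
    "yes, gladly":       [0.9, 0.1, 0.0],
    "that sounds great": [0.8, 0.2, 0.1],
    "no, thank you":     [0.1, 0.9, 0.2],
}

def most_similar(utterance, pool):
    """Pick the pool utterance with the smallest vector distance."""
    return min(pool, key=lambda u: cosine_distance(vectors[utterance], vectors[u]))
```

Here, "yes, gladly" would be judged functionally closer to "that sounds great" than to "no, thank you", mirroring how two affirmative wordings occur in the same dialogue contexts.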
I am fairly confident that this works, because I already published a paper about it earlier this year, and I will quickly summarise its findings so you can see why this is a good fit.
I evaluated the accuracy of clusters that I formed with k-means in the dialogue vector space, comparing them to ground-truth clusters given by hand-annotated dialogue acts. So I wanted to see whether the grouping in the dialogue vector space corresponds to the annotated dialogue acts. I did a cross-corpus evaluation, so the dialogue vector models were trained on a different corpus than the one the clustering was performed on. As you can see on the left side, the accuracy is very good, and that is why I think dialogue act models work very well for the estimation of functionally similar utterances.
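The evaluation idea — scoring found clusters against hand-annotated dialogue acts — amounts to a cluster purity/accuracy measure. A small sketch, with invented cluster assignments and labels:

```python
from collections import Counter

def cluster_accuracy(cluster_ids, gold_acts):
    """Assign each cluster its majority dialogue act and count how many
    utterances end up with the correct act (cluster purity)."""
    correct = 0
    for cid in set(cluster_ids):
        acts = [a for c, a in zip(cluster_ids, gold_acts) if c == cid]
        correct += Counter(acts).most_common(1)[0][1]  # size of the majority act
    return correct / len(gold_acts)

# Five utterances, two k-means clusters vs. annotated dialogue acts:
clusters = [0, 0, 0, 1, 1]
acts     = ["inform", "inform", "request", "confirm", "confirm"]
print(cluster_accuracy(clusters, acts))  # 4 of 5 utterances match -> 0.8
```

The actual study used k-means in the dialogue vector space on real corpora; this toy version only demonstrates the scoring step.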
Now let's get to the estimation of directness, which was the second challenge. You can already see the architecture here: a recurrent neural network that estimates the directness with a supervised learning approach.
As input, we use the sum of word vectors on the one hand — so for every word in the utterance, we take its word vector and simply add them all up — and on the other hand, the dialogue vector representation of the utterance. Since it is a recurrent network, we have a recurrent connection, so the network also receives the input of the previous utterance.
For the output, I framed it as a classification problem, so the output is the probability of the utterance being either very direct — for example, "I don't want meat" — or slightly indirect — for example, "Can I get a dish without meat?", which is not quite the same, but still contains the main words that are necessary for the meaning — or very indirect, where you just say "I don't like meat" and hopefully the other person gets it.
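The classifier input described above — summed word vectors concatenated with a dialogue vector — can be sketched concretely. The embeddings below are made-up toy values; the real system uses pretrained word vectors and a trained dialogue vector model feeding a recurrent network, which is not reproduced here.

```python
# Stand-in for pretrained word embeddings (binary-fraction toy values):
word_vectors = {
    "i": [0.125, 0.25], "do": [0.0, 0.125], "not": [0.25, 0.0],
    "like": [0.25, 0.25], "meat": [0.5, 0.125],
}

def sum_word_vectors(utterance):
    # Add up the word vectors of all words in the utterance.
    total = [0.0, 0.0]
    for word in utterance.lower().split():
        vec = word_vectors.get(word, [0.0, 0.0])  # unknown words map to zero
        total = [t + v for t, v in zip(total, vec)]
    return total

def build_input(utterance, dialogue_vector):
    # Concatenate the summed word vectors with the dialogue vector to form
    # the feature vector fed to the directness classifier.
    return sum_word_vectors(utterance) + dialogue_vector

features = build_input("i do not like meat", [0.5, 0.5, 0.5])
print(features)  # [1.125, 0.75, 0.5, 0.5, 0.5]
```

The network then maps this feature vector (together with the previous utterance's features, via the recurrent connection) to a probability over the three directness classes.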
This had not been tested before, so as part of the evaluation of this work, I also evaluated how well the estimation of directness works with this approach.
And with that, let's get to the evaluation. As I said, on the one hand the accuracy of the directness estimation was evaluated, and of course also the accuracy of the actual utterance exchange.
For that, we of course need a ground truth. That means we need a dialogue corpus that contains utterances we can exchange, an annotation of the directness level, and an annotation of dialogue acts, in order to see whether we made a correct exchange.
It was impossible to find a corpus like that, and I also wasn't sure we could record one ourselves, because it is very difficult to not inhibit the naturalness of the conversation while still getting the participants to say the same things in different phrasings at different directness levels, so as to make sure that exchangeable, equivalent utterances exist in the corpus.
So I decided to use an automatically generated corpus, which I want to present now.
The corpus generation started from a definition of the dialogue domain with system and user actions, and succession rules which set which action could follow which other action. Each action had multiple utterances that could be used as its wording, each of course with a directness level depending on the previous utterance.
We then started at the beginning with the start action and simply went recursively through all the successors, and their successors again, until we reached the end, and thereby generated all the dialogue flows that were possible within the domain we had defined. The wording was then chosen randomly, and this resulted in more than four hundred thousand dialogue flows.
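The recursive enumeration of dialogue flows can be illustrated with a toy domain. The actions and succession rules below are invented for illustration; the real domain was much larger and also attached wordings and directness levels to each action.

```python
# Toy domain: each action lists the actions that may follow it.
successors = {
    "greet":       ["ask_pizza"],
    "ask_pizza":   ["accept", "decline"],
    "accept":      ["confirm"],
    "decline":     ["offer_salad"],
    "offer_salad": ["confirm"],
    "confirm":     [],  # end of the dialogue
}

def generate_flows(action, prefix=()):
    """Recursively enumerate every action sequence from `action` to an end."""
    flow = prefix + (action,)
    if not successors[action]:
        return [flow]
    flows = []
    for nxt in successors[action]:
        flows.extend(generate_flows(nxt, flow))
    return flows

flows = generate_flows("greet")
print(len(flows))  # 2 possible flows in this toy domain
```

Scaling the domain up (more actions, branching rules, and randomly chosen wordings per action) is what produced the several hundred thousand flows mentioned above.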
We had about four wordings for every dialogue act. For example, as you can see here, "yes" could be worded as "That is great, I'm looking forward to it" or "That sounds delicious", depending on what the previous utterance was; or "I would like to order pizza" as "Can I order pizza from you?". The topics of those conversations were everyday situations, for example ordering a pizza or arranging to cook together.
I tried to incorporate many elements of human conversation — for example, over-answering, misunderstandings, requests for confirmation, corrections, and things like that — and, as already mentioned, context-dependent directness levels.
For example, "Do you have time today?" can be answered with "I haven't planned anything yet", which is not a direct answer, so it receives the directness level three. But with "Have you planned anything for today?" — "I haven't planned anything" — we have a different question before it, so this time it is a direct answer and receives the directness level one.
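Context-dependent annotation of this kind boils down to a lookup keyed on both the previous utterance and the answer. A toy version of the generated corpus annotations, with the two examples just given:

```python
# (previous utterance, answer) -> directness level (1 = direct, 3 = very indirect).
# Toy table; the generated corpus attached such levels to every wording.
directness = {
    ("do you have time today?", "i haven't planned anything"): 3,
    ("have you planned anything today?", "i haven't planned anything"): 1,
}

def directness_of(answer, previous):
    """The same answer can carry different directness levels
    depending on the question that preceded it."""
    return directness[(previous.lower(), answer.lower())]
```

This is exactly why the estimation algorithm needs the previous utterance as input.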
Of course, with an automatically generated corpus there are some limitations. We don't have the variation of natural conversations, both with regard to the dialogue flow and to the wording, and that very likely means the corpus is more predictable and therefore easier to process.
However, I also see some advantages of this approach. On the one hand, we have a very controlled environment. We can make sure that, for example, an exchangeable alternative actually exists among the corpus utterances — so we know that there is a valid exchange, and if we didn't find it, the fault lies with our algorithm and not with a missing correct utterance in the corpus.
Also, because the corpus was not annotated but generated, the ground truth is very dependable. And I think it is an advantage that with this approach we have a very complete dataset: we have all the possible flows and many different wordings. Having this for a small application can have implications for what would happen if we actually had a lot of data with full coverage. Usually, if I just collect dialogues, I won't have a lot of data and I will have poor coverage; a large company might get that data, but I just don't. So this small but complete set that we generated can give some indication of what would be possible if I could get all that data.
For our results, this of course means that they do not represent the actual performance in an applied spoken dialogue system, where we would face natural conversations — it is very likely that the approach will perform worse there. But we can evaluate the potential of our approach given ideal circumstances, so I think the results still have some merit.
So with that, let's get to the actual results — first, the accuracy of the directness estimation. As input, we used a dialogue vector model that was trained on our automatically generated corpus, and word vector models that were trained on the Google News corpus; you can see the reference for that. As the dependent variable, we have the accuracy of correctly predicting the level of directness as annotated.
As independent variables, we used versions with and without word vectors as input, to see whether they improve the results, and we also wanted to see whether the size of the training set impacts the classifier. We used ten-fold cross-validation as usual, which leads to a training corpus of ninety percent of the data, and we additionally tested what happens when we only use ten percent of the data for training. For the dialogue vector models, we likewise used different sizes of the generated dialogue corpus — that is, how many of the dialogues we included in the actual training.
Here you can see the results. We could achieve a very high accuracy of directness estimation — but keep in mind that it is an automatically generated corpus, so that of course plays a role. The baseline of majority class prediction would have been 0.5291, and we can clearly outperform that. We can see a significant influence of both the size of the training set and of whether or not we include the word vectors.
I think the fact that the word vectors as input improve the estimation results so much really speaks to the quality of those models and the amount of data they were trained on. But extensive word vector models are readily available, so that should not be a problem. What could be a problem is the size of the training set, because this is annotated data — it is a supervised approach. So if we want to scale this approach up, we would need a lot of annotated data; perhaps in the future we could consider an unsupervised approach that does not need as much annotated data.
Now for the accuracy of the utterance exchange. For the functional similarity, we again used the dialogue act models trained on the automatically generated corpus, and for the directness estimation, we used different versions of the trained classifier that I just presented. The dependent variable is the percentage of correctly exchanged utterances, and the independent variables were the classifier accuracy and, again, the size of the training corpus for the dialogue act models.
Here you can see the results. The best performance we could achieve overall was 0.7 — the share of utterances that were correctly exchanged. We have a significant influence of both the classifier accuracy and the size of the training data for the dialogue act models.
A common error made by the algorithm was that the utterance exchange was done with either more or less information than the original utterance. For example, "I want something spicy" was exchanged with "I want a large pepperoni pizza" — and "large", of course, is not implied by the first sentence. This suggests that the dialogue act models, as we trained them, cannot differentiate that well between such variants. But this could be solved by simply adding more context to them — during training, taking into account more utterances in the vicinity.
We can see here the importance of a good classifier and a good similarity measure. The similarity measure I don't consider a problem, because it works on unannotated data, so we can just take large corpora of dialogue data and use them. Again, the annotated data is the real challenge here, and we should consider an unsupervised approach.
Now for a short discussion of the results. I think the approach shows high potential, but the evaluation was done in a theoretical setting and we have not applied it to a full dialogue system, and therefore some questions remain to be answered.
In this corpus, we don't have the variability of natural dialogue; that means the performance of the classifier and the dialogue vector models will very likely decrease in an actual dialogue. To compensate for that, we would need more data. We also have the problem that we don't really know whether a suitable alternative to exchange actually exists in an actual dialogue. Again, with an increasing amount of data it becomes more likely, but it is not certain — so perhaps as future work we can look into the generation of utterances instead of just their exchange.
Another point is the interrelation of user experience and the accuracy of the exchange, because at the moment we don't know what accuracy we actually need to achieve in order to improve the user experience. That is also something we should look into.
With that, at the end of my talk, I want to conclude what I presented to you today. I discussed the impact of directness in human-computer interaction and proposed an approach to changing the level of directness of an utterance. The directness estimation is done using recurrent neural networks; the functional similarity measure uses dialogue act models. The evaluation shows the high potential of this approach, but there is also a lot of future work to do.
It would be good to have a corpus of natural dialogues annotated with directness levels, to use as an evaluation basis. There would be benefits to an unsupervised estimation of the directness level. An evaluation on an actual dialogue corpus would also give more insight into how that impacts the performance. The generation of suitable utterances would likewise be desirable, because we don't actually know whether the right utterances exist in the corpus. And finally, of course, we would like to apply this to an actual spoken dialogue system.
Thank you very much for your attention.
No, I did not evaluate that yet.
Yes, a lot of my motivation lies in this area of cultural differences — directness is a very major difference that exists between cultures, and therefore it is of major interest for me.
Yes, I think it would be really good to have such a corpus, and I'm thinking about ways to collect it. I think one of the main difficulties is that, as I said, I'm coming at this from the angle of cultural differences. For example, I would expect a German to be even more direct than, say, a Japanese speaker. Then we have the translation problem — we can't exchange German utterances for Japanese utterances — so that makes it difficult. And I'm not sure how to ensure, for example with German participants, that they would actually use the indirect versions as well as the direct utterances. So there is a little bit of a problem there.
That sounds interesting, thank you very much.
That was part of the earlier work: there I used the k-means clustering algorithm to find clusters. In this work, I don't actually define clusters but just use the closest utterance.
No, I used a quite basic notion of it: if an utterance is a colloquial reformulation — like "you know..." — and the main words from the original sentence still appear in the exchanged sentence, then it counts as fairly direct.