So, hello again. My name is Sebastian, I'm a PhD student, and I am going to present the paper titled "Redundancy Localization for the Conversationalization of Unstructured Responses". This is joint work with three other people, Mikhail, Eric, and Daniele. Eric is a PhD student in Finland, and he worked on this project while interning in Zurich.
So, the general topic of this work is dialogue. More specifically, we were working on a setting where a user engages in a conversation with a conversational assistant, such as the Google Assistant or Siri, that runs on a voice-activated speaker. We do not have any kind of display to convey information, so we need to read out everything, basically all the information we want to convey.
This is important because it means that, at least initially in a dialogue, we aim to give concise answers, and only after the user has confirmed interest in the kind of information that we give can we move on to longer utterances.
Now, in the past there has been work on very different types of dialogue, for example task-oriented dialogue such as restaurant booking, or chatbots where users engage in chitchat, for which we nowadays have dialogue models. That is not what we need here; we rather focus on something that could be called informational dialogue. In this type of dialogue, users have an information need that we try to satisfy.
So, for example, a user might ask something like "What is malaria?".
In the pipeline that we used, we follow a very shallow approach: we just take the question, forward it to a background question-answering component, an information-retrieval component, and that gives us a bunch of response candidates from which we can choose one. As I mentioned, initially we select a short answer, such as the one displayed here, which says that malaria is a disease caused by a parasite and transmitted by the bite of infected mosquitoes. There are many different options for implementing such a question-answering component, but for our purposes we treated it as a black box.
Okay, so the focus of our work was on the problem that occurs when this dialogue continues. Let's assume that the user likes the kind of information that we gave, that is, we correctly understood what kind of information the user is looking for, and the user says something like "tell me more", or maybe the user issues a follow-up question. Then we would again go to the question-answering component, and this time we would select a longer response and read that out, in the hope that this longer response contains some additional information that is of interest to the user.
The problem is the following: in many instances these longer responses are partially redundant with respect to what we have just said, like thirty seconds earlier. In this particular example, the part that is underlined and highlighted in red is redundant: it again mentions that malaria is caused by a parasite and that it is transmitted by a particular kind of mosquito. Since parts of the response are redundant, this is not a response that the user would like to hear, so we need to do something about this problem.
There are two aspects to this research problem. The first aspect is that we need to understand when and where a response is redundant with respect to the dialogue context, that is, we need to localize redundant content in pairs of short texts. In the visualization here on the slide we have two rows of boxes, where each box is supposed to correspond to a phrase: the top row of boxes corresponds to the first, short response, and the bottom row visualizes the longer follow-up response. Our task is basically to select the boxes in the bottom row that are redundant, that occur again. Once we have this information, the next step is to adapt the follow-up response to the previous dialogue context, that is, to discard repetitive content.
There are a few related phenomena in the literature, such as the task of recognizing textual entailment or the task of semantic textual similarity, both of which deal with determining the coarse-grained relation between two short texts. There is also something called interpretable semantic textual similarity, which goes a little bit deeper into the similarity of texts by requiring task participants to also provide a fine-grained alignment of the chunks in the two texts. That means there was a lot of inspiration in terms of existing models that we could build upon. The real problem was how we could get our hands on data with fine-grained redundancy annotation that would allow us to train a model on this problem.
One approach to get such data is of course to annotate manually, but that is going to be expensive. So what we did was come up with a way of defining a weak training signal, and the idea behind this weak training signal is the following.
For a given question, as I mentioned earlier, this question-answering black box gives us a bunch of response candidates, and associated with each response there is a confidence score that tells you, from the perspective of the question-answering system, how well this response candidate answers the user's question. Now, with regard to the user question, it is quite likely that two high-scoring answer candidates are paraphrases, while if you compare the top-ranking one with one from further down the response candidate list, these two will probably only share some information: there will be some information missing, and the lower-ranked answer will contain some additional information.
In order to build this weak training signal, we picked three of these answers, two from the top of the result list and one from further down. We paired the two top-ranking ones, and we paired the top-ranking one with the one from further down the result list. We fed each pair to the model and had the model produce a coarse-grained similarity score for each of the pairs, and then we defined the ranking objective shown on the slide: we push the model towards assigning a higher redundancy score, a higher coarse-grained similarity, to the pair of top-ranking answer candidates. Our hope was that, if we gave the model appropriate capacity and appropriate structure, then, in order to produce this coarse-grained similarity score, it would learn how to align and compare the constituents of the responses.
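To make this concrete, here is a minimal sketch of such a ranking objective, assuming a hinge-style margin loss; the function names and the exact form of the loss are illustrative, not necessarily the paper's.

```python
# Minimal sketch of the weak training signal, assuming a hinge-style
# margin ranking loss (the exact objective in the paper may differ).
import torch.nn.functional as F

def triple_loss(similarity, top1, top2, lower, margin=1.0):
    """top1, top2: two high-ranking answers; lower: a lower-ranked one.
    `similarity` is the model's coarse-grained score for a response pair."""
    sim_para = similarity(top1, top2)   # likely paraphrases
    sim_part = similarity(top1, lower)  # likely only partial overlap
    # Push the paraphrase pair's score above the other pair's by `margin`.
    return F.relu(margin - (sim_para - sim_part))
```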
On this slide you can see an example triple; this is the kind of data we worked with. Here on the slide, all three responses are only one sentence long, but in reality we worked with passage-level data, so there were like three or four sentences per response. The multi-colored boxes in this example are supposed to indicate the basic semantic building blocks of these responses. As you can see, the first two answers, which are the two highest-ranking answers returned for a particular query by this question-answering component, share four semantic building blocks, while the first and the third answer only share about half of the semantic content.
So we built a dataset of 1.5 million such response triples, and we used that to train a model that was a further development of an already existing, already published model for recognizing textual entailment. Our model is essentially a three-component feed-forward neural network, which means it was really fast to train, which was a good thing since we had so much data to process.
Okay, now let's take a quick, high-level view of the model. The input to our model are two responses, on the left-hand side of the slide, and on the right-hand side of the slide you can see the output of the model, which is a coarse-grained, high-level similarity score for these two responses. In the first component, the model produces an alignment of the two responses, that is, it computes a custom representation of the first response for each token of the second response. In the second component, these custom representations are compared: the custom representations of the first response are compared to each token of the second response, which gives us local redundancy scores, that is, token-level redundancy scores for the second answer. Then, in the third component, these local redundancy scores are aggregated in order to produce the coarse-grained, high-level redundancy score.
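As a rough sketch, assuming token embeddings as input, the three components could look like this in PyTorch; the layer sizes and details are simplified and not the paper's exact configuration.

```python
# Rough PyTorch sketch of the align / compare / aggregate structure
# (illustrative only; the actual model differs in detail).
import torch
import torch.nn as nn

class RedundancyModel(nn.Module):
    def __init__(self, dim=200):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.compare = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, resp1, resp2):
        # resp1: (len1, dim), resp2: (len2, dim) token embeddings.
        # 1) Alignment: soft-attend over resp1 for every token of resp2.
        attn = (self.proj(resp2) @ self.proj(resp1).T).softmax(dim=-1)
        aligned = attn @ resp1                                # (len2, dim)
        # 2) Comparison: one local redundancy score per resp2 token.
        local = self.compare(torch.cat([resp2, aligned], dim=-1)).squeeze(-1)
        # 3) Aggregation: coarse-grained score for the whole pair.
        return local.mean(), local
```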
So this is how the training worked. Now, at application time, at inference time, we weren't really interested in this coarse-grained similarity score. So after model training we basically chopped off that part of the model, and we additionally fed the system, as input, a given segmentation of the second response into phrases. We then aggregated the local redundancy scores for each segment, for each phrase, and that gave us phrase-level redundancy scores for the phrases in the second response.
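A minimal sketch of this inference step, assuming the token-level scores from the model sketch above and a segmentation provided by an external component:

```python
# Pool token-level redundancy scores over a given phrase segmentation of
# the follow-up response (segment spans come from an external segmenter).
def phrase_scores(local, segments):
    """local: (len2,) tensor of token scores; segments: (start, end) spans."""
    return [local[s:e].mean().item() for s, e in segments]

# Usage (hypothetical): _, local = model(resp1, resp2)
# phrase_scores(local, [(0, 5), (5, 12)])
```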
We carried out a twofold evaluation. In the first part of the evaluation, we concentrated on the capability of our model to actually localize redundancy. What we did was prepare held-out passages from our training data. Here you can see an example pair: we have the first response passage, which is relatively short, and then we have a longer follow-up passage. We did not change the first response, but we automatically segmented the second response, and then we showed both the first response and the second response to raters and asked the raters to assign a redundancy label to each of the segments of the second response. This dataset contains one thousand two hundred passage pairs with fine-grained redundancy annotation; we used it to evaluate our model, and the dataset is released on GitHub.
We ran our model on this dataset and compared its capability of localizing redundancy against the original model for recognizing entailment. The scores that you can see here on the slide are Spearman correlation values, that is, the correlation of the predicted redundancy with the rater-assigned redundancy. As you can see, our model outperformed the baseline model.
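For illustration, assuming the metric is the standard Spearman rank correlation, it can be computed with SciPy on toy data like this:

```python
# Toy illustration of the evaluation metric: Spearman rank correlation
# between predicted and rater-assigned redundancy (numbers are made up).
from scipy.stats import spearmanr

predicted = [0.12, 0.71, 0.40, 0.93]  # model's phrase-level scores
gold = [0, 2, 1, 2]                   # rater-assigned redundancy labels
rho, _ = spearmanr(predicted, gold)
print(f"Spearman correlation: {rho:.2f}")
```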
What you can see on the right-hand side is a scatter plot of our model's segment-level, phrase-level redundancy scores plotted against the gold redundancy scores of the raters. You can see two things in this scatter plot. First, there is a clear correlation between the two kinds of scores. Second, you can also see that the absolute redundancy scores that our model produces for each segment are a bit hard to interpret, so these are not really useful. What is indeed useful is the redundancy ranking of segments that is induced by these scores, the ranking of segments inside a passage. So you cannot use these scores to compare the redundancy of segments across data examples, but you can use them to rank the segments within a passage according to their redundancy, and this ranking is what we used in the second experiment.
In the second experiment, we looked into the effect that this model can have on the quality of the kind of dialogues that I showed earlier, these informational dialogues. What we did was show raters first the initial, short response, then the original follow-up response, and furthermore a compressed follow-up response that was compressed using the redundancy scores from our model. We followed a relatively simple strategy for compressing the passage in this experiment, which is really just a preliminary experiment, kind of a pilot study: we worked with sentence-level segments and we just discarded the sentence that was most redundant according to our model. We then asked raters which variant of the follow-up response they liked more, the original one or the compressed one.
As I mentioned, this is only a preliminary experiment, so there was a really small sample size. We compared our model against two simple baselines: one was to always drop the first sentence of the follow-up response, and the other one was to drop the sentence with the highest lexical-level overlap with the first response.
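For concreteness, here is a sketch of the two baselines; using token-set overlap is my assumption, and the paper may measure lexical overlap differently.

```python
# Sketch of the two compression baselines: drop the first sentence, or
# drop the sentence with the highest lexical overlap with the first
# response (token-set overlap is an illustrative choice).
def drop_first(sentences):
    return sentences[1:]

def drop_max_overlap(first_response, sentences):
    first = set(first_response.lower().split())
    def overlap(sent):
        toks = set(sent.lower().split())
        return len(toks & first) / max(len(toks), 1)
    worst = max(range(len(sentences)), key=lambda i: overlap(sentences[i]))
    return [s for i, s in enumerate(sentences) if i != worst]
```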
The result of this experiment was that raters liked the compressions based on our model more in terms of the naturalness of the produced compressed passage, which is a good thing, though again with only a quite small sample size. On the downside, there was a slight loss of informativeness in the compressed follow-up passages, and our model performed worse on this informativeness metric compared to the two baselines. This might not be a very big deal since, to a certain degree, you do expect some loss of informativeness in compressed passages, and naturalness is really the key thing to look for here.
Okay, that's it already. This slide summarizes the contributions of our work. We described a particular problem in the space of textual similarity that you face when you build informational dialogues. We created evaluation data for this problem and released that dataset. We proposed both a model to localize redundancy and a way to train this model in a weakly supervised fashion. And maybe the take-away message from our work is that following a relatively shallow approach to dialogue, combined with a relatively simple neural model, can already give you quite good performance and can already improve over the original responses. Thanks, that's it.
Maybe I'll ask the first question myself. This is with respect to the loss of informativeness. In the example that you showed, for instance, the exact type of the mosquito was one of the things that got marked as redundant, because it was in a sentence that was generally redundant, right, except for that detail. With the approach that you've chosen, where you basically drop phrases which are mostly redundant, or which score high on redundancy, this information just gets lost, right? I was wondering whether you could comment more, because I imagine that's quite common.
Right, so that's exactly the point of this informativeness loss. One could argue, as I said, that naturalness is much more important than informativeness, that is, having a natural-sounding answer that might not give the user the information about the particular kind of mosquito is more the thing that we were looking for. And maybe one thing to look into in the future is how this localized redundancy can be exploited in the compression stage in a more sophisticated way, so that this informativeness loss is remedied. For example, in that particular example that you mention, where the particular kind of mosquito is no longer mentioned because we just discard the sentence: a sophisticated sentence-compression model might be able to process token-level redundancy scores as input, and could then maybe produce a sentence where this information about the particular kind of mosquito, the name of the family of the mosquito, the type of the mosquito, is still mentioned, but where most of the redundant content is left out. So that might be something to look into.
Thanks for a very nice talk. I'm curious about the generalization of the method past the second turn of the dialogue, beyond the "tell me more" context. This may be a general question that might foreshadow some of the discussion that we could have in the panel session, where one of the questions for the panel is: does a dialogue system actually have to know what it is talking about? In this context we just have a string representation, and there could be lots of natural follow-ons from that "tell me more" utterance, like, you know, questions about what's related to that, say the mosquito or its breeding cycle, and there could be anaphora and things like that. It would be really interesting if there was a really principled way that we could combine search results somehow into the discourse context, so that instead of just doing query rewriting we could actually have a more integrated method. I wonder if you have any thoughts about that.
Very good point. So, how could one handle multi-turn dialogue with our model, more than one or two turns, say three turns, four turns? The naive approach would be to just basically concatenate all the previous turns of the dialogue context into the first response, feed this to the model, and then always use as the second response whatever you would like to say next. Of course, this only works for so long, for only so many dialogue turns. I agree that it would be a good idea to have a deeper modeling of conversations and of conversational structure in order to solve this redundancy problem in a more general way.
A related aspect, one thing that we stumbled across, is that redundancy is of course not always a bad thing; there is a certain amount of redundancy that you need to keep. If you just removed all redundancy from the follow-up responses, you might face the problem that the user actually wonders whether you are still talking about the same thing. That's also something to consider. So the right thing would be to come up with a more general model of redundancy in dialogue, one that would give you a toolkit on how to handle all kinds of redundancy, including the good kinds of redundancy that you shouldn't remove, or that you maybe should even actively seek to include in your response.
Thank you, very interesting work. What I was wondering about goes in the direction of, you know, how you maintain coherence in dialogue. Specifically, you said something about how you select the, you know, non-overlapping phrases, and then you select them and stick them back together. My question is, and maybe I misunderstood, how did you actually then do the generation process? How do you make sure it's a well-formed string with coherence, meaning, you know, anaphora and so on?
Right, so what we did was really employing a very shallow approach, so we did not generate the responses ourselves. I said earlier that there are many different ways to implement this question-answering black box, and one thing to do is maybe to employ passage retrieval: you take the user question, you compare it against a huge collection of text, and you try to find a passage in the text that possibly answers this question. In this shallow approach, you wouldn't really try to understand what is in this passage; you would just try to assess whether it answers the question or not. So we would rely basically on the grammaticality of the source text that we used; we did not do any kind of generation.
So you didn't change, let's say, I dunno, pronouns, you just discarded sentences?
Exactly, yes. Handling anaphora in a more sophisticated way is also probably something to look into in the future.