So, hello again. My name is Sebastian, I'm a PhD student, and I am going to present the paper titled "Redundancy Localization for the Conversationalization of Unstructured Responses". This is joint work with three other people, Mikhail, Eric, and Daniele. Eric is a PhD student in Finland, and he worked on this project while interning in Zurich.
So, the general topic of this work is dialogue. More specifically, we were working on a setting where a user engages in a conversation with a conversational assistant, such as the Google Assistant or Siri, that runs on a voice-activated speaker. We do not have any kind of display to convey information, so we need to read out everything, basically all the information we want to convey.
This is important because it means that, at least initially in a dialogue, we aim to give concise answers, and only after the user has confirmed interest in the kind of information that we give can we move on to longer utterances.
Now, in the past there has been work on very different types of dialogue, for example task-oriented dialogue such as restaurant booking, or chatbots where users engage in chitchat, for which we nowadays have dialogue models. That is not what we need here; we rather focus on something that could be called informational dialogue. In this type of dialogue, users have an information need that we try to satisfy.
So, for example, a user might ask something like "What is malaria?".
In the pipeline that we used, we follow a very shallow approach: we just take the question, forward it to a background question-answering component, an information-retrieval component, and that gives us a bunch of response candidates from which we can choose one. As I mentioned, initially we select a short answer, such as the one displayed here, which says that malaria is a disease caused by a parasite and transmitted by the bite of infected mosquitoes. There are many different options for implementing such a question-answering component, but for our purposes we treated it as a black box.
Okay, so the focus of our work was on the problem that occurs when this dialogue continues. Let's assume that the user likes the kind of information that we gave, that is, we correctly understood what kind of information the user is looking for, and the user says something like "tell me more", or maybe the user issues a follow-up question. Then we would again go to the question-answering component, and this time we would select a longer response and read that out, in the hope that this longer response contains some additional information that is of interest to the user.
The problem is the following: in many instances these longer responses are partially redundant with respect to what we have just said, like thirty seconds earlier. In this particular example, the part that is underlined and highlighted in red is redundant: it again mentions that malaria is caused by a parasite and that it is transmitted by a particular kind of mosquito. Since parts of the response are redundant, this is not a response that the user would like to hear, so we need to do something about this problem.
There are two aspects to this research problem. The first aspect is that we need to understand when and where a response is redundant with respect to the dialogue context, that is, we need to localize redundant content in pairs of short texts. In the visualization here on the slide we have two rows of boxes, where each box is supposed to correspond to a phrase: the top row of boxes corresponds to the first, short response, and the bottom row visualizes the longer follow-up response. Our task is basically to select the boxes in the bottom row that are redundant, that occur again. Once we have this information, the next step is to adapt the follow-up response to the previous dialogue context, that is, to discard repetitive content.
There are a few related phenomena in the literature, such as the task of recognizing textual entailment or the task of semantic textual similarity, both of which deal with determining the coarse-grained relation between two short texts. There is also something called interpretable semantic textual similarity, which goes a little bit deeper into the similarity of texts by requiring task participants to also provide a fine-grained alignment of the chunks in the two texts. That means there was a lot of inspiration in terms of existing models that we could build upon. The real problem was how we could get our hands on data with fine-grained redundancy annotation that would allow us to train a model on this problem.
One approach to get such data is of course to annotate manually, but that is going to be expensive. So what we did was come up with a way of defining a weak training signal, and the idea behind this weak training signal is the following.
For a given question, as I mentioned earlier, this question-answering black box gives us a bunch of response candidates, and associated with each response there is a confidence score that tells you, from the perspective of the question-answering system, how well this response candidate answers the user's question. Now, with regard to the user question, it is quite likely that two high-scoring answer candidates are paraphrases, while if you compare the top-ranking one with one from further down the response candidate list, these two will probably only share some information: there will be some information missing, and the lower-ranked answer will contain some additional information.
In order to build this weak training signal, we picked three of these answers, two from the top of the result list and one from further down. We paired the two top-ranking ones, and we paired the top-ranking one with the one from further down the result list. We fed each pair to the model and had the model produce a coarse-grained similarity score for each of the pairs, and then we defined the ranking objective shown on the slide: we push the model towards assigning a higher redundancy score, a higher coarse-grained similarity, to the pair of top-ranking answer candidates. Our hope was that, if we gave the model appropriate capacity and appropriate structure, then, in order to produce this coarse-grained similarity score, it would learn how to align and compare the constituents of the responses.
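To make this concrete, here is a minimal sketch of such a ranking objective, assuming a hinge-style margin loss; the function names and the exact form of the loss are illustrative, not necessarily the paper's.

```python
# Minimal sketch of the weak training signal, assuming a hinge-style
# margin ranking loss (the exact objective in the paper may differ).
import torch.nn.functional as F

def triple_loss(similarity, top1, top2, lower, margin=1.0):
    """top1, top2: two high-ranking answers; lower: a lower-ranked one.
    `similarity` is the model's coarse-grained score for a response pair."""
    sim_para = similarity(top1, top2)   # likely paraphrases
    sim_part = similarity(top1, lower)  # likely only partial overlap
    # Push the paraphrase pair's score above the other pair's by `margin`.
    return F.relu(margin - (sim_para - sim_part))
```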
On this slide you can see an example triple; this is the kind of data we worked with. Here on the slide, all three responses are only one sentence long, but in reality we worked with passage-level data, so there were like three or four sentences per response. The multi-colored boxes in this example are supposed to indicate the basic semantic building blocks of these responses. As you can see, the first two answers, which are the two highest-ranking answers returned for a particular query by this question-answering component, share four semantic building blocks, while the first and the third answer only share about half of the semantic content.
So we built a dataset of 1.5 million such response triples, and we used that to train a model that was a further development of an already existing, already published model for recognizing textual entailment. Our model is essentially a three-component feed-forward neural network, which means it was really fast to train, which was a good thing since we had so much data to process.
Okay, now let's take a quick, high-level view of the model. The input to our model are two responses, on the left-hand side of the slide, and on the right-hand side of the slide you can see the output of the model, which is a coarse-grained, high-level similarity score for these two responses. In the first component, the model produces an alignment of the two responses, that is, it computes a custom representation of the first response for each token of the second response. In the second component, these custom representations are compared: the custom representations of the first response are compared to each token of the second response, which gives us local redundancy scores, that is, token-level redundancy scores for the second answer. Then, in the third component, these local redundancy scores are aggregated in order to produce the coarse-grained, high-level redundancy score.
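As a rough sketch, assuming token embeddings as input, the three components could look like this in PyTorch; the layer sizes and details are simplified and not the paper's exact configuration.

```python
# Rough PyTorch sketch of the align / compare / aggregate structure
# (illustrative only; the actual model differs in detail).
import torch
import torch.nn as nn

class RedundancyModel(nn.Module):
    def __init__(self, dim=200):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.compare = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, resp1, resp2):
        # resp1: (len1, dim), resp2: (len2, dim) token embeddings.
        # 1) Alignment: soft-attend over resp1 for every token of resp2.
        attn = (self.proj(resp2) @ self.proj(resp1).T).softmax(dim=-1)
        aligned = attn @ resp1                                # (len2, dim)
        # 2) Comparison: one local redundancy score per resp2 token.
        local = self.compare(torch.cat([resp2, aligned], dim=-1)).squeeze(-1)
        # 3) Aggregation: coarse-grained score for the whole pair.
        return local.mean(), local
```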
So this is how the training worked. Now, at application time, at inference time, we weren't really interested in this coarse-grained similarity score. So after model training we basically chopped off that part of the model, and we additionally fed the system, as input, a given segmentation of the second response into phrases. We then aggregated the local redundancy scores for each segment, for each phrase, and that gave us phrase-level redundancy scores for the phrases in the second response.
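A minimal sketch of this inference step, assuming the token-level scores from the model sketch above and a segmentation provided by an external component:

```python
# Pool token-level redundancy scores over a given phrase segmentation of
# the follow-up response (segment spans come from an external segmenter).
def phrase_scores(local, segments):
    """local: (len2,) tensor of token scores; segments: (start, end) spans."""
    return [local[s:e].mean().item() for s, e in segments]

# Usage (hypothetical): _, local = model(resp1, resp2)
# phrase_scores(local, [(0, 5), (5, 12)])
```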
We carried out a twofold evaluation. In the first part of the evaluation, we concentrated on the capability of our model to actually localize redundancy. What we did was prepare held-out passages from our training data. Here you can see an example pair: we have the first response passage, which is relatively short, and then we have a longer follow-up passage. We did not change the first response, but we automatically segmented the second response, and then we showed both the first response and the second response to raters and asked the raters to assign a redundancy label to each of the segments of the second response. This dataset contains one thousand two hundred passage pairs with fine-grained redundancy annotation; we used it to evaluate our model, and the dataset is released on GitHub.
We ran our model on this dataset and compared its capability of localizing redundancy against the original model for recognizing entailment. The scores that you can see here on the slide are Spearman correlation values, that is, the correlation of the predicted redundancy with the rater-assigned redundancy. As you can see, our model outperformed the baseline model.
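For illustration, assuming the metric is the standard Spearman rank correlation, it can be computed with SciPy on toy data like this:

```python
# Toy illustration of the evaluation metric: Spearman rank correlation
# between predicted and rater-assigned redundancy (numbers are made up).
from scipy.stats import spearmanr

predicted = [0.12, 0.71, 0.40, 0.93]  # model's phrase-level scores
gold = [0, 2, 1, 2]                   # rater-assigned redundancy labels
rho, _ = spearmanr(predicted, gold)
print(f"Spearman correlation: {rho:.2f}")
```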
What you can see on the right-hand side is a scatter plot of our model's segment-level, phrase-level redundancy scores plotted against the gold redundancy scores of the raters. You can see two things in this scatter plot. First, there is a clear correlation between the two kinds of scores. Second, you can also see that the absolute redundancy scores that our model produces for each segment are a bit hard to interpret, so these are not really useful. What is indeed useful is the redundancy ranking of segments that is induced by these scores, the ranking of segments inside a passage. So you cannot use these scores to compare the redundancy of segments across data examples, but you can use them to rank the segments within a passage according to their redundancy, and this ranking is what we used in the second experiment.
In the second experiment, we looked into the effect that this model can have on the quality of the kind of dialogues that I showed earlier, these informational dialogues. What we did was show raters first the initial, short response, then the original follow-up response, and furthermore a compressed follow-up response that was compressed using the redundancy scores from our model. We followed a relatively simple strategy for compressing the passage in this experiment, which is really just a preliminary experiment, kind of a pilot study: we worked with sentence-level segments and we just discarded the sentence that was most redundant according to our model. We then asked raters which variant of the follow-up response they liked more, the original one or the compressed one.
As I mentioned, this is only a preliminary experiment, so there was a really small sample size. We compared our model against two simple baselines: one was to always drop the first sentence of the follow-up response, and the other one was to drop the sentence with the highest lexical-level overlap with the first response.
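For concreteness, here is a sketch of the two baselines; using token-set overlap is my assumption, and the paper may measure lexical overlap differently.

```python
# Sketch of the two compression baselines: drop the first sentence, or
# drop the sentence with the highest lexical overlap with the first
# response (token-set overlap is an illustrative choice).
def drop_first(sentences):
    return sentences[1:]

def drop_max_overlap(first_response, sentences):
    first = set(first_response.lower().split())
    def overlap(sent):
        toks = set(sent.lower().split())
        return len(toks & first) / max(len(toks), 1)
    worst = max(range(len(sentences)), key=lambda i: overlap(sentences[i]))
    return [s for i, s in enumerate(sentences) if i != worst]
```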
The result of this experiment was that raters liked the compressions based on our model more in terms of the naturalness of the produced compressed passage, which is a good thing, though again with only a quite small sample size. On the downside, there was a slight loss of informativeness in the compressed follow-up passages, and our model performed worse on this informativeness metric compared to the two baselines. This might not be a very big deal since, to a certain degree, you do expect some loss of informativeness in compressed passages, and naturalness is really the key thing to look for here.
Okay, that's it already. This slide summarizes the contributions of our work. We described a particular problem in the space of textual similarity that you face when you build informational dialogues. We created evaluation data for this problem and released that dataset. We proposed both a model to localize redundancy and a way to train this model in a weakly supervised fashion. And maybe the take-away message from our work is that following a relatively shallow approach to dialogue, combined with a relatively simple neural model, can already give you quite good performance and can already improve over the original responses. Thanks, that's it.
Maybe I'll ask the first question myself. This is with respect to the loss of informativeness. In the example that you showed, for instance, the exact type of the mosquito was one of the things that got marked as redundant, because it was in a sentence that was generally redundant, right, except for that detail. With the approach that you've chosen, where you basically drop phrases which are mostly redundant, or which score high on redundancy, this information just gets lost, right? I was wondering whether you could comment more, because I imagine that's quite common.
Right, so that's exactly the point of this informativeness loss. One could argue, as I said, that naturalness is much more important than informativeness, that is, having a natural-sounding answer that might not give the user the information about the particular kind of mosquito is more the thing that we were looking for. And maybe one thing to look into in the future is how this localized redundancy can be exploited in the compression stage in a more sophisticated way, so that this informativeness loss is remedied. For example, in that particular example that you mention, where the particular kind of mosquito is no longer mentioned because we just discard the sentence: a sophisticated sentence-compression model might be able to process token-level redundancy scores as input, and could then maybe produce a sentence where this information about the particular kind of mosquito, the name of the family of the mosquito, the type of the mosquito, is still mentioned, but where most of the redundant content is left out. So that might be something to look into.
Thanks for a very nice talk. I'm curious about the generalization of the method past the second turn of the dialogue, beyond the "tell me more" context. This may be a general question that might foreshadow some of the discussion that we could have in the panel session, where one of the questions for the panel is: does a dialogue system actually have to know what it is talking about? In this context we just have a string representation, and there could be lots of natural follow-ons from that "tell me more" utterance, like, you know, questions about what's related to that, say the mosquito or its breeding cycle, and there could be anaphora and things like that. It would be really interesting if there was a really principled way that we could combine search results somehow into the discourse context, so that instead of just doing query rewriting we could actually have a more integrated method. I wonder if you have any thoughts about that.
Very good point. So, how could one handle multi-turn dialogue with our model, more than one or two turns, say three turns, four turns? The naive approach would be to just basically concatenate all the previous turns of the dialogue context into the first response, feed this to the model, and then always use as the second response whatever you would like to say next. Of course, this only works for so long, for only so many dialogue turns. I agree that it would be a good idea to have a deeper modeling of conversations and of conversational structure in order to solve this redundancy problem in a more general way.
A related aspect, one thing that we stumbled across, is that redundancy is of course not always a bad thing; there is a certain amount of redundancy that you need to keep. If you just removed all redundancy from the follow-up responses, you might face the problem that the user actually wonders whether you are still talking about the same thing. That's also something to consider. So the right thing would be to come up with a more general model of redundancy in dialogue, one that would give you a toolkit on how to handle all kinds of redundancy, including the good kinds of redundancy that you shouldn't remove, or that you maybe should even actively seek to include in your response.
Thank you, very interesting work. What I was wondering about goes in the direction of, you know, how you maintain coherence in dialogue. Specifically, you said something about how you select the, you know, non-overlapping phrases, and then you select them and stick them back together. My question is, and maybe I misunderstood, how did you actually then do the generation process? How do you make sure it's a well-formed string with coherence, meaning, you know, anaphora and so on?
Right, so what we did was really employing a very shallow approach, so we did not generate the responses ourselves. I said earlier that there are many different ways to implement this question-answering black box, and one thing to do is maybe to employ passage retrieval: you take the user question, you compare it against a huge collection of text, and you try to find a passage in the text that possibly answers this question. In this shallow approach, you wouldn't really try to understand what is in this passage; you would just try to assess whether it answers the question or not. So we would rely basically on the grammaticality of the source text that we used; we did not do any kind of generation.
So you didn't change, let's say, I dunno, pronouns, you just discarded sentences?
Exactly, yes. Handling anaphora in a more sophisticated way is also probably something to look into in the future.