Speech Transcript - Redundancy Localization for the Conversationalization of Unstructured Responses

0:00:15	so hello again my name is sebastian i'm a phd student from then
0:00:20	and i am going to present now the paper with the title redundancy localisation for
0:00:26	the conversationalization of unstructured responses
0:00:29	this is joint work with three other people mikhail kozhevnikov eric malmi and
0:00:34	daniele pighin eric is a phd student in finland
0:00:38	and just proposed on mobile arrogant hire interning in zurich
0:00:43	so on
0:00:45	obviously the general topic of this work is dialogs and more specifically we
0:00:51	were working on a setting where a user engages in a conversation with a conversational
0:00:57	assistant such as the google assistant or siri
0:01:00	that runs on a voice activated speaker
0:01:05	so we do not have any kind of display to convey information we need
0:01:09	to read out everything
0:01:12	basically all the information that we want to convey
0:01:15	this is important because it means that at least initially in a dialogue we should
0:01:20	aim to give
0:01:24	concise answers and only after the user has confirmed interest in the kind of information
0:01:30	that we
0:01:32	give
0:01:33	then we can also say basically longer utterances
0:01:38	okay
0:01:39	now in the past there was work on very different types of dialogues for
0:01:43	example task oriented dialogue such as restaurant booking or chatbots where users engage in
0:01:49	chitchat
0:01:50	we have nowadays neural
0:01:53	dialogue models
0:01:54	but this is not what we need we rather focus on something that could be
0:01:58	called informational dialogues
0:02:00	so in this type of dialogue users have an
0:02:06	information need that we try to satisfy
0:02:08	so for example a user might ask something like
0:02:12	what is malaria
0:02:13	and
0:02:15	we
0:02:15	so in the setting in the pipeline that we used
0:02:19	we follow a very shallow approach that is we just take the question
0:02:24	we forward it to a background question answering component an
0:02:29	information retrieval component
0:02:31	that gives us a bunch of response candidates
0:02:36	from which we can choose one
0:02:38	and as i mentioned initially we select a short answer such as the one displayed
0:02:44	here
0:02:44	saying that malaria is a disease caused by a plasmodium
0:02:49	parasite transmitted by the bite of infected mosquitoes
0:02:53	right there are many different options to implement such a question answering component but again
0:02:59	for our work we treated it as a black box
0:03:03	okay and
0:03:05	so the focus of our work was on the problem that occurs when this
0:03:09	dialogue continues so
0:03:12	let's assume that the user likes the kind of information that we gave
0:03:16	that is we correctly understood the kind of information that
0:03:21	the user is looking for
0:03:22	and the user says something like tell me more or maybe the user issues
0:03:27	a follow-up question
0:03:28	and then we would again go to the question answering component and this time
0:03:34	we would select a longer response and would read that out in the hope that
0:03:39	this longer response contains some additional information that is of interest
0:03:44	to the user
0:03:45	now the problem is the following this longer response or in many instances these longer responses
0:03:52	are partially
0:03:54	redundant with respect to what we have just said like thirty seconds earlier so
0:04:00	in this particular example here the parts that are underlined
0:04:04	and highlighted in red color are redundant so it again mentions
0:04:11	that malaria is caused by a parasite and that it is transmitted by
0:04:16	a particular kind of mosquito
0:04:18	so well again this sounds redundant this is not a response that the user
0:04:24	would like to hear so we need to do something about this problem
0:04:27	and there are two aspects to this research problem
0:04:32	the first aspect is that we need to understand when and where a response
0:04:37	is redundant with respect to the dialogue context
0:04:41	that is we need to localise redundant content in pairs of short texts
0:04:46	so in the visualization here on the slide we have two rows of boxes the top row
0:04:51	so each box
0:04:54	is supposed to correspond to a phrase the top row of boxes corresponds to the first
0:04:58	short response the bottom row of boxes is supposed to visualize the longer
0:05:05	follow-up response
0:05:06	and our task is to basically select the boxes in the bottom row that are redundant that
0:05:14	occur again
0:05:15	right and once we have this information then the next step is to adapt the
0:05:18	follow-up response to the previous dialogue context that is to discard repetitive content
0:05:25	okay there are a few related phenomena in the literature such as
0:05:32	the task of recognizing textual entailment or the task of
0:05:35	semantic textual similarity both of which deal with determining the
0:05:40	coarse grained relation between two short texts
0:05:44	there is also something called interpretable semantic textual similarity which goes a little bit deeper into
0:05:51	the similarity of texts by requiring task participants to also provide a fine grained
0:05:57	alignment of the chunks in the two texts
0:06:01	so that means that there was
0:06:02	lots of inspiration in terms of existing models that we could
0:06:08	build upon and our problem really was how
0:06:11	could we get our hands on data with fine grained redundancy annotation that would allow
0:06:16	us to train a model on that problem
0:06:22	well one approach to get data is of course to manually annotate it but that
0:06:27	is going to be expensive so
0:06:28	what we did is we came up with a way of
0:06:31	defining a weak training signal and the idea for this weak training signal is the
0:06:36	following
0:06:39	so
0:06:42	for a given question as i mentioned earlier this ir black box this question
0:06:47	answering black box
0:06:49	gives us a bunch of response candidates and associated with each response there
0:06:54	is a confidence score that tells you from the perspective of the system
0:06:58	of the question answering system how
0:07:00	well this response candidate answers the user question now with regard to the user
0:07:06	question it is quite likely that two high scoring two high ranking answer candidates
0:07:13	are paraphrases while if you compare the top
0:07:17	ranking answer with one from further down
0:07:21	this response candidate list these two will probably only share some information so there
0:07:27	would be some information missing in the lower ranked answer or there would be
0:07:31	some additional information
0:07:32	now in order to build this weak training signal what we did is we picked
0:07:36	these three answers so two
0:07:39	from the top of the result list and one from further down
0:07:42	we paired the two top ranking ones
0:07:46	and we paired the top ranking one with the one from further down the result list
0:07:51	we fed each pair to the model a neural model and had the model
0:07:57	produce a coarse grained similarity score for each of the pairs
0:08:00	and then we defined the ranking objective that is shown on the slide that is
0:08:05	we
0:08:06	push the model towards assigning a higher
0:08:09	redundancy score a higher coarse grained similarity score to the two
0:08:14	top ranking answer candidates
0:08:16	and the hope was that if we gave the model appropriate capacity and appropriate structure
0:08:22	then it would
0:08:23	in order to produce this coarse grained similarity score it would learn how to
0:08:27	align and compare the constituents of
0:08:29	the responses
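
A minimal sketch of the weak training signal described above. The talk only states that the pair of two top-ranked answers should receive a higher coarse-grained similarity score than the pair made of the top answer and a lower-ranked one; the hinge form with a margin, the `build_triple` and `ranking_loss` names, and the token-overlap scorer standing in for the neural model are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (top answer, second answer, answer from further down the result list)
Triple = Tuple[str, str, str]


def build_triple(ranked: List[str], lower_rank: int) -> Triple:
    """Pick the two top-ranked candidates plus one from further down."""
    if lower_rank < 2 or lower_rank >= len(ranked):
        raise ValueError("lower_rank must point below the top two candidates")
    return ranked[0], ranked[1], ranked[lower_rank]


def ranking_loss(score: Callable[[str, str], float],
                 triple: Triple, margin: float = 1.0) -> float:
    """Hinge-style ranking loss (assumed form): zero once the top pair
    outscores the (top, lower) pair by at least `margin`."""
    top, second, lower = triple
    s_top_pair = score(top, second)
    s_lower_pair = score(top, lower)
    return max(0.0, margin - s_top_pair + s_lower_pair)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, a hypothetical stand-in for the neural scorer."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


ranked = [
    "malaria is caused by a parasite",
    "malaria is caused by a parasite spread by mosquitoes",
    "mosquitoes breed in standing water",
]
triple = build_triple(ranked, lower_rank=2)
loss = ranking_loss(jaccard, triple, margin=1.0)
```

In training, the scorer would be the model itself and the loss would be minimised over all response triples; the stand-in scorer here only makes the sketch executable.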
0:08:32	on this slide you can see an example triple
0:08:38	so this is the kind of
0:08:41	data that we worked with here on the slide all the three responses that you can
0:08:46	see are only one sentence long but in reality we worked with passage level
0:08:51	data
0:08:52	so there were like two three or four sentences per response
0:08:56	and the multi coloured boxes in this example are supposed to indicate the basic semantic
0:09:05	building blocks of these responses and as you can see the
0:09:08	first two answers which are the two
0:09:12	highest ranking answers that are returned for a particular query
0:09:17	from this question answering component
0:09:20	share four
0:09:22	four semantic building blocks while the first and the third answer only share
0:09:25	like half of the semantic content
0:09:28	right so we built a dataset of one point five million such response triples and
0:09:34	we used that for training
0:09:37	a model that
0:09:40	was a further development of an already existing and already published model for
0:09:46	recognizing textual entailment
0:09:48	and the model is essentially a three component feed forward neural network
0:09:54	which means that it was really fast to train which was a good thing since
0:09:59	we had so much data to process
0:10:02	okay now let's take a quick
0:10:05	high level
0:10:07	view on the model
0:10:10	so the input to our model are two responses
0:10:14	on the left hand side of the slide and on the right hand side of the
0:10:17	slide you can see the output of the model which
0:10:20	was a coarse grained a high-level similarity score for these two responses
0:10:25	now in the first component the model
0:10:29	produced an alignment of the two responses that is
0:10:34	it
0:10:35	produced a custom representation of the first response
0:10:39	for each token of the second response
0:10:43	then in the second component
0:10:46	these custom representations were compared
0:10:50	so
0:10:51	the custom representations of the first
0:10:54	response were compared to each token of the second response which gave us local
0:10:58	redundancy scores so token level redundancy scores for the second answer
0:11:02	and then in the third component
0:11:05	these
0:11:06	local redundancy scores were aggregated in order to produce this
0:11:11	coarse grained this high-level redundancy score
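
The three components just described (align, compare, aggregate) can be sketched as follows. This is only an illustration under stated assumptions: tokens are given as small hand-made vectors, the alignment is a softmax over dot products, the comparison is a cosine similarity rather than a learned feed-forward layer, and the aggregation is a plain mean. None of these specific choices is confirmed by the talk.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def align(first: List[Vector], second: List[Vector]) -> List[Vector]:
    """Component 1: for each token of the second response, build a custom
    (attention-weighted) representation of the first response."""
    aligned = []
    for tok in second:
        weights = softmax([dot(tok, f) for f in first])
        rep = [sum(w * f[d] for w, f in zip(weights, first))
               for d in range(len(tok))]
        aligned.append(rep)
    return aligned


def compare(aligned: List[Vector], second: List[Vector]) -> List[float]:
    """Component 2: token-level redundancy scores for the second response
    (cosine similarity as an assumed stand-in for a learned comparison)."""
    scores = []
    for rep, tok in zip(aligned, second):
        norm = math.sqrt(dot(rep, rep) * dot(tok, tok))
        scores.append(dot(rep, tok) / norm if norm > 0 else 0.0)
    return scores


def aggregate(local_scores: List[float]) -> float:
    """Component 3: collapse local scores into one coarse-grained score."""
    return sum(local_scores) / len(local_scores)


first = [[1.0, 0.0]]               # toy embedding of the first response
second = [[1.0, 0.0], [0.0, 1.0]]  # second response: one repeated, one new token
local = compare(align(first, second), second)
overall = aggregate(local)
```

In this toy input the first token of the second response repeats the first response and gets a local score of 1, while the new token gets 0.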
0:11:14	okay
0:11:15	so this is how the training worked and
0:11:19	now at application time at inference time we weren't really interested in this
0:11:25	coarse grained similarity score so what we did after model training is we
0:11:29	basically chopped off that part of the model
0:11:33	and we additionally fed the system as input a given
0:11:39	segmentation of the
0:11:41	second response into phrases
0:11:44	then we aggregated the redundancy scores the local redundancy scores
0:11:48	for each segment for each phrase and that gave us
0:11:53	phrase level redundancy scores for the phrases in the second response
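
A small sketch of the inference-time step described above: token-level redundancy scores are pooled per segment to obtain phrase-level scores. The talk does not say which pooling function was used, so the mean here is an assumption, as are the function name and the half-open `(start, end)` span format.

```python
from typing import List, Tuple


def phrase_scores(token_scores: List[float],
                  segments: List[Tuple[int, int]]) -> List[float]:
    """Average token-level redundancy scores inside each segment.

    `segments` holds half-open (start, end) token spans that are assumed
    to come from an external segmenter.
    """
    scores = []
    for start, end in segments:
        if not 0 <= start < end <= len(token_scores):
            raise ValueError(f"bad segment ({start}, {end})")
        span = token_scores[start:end]
        scores.append(sum(span) / len(span))
    return scores


token_scores = [0.9, 0.7, 0.1, 0.3]  # made-up local scores for four tokens
segments = [(0, 2), (2, 4)]          # two phrases of two tokens each
per_phrase = phrase_scores(token_scores, segments)
```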
0:12:00	okay
0:12:01	so we carried out
0:12:03	a
0:12:05	twofold evaluation so in the first aspect of the evaluation we concentrated on
0:12:11	looking into the capability of our model to actually localise redundancy
0:12:17	so what we did is we took held-out passages from our training data
0:12:22	so here you can see an example pair so we have the first response passage
0:12:27	which is relatively short and then we have a longer
0:12:30	follow-up passage
0:12:31	we did not change the first response but we automatically segmented
0:12:38	the second response
0:12:40	and then we showed both the first response and the second response to raters
0:12:43	and asked raters to assign
0:12:46	a redundancy label
0:12:48	to each of the segments of the second
0:12:51	response
0:12:53	now in this dataset there are one thousand two hundred passage pairs with
0:12:58	fine grained redundancy annotation we used this data set
0:13:02	to evaluate our model the dataset is released on github
0:13:07	and right so we ran our model on this dataset and we compared
0:13:12	its capability of localising redundancy
0:13:15	against the
0:13:17	original model for recognizing entailment
0:13:20	and the scores that you can see here on the slide are spearman correlation values
0:13:25	so the correlation of the predicted redundancy with the rater assigned
0:13:29	redundancy
0:13:30	as you can see our model was
0:13:33	outperforming the baseline model
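
For reference, a rank correlation between predicted and rater-assigned redundancy can be computed as in the sketch below. This is a generic pure-Python Spearman implementation (Pearson correlation of average ranks), not the authors' evaluation script, and the example scores are made up.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


predicted = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores per segment
rated = [1.0, 3.0, 2.0, 4.0]       # hypothetical rater labels per segment
rho = spearman(predicted, rated)
```

Because only the ordering of the scores matters for this measure, it fits a setting where absolute score values are hard to interpret.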
0:13:36	what you can see on the right hand side
0:13:39	is a scatter plot of our model's
0:13:43	segment level phrase level redundancy scores
0:13:48	plotted against the gold redundancy scores of the raters
0:13:53	well you can see two things in this scatter plot first
0:13:57	there is a clear correlation between the two kinds of scores
0:14:00	and second you can also see that the absolute
0:14:05	redundancy scores that our model produces for each segment are a bit hard to interpret
0:14:11	so these are not really useful what is
0:14:15	indeed useful is the ranking of
0:14:18	so the redundancy ranking of segments that is induced by these
0:14:22	scores so the ranking of
0:14:24	segments inside a passage so you cannot use the scores to compare the redundancy of
0:14:30	segments across
0:14:32	data examples but you can use them to rank
0:14:36	to rank
0:14:38	in the passage
0:14:39	to rank the segments in the passage according to their
0:14:42	redundancy and this ranking is what we used in the
0:14:48	second experiment
0:14:50	so in the second experiment
0:14:53	we looked into the
0:14:56	effect that this model can have on the quality of the
0:15:00	kind of dialogues that i showed earlier so on
0:15:04	these informational dialogues
0:15:06	so what we did is we
0:15:09	showed to raters first the initial response the initial short response we also showed them the
0:15:15	original follow-up response
0:15:18	and we furthermore also showed them a compressed
0:15:23	follow-up response that was compressed using the redundancy scores from our model
0:15:28	so what we did here is we
0:15:31	followed a relatively simple strategy for compressing the passage that is we worked
0:15:36	here in this experiment
0:15:38	which is really just a preliminary experiment kind of a pilot study
0:15:41	so we worked with sentence-level segments and we just discarded the
0:15:47	sentence that was most redundant according to our model and then we asked raters which
0:15:53	variant of the follow-up which variant of the follow-up response they liked more
0:15:58	the original one or the compressed one
0:16:02	so as i mentioned this is only a preliminary experiment so there was a really
0:16:06	small sample size
0:16:08	and we
0:16:10	compared our model here against two simple baselines one was to always drop the first sentence
0:16:15	of the follow-up response
0:16:17	and the other one was to drop the sentence which had the highest lexical level
0:16:21	overlap
0:16:22	with the first response
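
The compression strategy and the two baselines described above can be sketched as follows. The function names are hypothetical, the redundancy scores are made-up inputs standing in for model output, and lexical overlap is assumed here to mean the fraction of a sentence's tokens that also occur in the first response; the talk does not define the overlap measure.

```python
from typing import List


def drop_index(sentences: List[str], idx: int) -> List[str]:
    return sentences[:idx] + sentences[idx + 1:]


def drop_most_redundant(sentences: List[str],
                        redundancy: List[float]) -> List[str]:
    """Model-based strategy: discard the sentence with the top score."""
    idx = max(range(len(sentences)), key=lambda i: redundancy[i])
    return drop_index(sentences, idx)


def drop_first(sentences: List[str]) -> List[str]:
    """Baseline 1: always discard the first sentence."""
    return sentences[1:]


def lexical_overlap(sentence: str, first_response: str) -> float:
    """Assumed measure: share of sentence tokens seen in the first response."""
    tokens = set(sentence.lower().split())
    seen = set(first_response.lower().split())
    return len(tokens & seen) / len(tokens) if tokens else 0.0


def drop_highest_overlap(sentences: List[str],
                         first_response: str) -> List[str]:
    """Baseline 2: discard the sentence with the highest lexical overlap."""
    idx = max(range(len(sentences)),
              key=lambda i: lexical_overlap(sentences[i], first_response))
    return drop_index(sentences, idx)


first_response = "malaria is a disease caused by a parasite"
follow_up = [
    "malaria is caused by a parasite",
    "symptoms include fever and chills",
    "it is treated with antimalarial drugs",
]
scores = [0.9, 0.1, 0.2]  # hypothetical sentence-level redundancy scores
compressed = drop_most_redundant(follow_up, scores)
```

In this toy case all three strategies remove the same sentence; they would differ whenever the redundant material is not in the first sentence or is paraphrased rather than repeated word for word.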
0:16:25	so the result of this experiment was that users
0:16:30	liked the compression based on our model more in terms of the naturalness
0:16:36	of the produced compressed passage so that is a good thing
0:16:41	but again that was only a
0:16:42	quite small sample size
0:16:45	on the downside there was a slight informativeness loss
0:16:51	of the
0:16:52	compressed
0:16:53	follow-up passage and our model performs
0:16:55	worse on this informativeness
0:16:58	metric compared to the two baselines
0:17:02	this might not be a very big deal since i mean
0:17:06	to a certain degree you do expect some loss of informativeness in compressed passages
0:17:12	and naturalness is really the key thing to look for here
0:17:17	okay that's it already so the slide summarises the contributions of our work
0:17:25	so we
0:17:27	describe a particular problem in the space of textual similarity that you face when you
0:17:32	build informational dialogues
0:17:34	we created evaluation data for this problem
0:17:38	we released this data we
0:17:40	proposed both a model to localise redundancy
0:17:45	and also a way to train this model in a weakly supervised way
0:17:51	and maybe the
0:17:53	take home message from this talk from our work is that
0:17:57	also having
0:17:58	right well also following a relatively shallow approach to dialogues combined with a relatively
0:18:04	simple
0:18:05	neural model can already
0:18:08	give you quite
0:18:09	solid performance and can already you know
0:18:13	improve over the original
0:18:16	thanks that's it
0:18:36	maybe i'll start with one question myself this is with respect to the
0:18:40	loss of informativeness i mean in the example that you showed for instance
0:18:45	the exact type of the mosquito for instance was one of the things
0:18:49	that were cut out you know because it was in the sentence
0:18:53	that was
0:18:54	generally redundant right except for that
0:18:58	right and with this
0:19:01	approach that you've chosen where you basically drop phrases which are mostly redundant or which
0:19:07	score high in redundancy this information just gets lost right so you know i was
0:19:12	wondering whether you could comment more
0:19:16	right because i imagine that is quite common right
0:19:19	right so that's exactly the point of this informativeness loss
0:19:24	so one could argue that so as i said naturalness is much more important than
0:19:29	informativeness
0:19:31	that is
0:19:33	having a natural sounding answer that might not give a user that
0:19:38	information about the particular kind of mosquito
0:19:42	is more the thing that we looked for and maybe one thing
0:19:48	to look into in the future is
0:19:51	how this
0:19:54	localised redundancy can be exploited in the compression stage in a more you know
0:20:00	in a more sophisticated way so that this
0:20:03	informativeness loss is remedied for example
0:20:07	so in that
0:20:08	in that particular example that you mention the particular kind of mosquito which is
0:20:13	no longer mentioned which we just discard
0:20:17	say a sophisticated sentence compression model might be able to
0:20:24	process as input token level redundancy scores and could then maybe produce a
0:20:30	sentence where this information about the particular kind of mosquito so the name of
0:20:35	the family of the mosquito the type of the mosquito is still
0:20:39	mentioned but where most of the redundant content is left out
0:20:44	so that might be something to look into
0:20:52	thanks for a very nice talk
0:20:55	i'm curious about the generalization
0:20:57	of the method past the second turn of the dialog beyond the tell me
0:21:02	more context and so this may be a general question that might
0:21:06	foreshadow some of the discussion that we could have in the panel session
0:21:10	where one of the questions for the panel was does a dialogue system actually have
0:21:14	to know what it is talking about
0:21:15	in this context we just have a string
0:21:18	representation and there could be lots of natural follow ons from that tell me more
0:21:22	utterance like
0:21:24	you know what's related to that anopheles mosquito or you know
0:21:31	what's their breeding cycle or something you know there could be anaphora and things
0:21:35	like that and it would be really interesting if there was a really principled way
0:21:40	that we could combine search results somehow into the discourse context
0:21:46	so that we weren't just doing query rewriting we could actually kind of have
0:21:50	a more integrated method and i wonder if you have any thoughts about that
0:21:54	very good point so i mean how would you so with our model
0:21:58	how could one handle this multiturn dialogue so more than two turns say three
0:22:05	turns four turns
0:22:06	well
0:22:07	the naive approach would be to just
0:22:10	basically concatenate all the previous dialogue context into the first response and feed
0:22:15	this to the model and then
0:22:17	always use as the second response fed to the model what you
0:22:21	would like to say next
0:22:24	of course this
0:22:26	only works for so long for so many dialog turns
0:22:32	i agree that it would be a good idea to
0:22:37	have
0:22:38	maybe a deeper modeling of conversations and of conversational structure in order to
0:22:46	remove this redundancy problem in a more
0:22:49	general way
0:22:52	right so
0:22:54	a related aspect or one thing that we stumbled across is that
0:23:00	i mean of course redundancy is not always a bad thing so you need
0:23:02	a certain amount of redundancy that you need to keep if you would just
0:23:06	remove all redundancy from a follow-up response
0:23:10	you might face the problem that the user actually wonders if you are still talking
0:23:15	about the same thing
0:23:17	and
0:23:19	that's also something to consider so it would be the right thing to come up
0:23:25	with a more general model of redundancy in dialogue
0:23:28	that would give you a
0:23:31	toolkit on how to handle all kinds of bad kinds of redundancy and good kinds
0:23:36	of redundancy that you shouldn't remove or that you maybe should actively seek to
0:23:41	include in your response
0:23:46	thank you very interesting work thank you
0:23:51	what i was wondering about goes in the direction of you know how you maintain coherence
0:23:56	in dialog and specifically you said something about
0:24:01	how you select the different
0:24:03	you know non-overlapping phrases
0:24:07	and then you select them and you stick them back together my question is
0:24:12	maybe i misunderstood but my question is how did you actually then do the generation process
0:24:18	how do you then you know make sure it's a well-formed string with
0:24:22	coherent meaning you know anaphora and so on
0:24:29	right
0:24:32	so we so what we did was really
0:24:36	employing a very shallow approach so we did not
0:24:40	generate you responses paltry
0:24:45	so i mean i said earlier that there are many different ways to implement
0:24:49	this question answering black box and one
0:24:53	one thing to do is maybe to employ passage retrieval that is
0:24:57	so you take the user question and you compare it against a huge collection of
0:25:02	text and you
0:25:04	you try to find the passage in the text
0:25:08	that
0:25:09	possibly answers this question and in the shallow approach you wouldn't really try to understand
0:25:14	what is in this passage you would just try to determine
0:25:18	if it
0:25:19	answers the question or not and then we would rely on the
0:25:23	basically on the on the
0:25:25	the mentality of the source text
0:25:27	that we use so we did not do any kind of
0:25:31	are two generation user didn't change let's i dunno rows and with just a basic
0:25:37	different set
0:25:39	exactly yes handling anaphora in a more sophisticated way that's also
0:25:45	probably something that one should look into in the future

Redundancy Localization for the Conversationalization of Unstructured Responses

Special Session: Natural Language Generation for Dialogue Systems

Sebastian Krause, Mikhail Kozhevnikov, Eric Malmi and Daniele Pighin