0:00:15so hello again my name is sebastian i'm a phd student from then
0:00:20and i am going to present now the paper with the title redundancy localisation for
0:00:26the conversationalization of unstructured responses
0:00:29this is joint work with three other people mikhail kozhevnikov eric malmi and
0:00:34daniele pighin eric is a phd student in finland
0:00:38and this work was done while eric and i were interning in zurich
0:00:43so
0:00:45obviously the general topic of this work is dialogs and more specifically we
0:00:51were working on a setting where a user engages in a conversation with a conversational
0:00:57assistant such as the google assistant or siri
0:01:00that runs on a voice activated speaker
0:01:05so we do not have any kind of display to convey information we need
0:01:09to read out everything
0:01:12basically all the information that we want to convey
0:01:15this is important because it means that at least initially in a dialogue we should
0:01:20aim to give
0:01:24concise answers and only after the user has confirmed interest in the kind of information
0:01:30that we
0:01:32gave
0:01:33then we can also say basically longer utterances
0:01:38okay
0:01:39now in the past there was work on very different types of dialogues for
0:01:43example task oriented dialogue such as restaurant booking or chatbots where users engage in
0:01:49chitchat
0:01:50we have nowadays neural
0:01:53dialogue models
0:01:54but that is not what we need we rather focus on something that could be
0:01:58called informational dialogues
0:02:00so in this type of dialogue users have an
0:02:06information need that we try to satisfy
0:02:08so for example a user might ask something like
0:02:12what is malaria
0:02:13and
0:02:15we
0:02:15so in the setting in the pipeline that we used
0:02:19we follow a very shallow approach that is we just take the question
0:02:24we forward it to a background question answering component an
0:02:29information retrieval component
0:02:31that gives us a bunch of response candidates
0:02:36from which we can choose one
0:02:38and as i mentioned initially we would select a short answer such as the one displayed
0:02:44here
0:02:44saying that malaria is a disease caused by a plasmodium
0:02:49parasite transmitted by the bite of infected mosquitoes
0:02:53right there are many different options to implement such a question answering component but again
0:02:59for our work we treated it as a black box
0:03:03okay
0:03:05so the focus of our work was on the problem that occurs when this
0:03:09dialogue continues so
0:03:12let's assume that the user likes the kind of information that we gave
0:03:16that is we correctly understood the kind of information that
0:03:21the user is looking for
0:03:22and the user says something like tell me more or maybe the user issues
0:03:27a follow-up question
0:03:28and then we would again go to the question answering component and this time
0:03:34we would select a longer response and read that out in the hope that
0:03:39this longer response contains some additional information that is of interest
0:03:44to the user
0:03:45now the problem is the following in many instances these longer responses
0:03:52are partially
0:03:54redundant with respect to what we have just said like thirty seconds earlier so
0:04:00in this particular example here the parts that are underlined
0:04:04and highlighted in red color are redundant so it again mentions
0:04:11that malaria is caused by a parasite and that it is transmitted by
0:04:16a particular kind of mosquito
0:04:18so again this sounds redundant this is not a response that the user
0:04:24would like to hear so we need to do something about this problem
0:04:27and there are two aspects to this research problem
0:04:32the first aspect is that we need to understand when and where a response
0:04:37is redundant with respect to the dialogue context
0:04:41that is we need to localise redundant content in pairs of short texts
0:04:46so in the visualization here on the slide we have two rows of boxes
0:04:51so each box
0:04:54is supposed to correspond to a phrase the top row of boxes corresponds to the first
0:04:58short response the bottom row of boxes is supposed to visualize the longer
0:05:05follow-up response
0:05:06and our task is to basically select the boxes in the bottom row that are redundant that
0:05:14occur again
0:05:15right and once we have this information then the next step is to adapt the
0:05:18follow-up response to the previous dialogue context that is to discard repetitive content
0:05:25okay there are a few related phenomena in the literature such as the
0:05:32task of recognizing textual entailment or the task of
0:05:35semantic textual similarity both of which deal with determining the
0:05:40coarse grained relation between two short texts
0:05:44there is also something called interpretable semantic textual similarity which goes a little bit deeper into
0:05:51the similarity of texts by requiring task participants to also provide a fine grained
0:05:57alignment of the chunks in the two texts
0:06:01so that means that there was
0:06:02lots of inspiration in terms of existing models that we could
0:06:08build upon and our problem really was how
0:06:11could we get our hands on data with fine grained redundancy annotation that would allow
0:06:16us to train a model on that problem
0:06:22well one approach to get data is of course to manually annotate it but that
0:06:27is going to be expensive so
0:06:28what we did is we came up with a way of
0:06:31defining a weak training signal and the idea for this weak training signal is the
0:06:36following
0:06:39so
0:06:42for a given question as i mentioned earlier this ir black box this question
0:06:47answering black box
0:06:49gives us a bunch of response candidates and associated with each response there
0:06:54is a confidence score that tells you from the perspective of the system
0:06:58of the question answering system how
0:07:00well this response candidate answers the user question now with regard to the user
0:07:06question it is quite likely that two high scoring two high ranking answer candidates
0:07:13are paraphrases while if you compare the top
0:07:17ranking answer with one from further down
0:07:21this response candidate list these two will probably only share some information so there
0:07:27would be some information missing in the lower ranked answer or there would be
0:07:31some additional information
0:07:32now in order to build this weak training signal what we did is we picked three
0:07:36of these answers so two
0:07:39from the top of the result list and one from further down
0:07:42the list we paired the two top ranking ones
0:07:46and we paired the top ranking one with the one from further down the result list
0:07:51we fed each pair to the model our neural model and had the model
0:07:57produce a coarse grained similarity score for each of the pairs
0:08:00and then we defined the ranking objective that is shown on the slide that is
0:08:05we
0:08:06push the model towards assigning a higher
0:08:09redundancy score a higher coarse grained similarity score to the two
0:08:14top ranking answer candidates
0:08:16and the hope was that if we gave the model appropriate capacity and appropriate structure
0:08:22then it would
0:08:23in order to produce this coarse grained similarity score it would learn how to
0:08:27align and compare the constituents of the
0:08:29responses
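The weak training signal described above can be sketched as a pairwise ranking loss over response triples. This is a minimal illustration, not the objective from the slide: the hinge form, the `margin` value, the helper names, and the word-overlap scorer that stands in for the neural model are all assumptions made for this example.

```python
def build_triple(ranked_answers, low_rank_index):
    """Pick the two top answers and one answer from further down the ranked list."""
    top1, top2 = ranked_answers[0], ranked_answers[1]
    lower = ranked_answers[low_rank_index]
    return top1, top2, lower


def ranking_loss(score_top_pair, score_mixed_pair, margin=1.0):
    """Hinge loss (assumed form): zero once the top pair outscores the mixed pair by `margin`."""
    return max(0.0, margin - score_top_pair + score_mixed_pair)


def toy_similarity(a, b):
    """Hypothetical scorer (word-overlap ratio) standing in for the trained model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


answers = [
    "malaria is caused by a parasite spread by mosquitoes",
    "malaria is a disease caused by a parasite spread by mosquitoes",
    "symptoms of malaria include fever and headache",
]
top1, top2, lower = build_triple(answers, low_rank_index=2)
loss = ranking_loss(
    toy_similarity(top1, top2),   # pair of two top ranking answers
    toy_similarity(top1, lower),  # top answer paired with a lower ranked one
    margin=0.5,
)
print(loss)
```

In training, a loss of this kind would be minimised over many triples, so that the scorer is pushed to rate the two top ranking answers as more similar than the mixed pair.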
0:08:32on this slide you can see an example triple
0:08:38so this is the kind of
0:08:41data we work with here on the slide all three responses that you can
0:08:46see are only one sentence long but in reality we worked with passage level
0:08:51data
0:08:52so there were like two three or four sentences per response
0:08:56and the multi coloured boxes in this example are supposed to indicate the basic semantic
0:09:05building blocks of these responses and as you can see the
0:09:08first two answers which are the two
0:09:12highest ranking answers that are returned for a particular query
0:09:17from this question answering component
0:09:20share
0:09:22four semantic building blocks while the first and the third answer only share
0:09:25like half of the semantic content
0:09:28right so we built a dataset of one point five million such response triples and
0:09:34we used that for training
0:09:37a model
0:09:40that was a further development of an already existing and already published model for
0:09:46recognizing textual entailment
0:09:48and the model is essentially a three component feed forward neural network
0:09:54which means that it was really fast to train which was a good thing since
0:09:59we had so much data to process
0:10:02okay now let's take a quick
0:10:05high level
0:10:07look at the model
0:10:10so the input to our model are two responses
0:10:14on the left hand side of the slide and on the right hand side of the
0:10:17slide you can see the output of the model which
0:10:20was a coarse grained a high-level similarity score for these two responses
0:10:25now in the first component the model
0:10:29produced an alignment of the two responses that is
0:10:34it
0:10:35produced a custom representation of the first response
0:10:39for each token of the second response
0:10:43then in the second component
0:10:46these custom representations were compared
0:10:50so
0:10:51these custom representations of the first
0:10:54response were compared to each token of the second response which gave us local
0:10:58redundancy scores so token level redundancy scores for the second answer
0:11:02and then in the third component
0:11:05these
0:11:06local redundancy scores were aggregated in order to produce this
0:11:11coarse grained this high-level redundancy score
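The three components described above (align, compare, aggregate) can be sketched on toy token vectors. This is only an illustration under stated assumptions: the talk describes learned feed-forward components, whereas the sketch below uses dot-product attention, cosine similarity, and a plain mean as untrained stand-ins, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(first, second):
    """Component 1: for each token of the second response, build a soft-aligned
    summary (a "custom representation") of the first response."""
    dim = len(first[0])
    aligned = []
    for tok in second:
        weights = softmax([dot(tok, f) for f in first])
        aligned.append([sum(w * f[d] for w, f in zip(weights, first)) for d in range(dim)])
    return aligned


def compare(second, aligned):
    """Component 2: token-level redundancy scores. Cosine similarity is an
    assumed stand-in for the learned comparison network."""
    scores = []
    for tok, summary in zip(second, aligned):
        norm = math.sqrt(dot(tok, tok)) * math.sqrt(dot(summary, summary))
        scores.append(dot(tok, summary) / norm if norm else 0.0)
    return scores


def aggregate(token_scores):
    """Component 3: collapse local scores into one coarse-grained score (mean assumed)."""
    return sum(token_scores) / len(token_scores)


first = [[5.0, 0.0], [4.0, 0.0]]    # toy token vectors of the first response
second = [[5.0, 0.0], [0.0, 5.0]]   # one repeated token, one new token
token_scores = compare(second, attend(first, second))
overall = aggregate(token_scores)
print(token_scores, overall)
```

The first token of the second response points in the same direction as the first response and gets a high local score, while the second token is orthogonal to it and gets a score of zero.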
0:11:14okay
0:11:15so this is how the training worked and
0:11:19now at application time at inference time we weren't really interested in this
0:11:25coarse grained similarity score so what we did after model training is we
0:11:29basically chopped off that part of the model
0:11:33and we additionally fed the system as input a given
0:11:39segmentation of the
0:11:41second response into phrases
0:11:44then we aggregated the redundancy scores the local redundancy scores
0:11:48for each segment for each phrase and that gave us
0:11:53phrase level redundancy scores for the phrases in the second response
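Aggregating token-level scores over a given segmentation can be sketched as follows. The talk does not say which aggregation function was used, so the mean here is an assumption, as are the span format (start inclusive, end exclusive) and the function name.

```python
def phrase_scores(token_scores, segments):
    """Average token-level redundancy scores inside each (start, end) span.

    `segments` holds token index spans with an exclusive end; the mean is an
    assumed aggregation, chosen only for illustration.
    """
    result = []
    for start, end in segments:
        span = token_scores[start:end]
        result.append(sum(span) / len(span))
    return result


token_scores = [0.9, 0.8, 0.1, 0.2, 0.3]   # toy scores for a five-token response
segments = [(0, 2), (2, 5)]                # two phrases covering all tokens
scores = phrase_scores(token_scores, segments)
print(scores)
```

Here the first phrase comes out as clearly more redundant than the second, which is the kind of phrase-level signal the later compression step relies on.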
0:12:00okay
0:12:01so we carried out
0:12:03a
0:12:05twofold evaluation so in the first part of the evaluation we concentrated on
0:12:11looking into the capability of our model to actually localise redundancy
0:12:17so what we did is we took held-out passages from our training data
0:12:22so here you can see an example pair so we have the first response passage
0:12:27which is relatively short and then we have a longer
0:12:30follow-up passage
0:12:31we did not change the first response but we automatically segmented
0:12:38the second response
0:12:40and then we showed both the first response and the second response to raters
0:12:43and asked raters to assign
0:12:46a redundancy label
0:12:48to each of the segments of the second
0:12:51response
0:12:53now in this dataset there are one thousand two hundred passage pairs with
0:12:58fine grained redundancy annotation we used this data set
0:13:02to evaluate our model the dataset is released on github
0:13:07right so we ran our model on this dataset and we compared
0:13:12its capability of localising redundancy
0:13:15against the
0:13:17original model for recognizing entailment
0:13:20and the scores that you can see here on the slide are spearman correlation values
0:13:25so the correlation of the predicted redundancy with the rater assigned
0:13:29redundancy
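For reference, a Spearman correlation between predicted and rater-assigned redundancy can be computed as the Pearson correlation of the two rank vectors, with ties receiving average ranks. The sketch below is a plain-Python illustration; the example numbers are made up and are not results from the talk.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


predicted = [0.1, 0.4, 0.35, 0.8]   # made-up model scores for four segments
rated = [0, 1, 1, 2]                # made-up rater labels for the same segments
rho = spearman(predicted, rated)
print(round(rho, 4))
```

Because only ranks enter the computation, the measure rewards a model for ordering segments correctly even when its absolute scores are not calibrated.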
0:13:30as you can see our model was
0:13:33outperforming the baseline model
0:13:36what you can see on the right hand side
0:13:39is the scatter plot of our model's
0:13:43segment level phrase level redundancy scores
0:13:48plotted against the gold redundancy scores of the raters
0:13:53you can see two things in this scatter plot first
0:13:57there is a clear correlation between the two kinds of scores
0:14:00and second you can also see that the absolute
0:14:05redundancy scores that our model produces for each segment are a bit hard to interpret
0:14:11so these are not really useful what is
0:14:15indeed useful is the ranking
0:14:18so the redundancy ranking of segments that is induced by this
0:14:22score so the ranking of
0:14:24segments inside a passage so you cannot use this score to compare the redundancy of
0:14:30segments across
0:14:32data examples but you can use it to
0:14:36rank
0:14:39the segments in the passage according to their
0:14:42redundancy and this ranking is what we used in the
0:14:48second experiment
0:14:50so in the second experiment
0:14:53we looked into the
0:14:56effect that this model can have on the quality of the
0:15:00kind of dialogues that i showed earlier so on
0:15:04these informational dialogues
0:15:06so what we did is we
0:15:09showed to raters first the initial response the initial short response we also showed them the
0:15:15original follow-up response
0:15:18and we furthermore also showed them a compressed
0:15:23follow-up response that was compressed using the redundancy scores from our model
0:15:28so what we did here is
0:15:31we followed a relatively simple strategy for compressing the passage that is we worked
0:15:36here in this experiment
0:15:38which is really just a preliminary experiment kind of a pilot study
0:15:41with sentence-level segments and we just discarded the
0:15:47sentence that was most redundant according to our model and then we asked raters which
0:15:53variant of the follow-up response they liked more
0:15:58the original one or the compressed one
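The compression strategy described above, dropping the single most redundant sentence, can be sketched in a few lines. The function name and the example scores are hypothetical; the scores would come from the redundancy model in practice.

```python
def compress(sentences, redundancy_scores):
    """Drop the one sentence with the highest redundancy score.

    Only the ranking of scores inside this passage matters, in line with the
    remark that absolute scores are hard to interpret.
    """
    if len(sentences) <= 1:
        return list(sentences)
    drop = max(range(len(sentences)), key=lambda i: redundancy_scores[i])
    return [s for i, s in enumerate(sentences) if i != drop]


follow_up = [
    "malaria is caused by a parasite carried by mosquitoes",
    "symptoms usually appear within a few weeks",
    "it can be treated with medication",
]
scores = [0.7, 0.2, 0.1]   # made-up sentence-level redundancy scores
compressed = compress(follow_up, scores)
print(compressed)
```

Passages with a single sentence are returned unchanged in this sketch so that the follow-up response never becomes empty; that guard is a design choice for the example, not something stated in the talk.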
0:16:02so as i mentioned this is only a preliminary experiment so there was a really
0:16:06small sample size
0:16:08and
0:16:10we compared our model here against two simple baselines one was to always drop the first sentence
0:16:15of the follow-up response
0:16:17and the other one was to drop the sentence which had the highest lexical level
0:16:21overlap
0:16:22with the first response
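The two baselines can be sketched as follows. The talk does not define how lexical overlap was measured, so the Jaccard overlap of lowercased word sets used here is an assumption, and the function names are hypothetical.

```python
def drop_first(sentences):
    """Baseline 1: always drop the first sentence of the follow-up response."""
    return sentences[1:]


def lexical_overlap(sentence, first_response):
    """Jaccard overlap of lowercased word sets (an assumed overlap measure)."""
    a, b = set(sentence.lower().split()), set(first_response.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_most_overlapping(sentences, first_response):
    """Baseline 2: drop the sentence with the highest overlap with the first response."""
    drop = max(range(len(sentences)),
               key=lambda i: lexical_overlap(sentences[i], first_response))
    return [s for i, s in enumerate(sentences) if i != drop]


first_response = "malaria is caused by a parasite"
follow_up = [
    "it can be treated with medication",
    "malaria is caused by a parasite carried by mosquitoes",
    "symptoms include fever",
]
print(drop_first(follow_up))
print(drop_most_overlapping(follow_up, first_response))
```

On this toy passage the two baselines disagree: the first removes the treatment sentence, while the second removes the sentence that restates the first response.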
0:16:25so the result of this experiment was that users
0:16:30liked the compression based on our model more in terms of the naturalness
0:16:36of the produced compressed passage so that is a good thing
0:16:41but again that was only a
0:16:42quite small sample size
0:16:45on the downside there was a slight informativeness loss
0:16:51of the
0:16:52compressed
0:16:53follow-up passage and our model performed
0:16:55worse on this informativeness
0:16:58metric compared to the two baselines
0:17:02this might not be a very big deal since i mean
0:17:06to a certain degree you do expect some loss of informativeness in compressed passages
0:17:12and naturalness is really the key thing to look for here
0:17:17okay that's it already so the slide summarises the contributions of our work
0:17:25so we
0:17:27described a particular problem in the space of textual similarity that you face when you
0:17:32build informational dialogues
0:17:34we created evaluation data for this problem
0:17:38we released this data we
0:17:40proposed both a model to localise redundancy
0:17:45and also a way to train this model in a weakly supervised way
0:17:51and maybe the
0:17:53take-home message from our work is that
0:17:57also
0:17:58following a relatively shallow approach to dialogues combined with a relatively
0:18:04simple
0:18:05neural model can already
0:18:08give you quite
0:18:09solid performance and can already you know
0:18:13improve over the original
0:18:16thanks that's it
0:18:36maybe i'll start with one question myself this is with respect to the
0:18:40loss of informativeness i mean in the example that you showed for instance
0:18:45the exact type of the mosquito for instance was one of the things
0:18:49that were thrown out you know because it was in the sentence
0:18:53that was
0:18:54generally redundant right except for that
0:18:58right and with this
0:19:01approach that you've chosen where you basically drop phrases which are mostly redundant or which
0:19:07score high in redundancy this information just gets lost right so you know i was
0:19:12wondering whether you could comment more
0:19:16right because i imagine that's quite common right
0:19:19right so that's exactly the point of this informativeness loss
0:19:24so one could argue that as i said naturalness is much more important than
0:19:29informativeness
0:19:31that is
0:19:33having a natural sounding answer that might not give a user that
0:19:38information about the particular kind of mosquito
0:19:42is more the thing that we looked for and maybe one thing
0:19:48to look into in the future is
0:19:51how this
0:19:54localised redundancy can be exploited in the compression stage in a more you know
0:20:00in a more sophisticated way so that this
0:20:03informativeness loss is remedied for example
0:20:07so in
0:20:08that particular example that you mentioned the particular kind of mosquito
0:20:13is no longer mentioned because we just discard it
0:20:17say a sophisticated sentence compression model might be able to
0:20:24process as input token level redundancy scores and could then maybe produce a
0:20:30sentence where this information about the particular kind of mosquito so the name of
0:20:35the family of the mosquito the type of the mosquito is still
0:20:39mentioned but most of the redundant content is left out
0:20:44so that might be something to look into
0:20:52thanks for a very nice talk
0:20:55i'm curious about the generalisation
0:20:57of the method past the second turn of the dialog beyond the tell me
0:21:02more context and so this may be a general question that might
0:21:06foreshadow some of the discussion that we could have in the panel session
0:21:10where one of the questions for the panel was does a dialogue system actually have
0:21:14to know what it is talking about
0:21:15in this context we just have a string
0:21:18representation and there could be lots of natural follow ons from that tell me more
0:21:22utterance like
0:21:24you know what's related to that mosquito or you know
0:21:31their breeding cycle or something you know there could be anaphora and things
0:21:35like that and it would be really interesting if there was a really principled way
0:21:40that we could combine search results somehow into the discourse context
0:21:46so that we weren't just doing query rewriting we could actually kind of have
0:21:50a more integrated method and i wonder if you have any thoughts about that
0:21:54very good point so i mean with our model
0:21:58how could one handle this multiturn dialogue so more than two turns say three
0:22:05turns four turns
0:22:06well
0:22:07the naive approach would be to just
0:22:10basically concatenate all the previous dialogue context into the first response and feed
0:22:15this to the model and then
0:22:17always use as the second response fed to the model what you
0:22:21would like to say next
0:22:24of course this
0:22:26only works for so long for so many dialog turns
0:22:32i agree that it would be a good idea to
0:22:37have
0:22:38maybe a deeper modeling of conversations and of conversational structure in order to
0:22:46remove this redundancy problem in a more
0:22:49general way
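The concatenation idea for multi-turn dialogues can be sketched as follows. The wrapper treats the redundancy model as a function of two texts; `toy_model` is a hypothetical stand-in that marks a token as redundant when it already occurred in the context, and it is not the trained model from the talk.

```python
def score_next_response(previous_turns, candidate, redundancy_model):
    """Concatenate all earlier system turns into one "first response" and score
    the candidate next response against that combined context."""
    context = " ".join(previous_turns)
    return redundancy_model(context, candidate)


def toy_model(first, second):
    """Hypothetical stand-in: 1.0 for each token of `second` that also occurs in `first`."""
    seen = set(first.lower().split())
    return [1.0 if tok in seen else 0.0 for tok in second.lower().split()]


history = ["malaria is a disease", "it is spread by mosquitoes"]
candidate = "mosquitoes spread malaria at night"
token_scores = score_next_response(history, candidate, toy_model)
print(token_scores)
```

As noted in the answer, this only scales to a limited number of turns, since the concatenated context keeps growing with every turn.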
0:22:52right so
0:22:54a related aspect or one thing that we stumbled across is that
0:23:00i mean of course redundancy is not always a bad thing there is
0:23:02a certain amount of redundancy that you need to keep if you would just
0:23:06remove all redundancy from a follow-up response
0:23:10you might face the problem that the user actually wonders if you are still talking
0:23:15about the same thing
0:23:17and
0:23:19that's also something to consider so it would be the right thing to come up
0:23:25with a more general model of redundancy in dialogue
0:23:28that would give you a
0:23:31toolkit on how to handle all kinds of bad kinds of redundancy and good kinds
0:23:36of redundancy that you shouldn't remove or that you maybe should actively seek to
0:23:41include in your response
0:23:46thank you very interesting work thank you
0:23:51what i was wondering about goes into the direction of you know how you maintain coherence in
0:23:56dialog and specifically you said something about
0:24:01how you select the
0:24:03you know non-overlapping phrases
0:24:07and then you select them and you stick them back together my question is
0:24:12maybe i misunderstood but my question is how did you actually then do the generation process
0:24:18how do you then you know make sure it's a well-formed string with
0:24:22coherent meaning you know anaphora and so on
0:24:29right
0:24:32so what we did was really
0:24:36employing a very shallow approach so we did not
0:24:40generate new responses
0:24:45so i mean i said earlier that there are many different ways to implement
0:24:49this question answering black box and
0:24:53one thing to do is maybe to employ passage retrieval that is
0:24:57so you take the user question and you compare it against a huge collection of
0:25:02text and
0:25:04you try to find the passage in the text
0:25:08that
0:25:09possibly answers this question and in the shallow approach you wouldn't really try to understand
0:25:14what is in this passage you would just try to determine
0:25:18if it
0:25:19answers the question or not and then we would rely
0:25:23basically on the
0:25:25grammaticality of the source text
0:25:27that we use so we did not do any kind of
0:25:31are two generation user didn't change let's i dunno rows and with just a basic
0:25:37different set
0:25:39exactly yes handling anaphora in a more sophisticated way that's also
0:25:45probably something that one should look into in the future