0:00:15 | So hello again. My name is Sebastian; I'm a PhD student from Berlin, and I am going to present the paper with the title "Redundancy Localization for the Conversationalization of Unstructured Responses". |
---|
0:00:29 | This is joint work with three other people: Mikhail Kozhevnikov, Eric Malmi, and Daniele Pighin. Eric is a PhD student in Finland; the work was carried out in Zurich. |
---|
0:00:45 | Obviously, the general topic of this work is dialogue. More specifically, we were working on a setting where a user engages in a conversation with a conversational assistant, such as the Google Assistant or Siri, that runs on a voice-activated speaker. |
---|
0:01:05 | So we do not have any kind of display to convey information; we need to read out everything, basically all the information we want to convey. |
---|
0:01:15 | This is important because it means that, at least initially in a dialogue, we aim to give concise answers, and only after the user has confirmed interest in the kind of information we give can we also utter longer responses. |
---|
0:01:39 | Now, in the past there has been work on very different types of dialogue, for example task-oriented dialogue such as restaurant booking, or chatbots where users engage in chitchat, for which we nowadays have neural dialogue models. |
---|
0:01:54 | That is not what we do; we rather focus on something that could be called informational dialogues. In this type of dialogue, users have an information need that we try to satisfy. |
---|
0:02:08 | So, for example, a user might ask something like "What is malaria?". |
---|
0:02:15 | In the pipeline that we used, we follow a very shallow approach: we just take the question, forward it to a backend question-answering and information-retrieval component, and this gives us a bunch of response candidates from which we can choose one. |
---|
0:02:38 | As I mentioned, initially we would select a short answer, such as the one displayed here, which says that malaria is a disease caused by a small parasite transmitted by the bite of infected mosquitoes. |
---|
0:02:53 | Now, there are many different options to implement such a question-answering component, but again, for our purposes we treated it as a black box. |
---|
0:03:03 | Okay. The focus of our work was on the problem that occurs when this dialogue continues. |
---|
0:03:12 | Let's assume that the user likes the kind of information that we gave, that is, we correctly understood what kind of information the user is looking for, and the user says something like "tell me more", or maybe issues a follow-up question. |
---|
0:03:28 | Then we would again go to the question-answering component, and this time it would select a longer response, which we would read out, in the hope that this longer response contains some additional information that is of interest to the user. |
---|
0:03:45 | Now, the problem is the following: in many instances these longer responses are partially redundant with respect to what we have just said, like thirty seconds earlier. |
---|
0:04:00 | In this particular example, the parts underlined and highlighted in red are redundant: the response again mentions that malaria is caused by a parasite and that it is transmitted by a particular kind of mosquito. |
---|
0:04:18 | Since large parts of it are redundant, this is not a response that the user would like to hear, so we need to do something about this problem. |
---|
0:04:27 | There are two aspects to this research problem. |
---|
0:04:32 | The first aspect is that we need to understand when and where a response is redundant with respect to the dialogue context; that is, we need to localize redundant content in pairs of short texts. |
---|
0:04:46 | In the visualization here on the slide we have two rows of boxes, where each box is supposed to correspond to a phrase: the top row of boxes corresponds to the first, short response, and the bottom row visualizes the longer follow-up response. |
---|
0:05:06 | Our task is basically to select the boxes in the bottom row that are redundant, that occur again. |
---|
0:05:15 | Right, and once we have this information, the next step is to adapt the follow-up response to the previous dialogue context, that is, to discard repetitive content. |
---|
0:05:25 | Okay. There are a few related phenomena in the literature, such as the task of recognizing textual entailment or the task of semantic textual similarity, both of which deal with determining the coarse-grained relation between two short texts. |
---|
0:05:44 | There is also something called interpretable semantic textual similarity, which goes a little bit deeper into the similarity of texts by requiring task participants to also provide a fine-grained alignment of the chunks in the two texts. |
---|
0:06:01 | So that means there was lots of inspiration in terms of existing models that we could build upon; the overall problem really was how we could get our hands on data with fine-grained redundancy annotation that would allow us to train a model on this problem. |
---|
0:06:22 | Well, one approach to get data is of course to annotate it manually, but that is going to be expensive. So what we did is we came up with a way of defining a weak training signal, and the idea for this weak training signal is the following. |
---|
0:06:42 | For a given question, as I mentioned earlier, this question-answering black box gives us a bunch of response candidates, and associated with each response there is a confidence score that tells you, from the perspective of the question-answering system, how well this response candidate answers the user's question. |
---|
0:07:00 | Now, with regard to the user question, it is quite likely that two high-scoring answer candidates are paraphrases, while if you compare the top-ranking one with one from further down the response candidate list, these two will probably only share some information: there will be some information missing in the lower-ranked answer, or there will be some additional information. |
---|
0:07:32 | Now, in order to build this weak training signal, what we did is we picked three of these answers, two from the top of the result list and one from further down. We paired the two top-ranking ones, and we paired the top-ranking one with the one from further down the result list. |
---|
0:07:51 | We then fed each pair to the model and had the model produce a coarse-grained similarity score for each of the pairs, and we defined the ranking objective that is shown on the slide: we push the model towards assigning a higher redundancy score, a higher coarse-grained similarity, to the two top-ranking answer candidates. |
---|
0:08:16 | Our hope was that if we gave the model appropriate capacity and appropriate structure, then, in order to produce this coarse-grained similarity score, it would learn how to align and compare the constituents of the responses. |
---|
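The ranking objective described above can be sketched in code. This is a minimal illustration with hypothetical names, not the exact loss from the paper: the model's coarse-grained similarity for the pair of top-ranked candidates should exceed, by some margin, its similarity for the pair consisting of the top-ranked and a lower-ranked candidate.

```python
def margin_ranking_loss(sim_top_pair, sim_mixed_pair, margin=1.0):
    """Hinge-style ranking loss: zero once the similarity assigned to the
    two top-ranked answer candidates exceeds the similarity assigned to
    the (top-ranked, lower-ranked) pair by at least `margin`."""
    return max(0.0, margin - (sim_top_pair - sim_mixed_pair))

# The two top-ranked candidates are likely paraphrases, so the model is
# pushed to score that pair higher than the mixed pair.
loss = margin_ranking_loss(sim_top_pair=0.9, sim_mixed_pair=0.2)
```

Summed over many response triples, a loss of this shape rewards the model for ranking near-paraphrase pairs above pairs that only partially overlap, without ever needing fine-grained redundancy labels.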
0:08:32 | On this slide you can see an example triple; this is the kind of data we worked with. |
---|
0:08:41 | Here on the slide, all three responses that you can see are only one sentence long, but in reality we worked with passage-level data, so there were like three or four sentences per response. |
---|
0:08:56 | The multi-coloured boxes in this example are supposed to indicate the basic semantic building blocks of these responses. As you can see, the first two answers, which are the two highest-ranking answers returned for this particular query by the question-answering component, share four semantic building blocks, while the first and the third answer only share about half of the semantic content. |
---|
0:09:28 | Right, so we built a dataset of 1.5 million such response triples, and we used that to train a model that was a further development of an already existing and already published model for recognizing textual entailment. |
---|
0:09:48 | Our model is essentially a three-component feed-forward neural network, which means that it was really fast to train; that was a good thing, since we had so much data to process. |
---|
0:10:02 | Okay, now let's take a quick high-level view of the model. |
---|
0:10:10 | The input to our model are two responses, on the left-hand side of the slide, and on the right-hand side of the slide you can see the output of the model, which is a coarse-grained, high-level similarity score for these two responses. |
---|
0:10:25 | Now, in the first component, the model produces an alignment of the two responses; that is, it produces a custom representation of the first response for each token of the second response. |
---|
0:10:43 | Then, in the second component, these custom representations are compared: each custom representation of the first response is compared to the corresponding token of the second response, which gives us local redundancy scores, that is, token-level redundancy scores for the second answer. |
---|
0:11:02 | And then, in the third component, these local redundancy scores are aggregated in order to produce the coarse-grained, high-level redundancy score. |
---|
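The three components just described can be illustrated with a small sketch. This is not the published architecture: the real model uses learned feed-forward networks for alignment, comparison, and aggregation, whereas here plain dot-product attention, cosine similarity, and mean pooling stand in for them.

```python
import numpy as np

def redundancy_scores(first, second):
    """Toy three-step pipeline over token embeddings.
    first:  (n, d) embeddings of the first (short) response.
    second: (m, d) embeddings of the second (follow-up) response.
    Returns token-level redundancy scores and one coarse-grained score."""
    # 1) Alignment: for each token of the second response, attend over
    #    the first response to build a custom representation of it.
    logits = second @ first.T                                    # (m, n)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    aligned = weights @ first                                    # (m, d)

    # 2) Comparison: cosine similarity between each second-response token
    #    and its aligned representation = local redundancy score.
    dot = (aligned * second).sum(axis=1)
    norms = np.linalg.norm(aligned, axis=1) * np.linalg.norm(second, axis=1)
    local = dot / np.maximum(norms, 1e-9)                        # (m,)

    # 3) Aggregation: pool the local scores into one coarse score,
    #    which is what the weak training signal supervises.
    return local, float(local.mean())
```

Only the coarse score is supervised during training; the token-level scores from step 2 are the by-product that becomes useful at inference time.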
0:11:14 | Okay, so this is how the training worked. |
---|
0:11:19 | Now, at application time, at inference time, we weren't really interested in this coarse-grained similarity score, so after model training we basically chopped off that part of the model. |
---|
0:11:33 | We additionally fed the system, as input, a given segmentation of the second response into phrases. We then aggregated the local redundancy scores for each segment, for each phrase, and that gave us phrase-level redundancy scores for the phrases in the second response. |
---|
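The phrase-level aggregation at inference time can be sketched like this; the mean aggregation and the (start, end) segment format are assumptions for illustration, as the talk does not specify the aggregation function:

```python
def phrase_redundancy(token_scores, segments):
    """Aggregate token-level redundancy scores into phrase-level scores.
    token_scores: one local redundancy score per token of the follow-up.
    segments: (start, end) token spans of the given phrase segmentation."""
    return [sum(token_scores[s:e]) / (e - s) for s, e in segments]

# Two phrases over four tokens: the first phrase looks redundant,
# the second one looks novel.
scores = phrase_redundancy([0.9, 0.7, 0.1, 0.3], [(0, 2), (2, 4)])
```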
0:12:00 | Okay, so we carried out a twofold evaluation. In the first part of the evaluation, we concentrated on looking into the capability of our model to actually localize redundancy. |
---|
0:12:17 | What we did is we chose held-out passages from our training data. Here you can see an example pair: we have the first response passage, which is relatively short, and then we have a longer follow-up passage. |
---|
0:12:31 | We did not change the first response, but we automatically segmented the second response; we then showed both the first response and the second response to raters, and asked the raters to assign a redundancy label to each of the segments of the second response. |
---|
0:12:53 | Now, this dataset contains one thousand two hundred passage pairs with fine-grained redundancy annotation. We used this dataset to evaluate our model, and the dataset is released on GitHub. |
---|
0:13:07 | Right, so we ran our model on this dataset, and we compared its capability of localizing redundancy against the original model for recognizing entailment. |
---|
0:13:20 | The scores that you can see here on the slide are Spearman correlation values, that is, the correlation of the predicted redundancy with the rater-assigned redundancy. As you can see, our model outperformed the baseline model. |
---|
0:13:36 | What you can see on the right-hand side is a scatter plot of our model's segment-level, phrase-level redundancy scores, plotted against the gold redundancy scores of the raters. |
---|
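Spearman correlation, the metric reported here, is just the Pearson correlation computed on ranks rather than raw scores; a minimal version, ignoring ties for simplicity, looks like this:

```python
def spearman(xs, ys):
    """Spearman rank correlation between two equal-length score lists,
    assuming no ties (a simplification; real implementations average
    the ranks of tied values)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order):
            result[index] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    mean = (len(xs) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var
```

A value of 1.0 means the predicted redundancy scores order the segments exactly as the raters did; being rank-based, the metric fits a model whose score ordering is more meaningful than its absolute values.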
0:13:53 | You can see two things in this scatter plot. First, there is a clear correlation between the two kinds of scores. |
---|
0:14:00 | Second, you can also see that the absolute redundancy scores that our model produces for each segment are a bit hard to interpret, so these are not really useful. What is indeed useful is the redundancy ranking of segments that is induced by these scores, that is, the ranking of segments inside a passage. |
---|
0:14:24 | So you cannot use these scores to compare the redundancy of segments across data examples, but you can use them to rank the segments within a passage according to their redundancy, and this ranking is what we used in the second experiment. |
---|
0:14:50 | So in the second experiment, we looked into the effect that this model can have on the quality of the kind of dialogues that I showed earlier, on these informational dialogues. |
---|
0:15:06 | What we did is we showed raters first the initial response, the initial short response; we also showed them the original follow-up response; and we furthermore also showed them a compressed follow-up response that was compressed using the redundancy scores from our model. |
---|
0:15:28 | We followed a relatively simple strategy for compressing the passage in this experiment, which is really just a preliminary experiment, kind of a pilot study: we worked with sentence-level segments, and we just discarded the sentence that was most redundant according to our model. We then asked raters which variant of the follow-up response they liked more, the original one or the compressed one. |
---|
0:16:02 | As I mentioned, this is only a preliminary experiment, so there was a really small sample size. |
---|
0:16:08 | We compared our model here against two simple baselines: one was to always drop the first sentence of the follow-up response, and the other one was to drop the sentence which had the highest lexical-level overlap with the first response. |
---|
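The second baseline can be sketched as follows; word-level Jaccard similarity is an assumption here, since the talk only says "highest lexical overlap":

```python
def drop_most_overlapping(first_response, follow_up_sentences):
    """Baseline compressor: remove the follow-up sentence with the
    highest lexical overlap (word-level Jaccard) with the first response."""
    reference = set(first_response.lower().split())

    def jaccard(sentence):
        tokens = set(sentence.lower().split())
        union = tokens | reference
        return len(tokens & reference) / len(union) if union else 0.0

    most_redundant = max(range(len(follow_up_sentences)),
                         key=lambda i: jaccard(follow_up_sentences[i]))
    return [s for i, s in enumerate(follow_up_sentences)
            if i != most_redundant]
```

The learned model differs from this baseline in that it can also detect paraphrased content, not only verbatim word overlap.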
0:16:25 | The result of this experiment was that raters liked the compression based on our model more in terms of the naturalness of the produced compressed passage, so that is a good thing; but again, it was only a quite small sample size. |
---|
0:16:45 | On the downside, there was a slight informativeness loss in the compressed follow-up passage, and our model performed worse on this informativeness metric compared to the two baselines. |
---|
0:17:02 | This might not be a very big deal, since, to a certain degree, you do expect some loss of informativeness in compressed passages, and naturalness is really the key thing to look for here. |
---|
0:17:17 | Okay, that's it already; this slide summarizes the contributions of our work. |
---|
0:17:25 | We described a particular problem in the space of textual similarity that you face when you build informational dialogues; we created evaluation data for this problem and released the dataset; and we proposed both a model to localize redundancy and a way to train this model in a weakly supervised way. |
---|
0:17:51 | And maybe the take-away message from this work is that even following a relatively shallow approach to dialogue, combined with a relatively simple neural model, can already give you quite good performance and can already improve over the original responses. |
---|
0:18:16 | Thanks, that's it. |
---|
0:18:36 | [Question] Maybe I'll start with one question myself. This is with respect to the loss of informativeness. In the example that you showed, for instance, the exact type of the mosquito was one of the things that got dropped, because it was in a sentence that was generally redundant, right, except for that detail. |
---|
0:18:58 | And with this approach that you've chosen, where you basically drop phrases which are mostly redundant or which score high on redundancy, this information just gets lost, right? I was wondering whether you could comment more, because I imagine that's quite common. |
---|
0:19:19 | Right, so that's exactly the point of this informativeness loss. One could argue, as I said, that naturalness is much more important than informativeness; that is, having a natural-sounding answer, even if it might not give the user the information about the particular kind of mosquito, is more of the thing that we looked for. |
---|
0:19:42 | And maybe one thing to look into in the future is how this localized redundancy can be exploited in the compression stage in a more sophisticated way, so that this informativeness loss is remedied. |
---|
0:20:07 | So in that particular example that you mention, with the particular kind of mosquito which is no longer mentioned because we just discard the sentence: a sophisticated sentence-compression model might be able to process token-level redundancy scores as input, and could then maybe produce a sentence where the information about the particular kind of mosquito, the name of the family of the mosquito, is still mentioned, while most of the redundant content is left out. |
---|
0:20:44 | So that might be something to look into. |
---|
0:20:52 | [Question] Thanks for a very nice talk. I'm curious about the generalization of the method past the second turn of the dialogue, beyond the "tell me more" context. This may be a general question that might foreshadow some of the discussion that we could have in the panel session, where one of the questions for the panel was: does a dialogue system actually have to know what it is talking about? |
---|
0:21:15 | In this context we just have string representations, and there could be lots of natural follow-ons from that "tell me more" utterance, like "what's related to that?", questions about the mosquito, its breeding cycle; there could be anaphora and things like that. It would be really interesting if there was a really principled way that we could combine search results into the discourse context, so that we weren't just doing query rewriting but could actually have a more integrated method. I wonder if you have any thoughts about that. |
---|
0:21:54 | Very good point. So, with our model, how could one handle multi-turn dialogue, more than just two turns, say three turns, four turns? |
---|
0:22:07 | Well, the naive approach would be to just concatenate all the previous dialogue context into the first response, feed this to the model, and then always use as the second response, fed to the model, what you would like to say next. |
---|
0:22:24 | Of course, this only works for so long, for only so many dialogue turns. |
---|
0:22:32 | I agree that it would be a good idea to have a deeper modeling of conversations and of conversational structure, in order to remove this redundancy problem in a more general way. |
---|
0:22:52 | Right. So a related aspect, one thing that we stumbled across, is that redundancy is of course not always a bad thing: there is a certain amount of redundancy that you need to keep. If you would just remove all redundancy from a follow-up response, you might face the problem that the user actually wonders whether you are still talking about the same thing. |
---|
0:23:17 | That's also something to consider, so maybe the right thing would be to come up with a more general model of redundancy in dialogue that would give you a toolkit for handling all kinds of redundancy, including the good kinds of redundancy that you shouldn't remove, or that you maybe should even actively seek to include in your response. |
---|
0:23:46 | [Question] Thank you, very interesting work. What I was wondering about goes in the direction of how you maintain coherence in dialogue. Specifically, you said something about how you select the non-overlapping phrases and then stick them back together. My question is, and maybe I misunderstood, how did you actually then do the generation process? How do you make sure it's a well-formed string with coherence, meaning, you know, anaphora and so on? |
---|
0:24:29 | Right. So what we did was really employing a very shallow approach; we did not generate the responses ourselves. |
---|
0:24:45 | As I said earlier, there are many different ways to implement this question-answering black box, and one option is to employ passage retrieval: you take the user question, you compare it against a huge collection of text, and you try to find a passage in that text that possibly answers the question. In this shallow approach, you wouldn't really try to understand what is in this passage; you would just try to assess whether it answers the question or not, and we would then rely basically on the grammaticality of the source text that we use. |
---|
0:25:27 | So we did not do any kind of generation; we didn't change, let's say, anaphora; we just worked with the sentences as they were. |
---|
0:25:39 | Exactly, yes; handling anaphora in a more sophisticated way is also probably something to look into in the future. |
---|