0:00:35 | okay |
---|
0:00:37 | i'm Marilyn Walker, and the work that i'm presenting is PhD work by my student |
---|
0:00:44 | Amita, who can't be here |
---|
0:00:48 | i'm going to talk about summarizing dialogic arguments in social media, and the first thing i |
---|
0:00:53 | guess i want to say, you know, since this is the negotiation session, is that it's not clear how |
---|
0:00:57 | much negotiation is actually carried on |
---|
0:01:00 | in these argumentative dialogues |
---|
0:01:04 | although they definitely seem to be negotiating something |
---|
0:01:08 | so |
---|
0:01:09 | the current state of the art in summarizing argumentative dialogues is really humans |
---|
0:01:17 | so websites have curators who manually curate argument summaries, and so lots of different debate |
---|
0:01:25 | websites |
---|
0:01:26 | have curated arguments. so iDebate has, you know, these kinds of points for, |
---|
0:01:30 | points against, and |
---|
0:01:32 | ProCon.org has the top ten pro and con arguments. so on these |
---|
0:01:37 | websites they kind of summarize, like, what are the repeated arguments that people make about |
---|
0:01:42 | a particular |
---|
0:01:43 | social issue. the examples here are one about gay marriage and another one |
---|
0:01:49 | about gun control |
---|
0:01:52 | and |
---|
0:01:53 | when you look at the natural human dialogues where people are discussing the |
---|
0:01:58 | same kinds of issues, it's really striking how difficult it would be to actually produce |
---|
0:02:03 | a summary |
---|
0:02:04 | of these dialogues |
---|
0:02:06 | i'll give you a minute, i know you're going to read it anyway |
---|
0:02:10 | so |
---|
0:02:11 | i'll give you |
---|
0:02:13 | a minute to read it |
---|
0:02:15 | you know, people are very emotional, they're not necessarily logical, they make fun of each |
---|
0:02:21 | other, they're sarcastic |
---|
0:02:22 | there's all kinds of stuff going on in these dialogues |
---|
0:02:27 | they don't really fit in with your notion of what you would |
---|
0:02:31 | have as a summary of an argument, especially when you compare these to the curated arguments |
---|
0:02:35 | that are produced by |
---|
0:02:37 | professionals. and so the question, the first question that we had was, obviously, it |
---|
0:02:41 | would be great if you could actually summarize the whole bunch of conversations out there |
---|
0:02:45 | in social media, like what is it that the person on the street |
---|
0:02:49 | is saying about gay marriage, and what is it that the person on the street |
---|
0:02:52 | is saying about gun control or abortion or even evolution or |
---|
0:02:56 | any kind of issue that's being constantly debated on these social media websites |
---|
0:03:01 | and i would claim that you're interested not just in, like, what are the kinds |
---|
0:03:05 | of arguments that a lawyer or a constitutional expert would actually make, but you're actually interested to |
---|
0:03:10 | know |
---|
0:03:11 | what it is that people are saying, you know |
---|
0:03:14 | everybody can vote these days, right, whether or not you're in |
---|
0:03:19 | the top one percent of the population that's actually educated in |
---|
0:03:22 | how to argue logically |
---|
0:03:24 | so it would be a good thing to actually know, you know, what it is that people |
---|
0:03:28 | are saying, what kinds of arguments they're making |
---|
0:03:30 | and when you look at these, the first thing is, you know |
---|
0:03:34 | what should the summary contain what kind of information should we pull out of these |
---|
0:03:38 | conversations in order to make a summary |
---|
0:03:40 | and, you know, the conversants don't agree, and so, you know, it seems |
---|
0:03:46 | like you would at least need to represent both sides of the argument, so that |
---|
0:03:50 | would be, like, maybe a first criterion |
---|
0:03:53 | but, you know, you want to represent the opposing stances |
---|
0:03:57 | then do you want to include some kind of emotional information in it, do you |
---|
0:04:01 | want to include the socio-emotional relationship, like that the second speaker is |
---|
0:04:05 | making fun of the first speaker, that they're being sarcastic, should that kind |
---|
0:04:10 | of information go into |
---|
0:04:12 | a summary, or |
---|
0:04:13 | you know, do you want to take, like, the philosophical, logical argumentation view and |
---|
0:04:18 | say, well, i'm going to consider all of this to be just flaming or |
---|
0:04:22 | trolling or |
---|
0:04:23 | whatever, i'm not really interested in any part of this argument that doesn't actually fit |
---|
0:04:28 | in with the logical view of argumentation |
---|
0:04:32 | and there has been previous work on dialogue summarization, but there hasn't been any work on |
---|
0:04:39 | summarizing argumentative dialogues automatically |
---|
0:04:42 | and so all the other dialogue summarization work that's out there, some of |
---|
0:04:47 | which i think has been done by some of the people in this room |
---|
0:04:51 | it all has very different properties |
---|
0:04:53 | and it's not nearly as |
---|
0:04:55 | it's not really as noisy as these argumentative dialogues are |
---|
0:05:01 | so our goal is to automatically produce summaries of argumentative dialogues |
---|
0:05:06 | we're taking an extractive summarization perspective at this point, although it would clearly be nice if |
---|
0:05:12 | we could do abstractive summarization |
---|
0:05:14 | and so the one step that we're trying to do in this paper is we're |
---|
0:05:18 | trying to identify and extract the most important |
---|
0:05:22 | arguments |
---|
0:05:23 | on each side |
---|
0:05:25 | for an issue |
---|
0:05:27 | and our |
---|
0:05:29 | our initial starting point is that, as i pointed out on the previous slides, it's actually really |
---|
0:05:33 | difficult to figure out what information these summaries should contain, and so we start from |
---|
0:05:38 | the standpoint that summarization is something that any native speaker knows how to do, they |
---|
0:05:43 | don't have to have any training |
---|
0:05:44 | and so our initial concept is that we're going to |
---|
0:05:49 | collect the summaries that humans produce of these conversations and see what people pick out |
---|
0:05:55 | and then we're going to take these summaries that we collected and we're going to apply the |
---|
0:05:59 | pyramid method, which has been used, like, in the DUC summarization tasks |
---|
0:06:03 | and we're going to assume that the arguments that appear in more of the model summaries are |
---|
0:06:08 | the most important arguments, so we're |
---|
0:06:10 | kind of applying a standard extractive summarization and evaluation approach |
---|
0:06:15 | to these argumentative dialogues |
---|
0:06:18 | so we have gold standard training data |
---|
0:06:21 | we have |
---|
0:06:22 | collected five human summaries |
---|
0:06:24 | for each of about fifty dialogues on the topics of gay marriage, gun |
---|
0:06:28 | control and abortion |
---|
0:06:30 | and the |
---|
0:06:34 | a lot of this is described in more detail in our paper, our |
---|
0:06:38 | NAACL paper in twenty fifteen |
---|
0:06:41 | about what the summaries look like and what their properties are. then we trained undergraduate |
---|
0:06:46 | linguists to use the pyramid method to identify important arguments in the dialogue |
---|
0:06:51 | so they construct pyramids for each set of five summaries |
---|
0:06:55 | and the idea is that the repeated elements of the summaries end up in the higher tiers |
---|
0:06:59 | of the pyramid. i'm going to give you an example in a minute, in case |
---|
0:07:02 | this is all a bit opaque |
---|
0:07:03 | to you, so that it will be clear |
---|
0:07:05 | after the next slide |
---|
0:07:07 | so then we have, we have these human dialogues, we have five summaries for |
---|
0:07:12 | each dialogue |
---|
0:07:14 | and then we have these pyramids that are constructed on top of each of |
---|
0:07:18 | those sets of summaries, looking at, you know, what |
---|
0:07:19 | elements get repeated |
---|
0:07:21 | then we still have a problem: we know which are the important concepts in |
---|
0:07:25 | the dialogue, because those are the ones that appeared in the model summaries |
---|
0:07:28 | but we have to map that back to the actual original dialogues if we want to develop an |
---|
0:07:32 | extractive summarizer: we want to be able to operate on the original dialogue texts and |
---|
0:07:37 | not the intermediate summary representation which we collected, right |
---|
0:07:41 | so that's the third step, |
---|
0:07:43 | getting this mapping back, and then once we have that mapping, we |
---|
0:07:47 | characterize our problem as a binary problem, or a ranking problem, of identifying the most important |
---|
0:07:54 | utterances in the dialogues that we want to go into the extractive summary |
---|
0:08:00 | this is what the sample summaries look like; this is from a gay marriage |
---|
0:08:05 | dialogue |
---|
0:08:06 | you know, so these |
---|
0:08:07 | these summaries, they're really good quality, and the ones for gay marriage are currently |
---|
0:08:12 | available |
---|
0:08:13 | on our website, so, just for gay marriage; the |
---|
0:08:17 | new ones that we collected that are talked about in this paper, about abortion |
---|
0:08:22 | and then those, for sure, we will be releasing |
---|
0:08:24 | soon |
---|
0:08:25 | but if you want to see what they look like, just for gay marriage, these |
---|
0:08:28 | were released a few years ago with our |
---|
0:08:31 | previous paper |
---|
0:08:32 | so |
---|
0:08:33 | this is what the data looks like: so we have the summaries for these |
---|
0:08:38 | fifty, about fifty, different conversations for each topic |
---|
0:08:42 | and what the |
---|
0:08:45 | human does when they make the pyramid labels |
---|
0:08:48 | is they kind of read through all the summaries, they decide what are the important |
---|
0:08:52 | concepts, kind of distinct from the words that are actually used by the summarizers |
---|
0:08:58 | and they make their own human label, so they come up with the human label |
---|
0:09:01 | which is the paraphrase |
---|
0:09:03 | "no one has been able to prove that gun owners are safer than non-gun |
---|
0:09:06 | owners" |
---|
0:09:07 | and then they identify, for each summary, how that summarizer phrased that particular argument, that |
---|
0:09:13 | particular concept |
---|
0:09:14 | and if they find a concept in more than one of the |
---|
0:09:18 | summaries up to five because we have five summaries |
---|
0:09:21 | then that means that that's a very important concept, so that it's |
---|
0:09:25 | represented in this tier right |
---|
0:09:26 | so the arguments |
---|
0:09:28 | that multiple summarizers picked out |
---|
0:09:31 | and put in their summaries end up |
---|
0:09:34 | having more contributors, right, in this human label, and they end up being ranked as |
---|
0:09:39 | more important arguments |
---|
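To make the tier construction concrete, here is a minimal Python sketch, assuming a toy data layout (the label strings and summary IDs are invented for illustration, not taken from the corpus):

```python
from collections import defaultdict

# hypothetical layout: each content-unit label (the annotator's paraphrase)
# maps to the set of summaries, out of five, that contributed a phrasing of it
contributors = {
    "no one has proven gun owners are safer than non-gun owners": {0, 1, 2, 4},
    "criminals ignore gun laws anyway": {1, 3},
    "speaker B mocks speaker A": {2},
}

def build_pyramid(contributors):
    """Tier = number of summaries expressing the concept; concepts in higher
    tiers were picked out by more summarizers and rank as more important."""
    tiers = defaultdict(list)
    for label, summary_ids in contributors.items():
        tiers[len(summary_ids)].append(label)
    return dict(tiers)

print(build_pyramid(contributors))
# {4: ['no one has proven ...'], 2: ['criminals ...'], 1: ['speaker B ...']}
```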
0:09:44 | okay so |
---|
0:09:45 | so now we're on step three, where we now have these |
---|
0:09:49 | we have these summary contributors, which, again, as i said, are removed from the |
---|
0:09:54 | language of the original dialogues |
---|
0:09:56 | and we have these human labels |
---|
0:09:58 | and what we want to do is to figure out, in the original dialogue |
---|
0:10:02 | which utterances actually correspond to these things that ended up really highly ranked |
---|
0:10:06 | in the, in the pyramid |
---|
0:10:08 | and |
---|
0:10:09 | when we originally collected this data, like two or three years ago, we thought we were going to |
---|
0:10:13 | be able to do this mapping automatically once we had the data |
---|
0:10:17 | and after multiple different attempts we decided that we couldn't map it back automatically |
---|
0:10:22 | because the language of the summarizers and the language of the |
---|
0:10:26 | of the human labels from the pyramids is |
---|
0:10:28 | too different |
---|
0:10:29 | from the original language in the original dialogues. so we designed |
---|
0:10:34 | a mechanical turk task |
---|
0:10:37 | or we tried to; actually, we didn't run it on mechanical turk, we couldn't |
---|
0:10:41 | get mechanical turkers to do this task reliably, to map back from the summary labels |
---|
0:10:46 | to the original dialogues |
---|
0:10:48 | so, i forgot to say this: we recruited two graduate linguists and two undergraduate linguists |
---|
0:10:53 | to actually do this mapping for us, in order to get good quality data |
---|
0:10:58 | so we presented them with the original conversations, and they also have the labels |
---|
0:11:07 | that were produced, the highest-tier labels |
---|
0:11:09 | and we asked, for each utterance of the conversation |
---|
0:11:12 | to pick one or more of the labels that correspond to the content of that |
---|
0:11:17 | utterance. and again, we're only interested |
---|
0:11:20 | we're only interested in the |
---|
0:11:22 | utterances that have a score of three or higher, that are considered most important by |
---|
0:11:27 | the original summarizers |
---|
0:11:29 | and we get pretty good reliability on this |
---|
0:11:32 | once we started using our own internally trained people, rather than |
---|
0:11:37 | turkers |
---|
0:11:38 | we could get this done reliably |
---|
0:11:40 | so |
---|
0:11:41 | so at step |
---|
0:11:43 | three |
---|
0:11:44 | we have the fifty dialogues for each, you know, we said that we had about |
---|
0:11:47 | fifty for each one |
---|
0:11:49 | so for each topic, two hundred and fifty summaries |
---|
0:11:51 | five for each dialogue. now we pull out the important sentences and the not-important |
---|
0:11:56 | sentences for each dialogue, and we frame that as a binary classification task |
---|
0:12:01 | again, we could have framed it as a ranking task |
---|
0:12:05 | and just used the tier label, but |
---|
0:12:08 | we decided to just frame it as binary classification |
---|
0:12:12 | so we group the labels by tier, we compute |
---|
0:12:14 | the average tier label, and then we define any sentence with an average |
---|
0:12:19 | score of three or higher as being important |
---|
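A rough sketch of that thresholding step, with the cutoff of three taken from the talk (the exact aggregation used in the paper may differ):

```python
def is_important(tier_scores, threshold=3.0):
    """tier_scores: the pyramid tier value(s) assigned to one dialogue
    utterance in the mapping step; the utterance becomes a positive
    training example when its average tier reaches the threshold."""
    return sum(tier_scores) / len(tier_scores) >= threshold

print(is_important([4, 3, 3]))  # True: a frequently repeated argument
print(is_important([1, 2]))     # False: rarely picked out by summarizers
```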
0:12:21 | so we believe that we've provided a well-motivated and theoretically grounded definition of what |
---|
0:12:27 | an important argument is, by going through this whole process |
---|
0:12:30 | and now we have this binary classification problem that we're trying to solve |
---|
0:12:34 | so we have three different off-the-shelf summarizers |
---|
0:12:39 | that we apply to this, to see how standard summarization algorithms work. so we use |
---|
0:12:43 | SumBasic |
---|
0:12:44 | which is an algorithm by, i think, Nenkova and Vanderwende; we use the KL-divergence summarization |
---|
0:12:51 | which is from Haghighi and Vanderwende; these are all available off-the-shelf; and we |
---|
0:12:56 | used LexRank. so these are all different kinds of algorithms; LexRank |
---|
0:13:00 | is the one that was |
---|
0:13:02 | most successful at the most recent document understanding |
---|
0:13:07 | competition |
---|
0:13:08 | and all of these rank utterances instead of classifying them |
---|
0:13:11 | so what we did was we applied them to the dialogues |
---|
0:13:15 | we get the ranking, and then |
---|
0:13:19 | we take the |
---|
0:13:20 | number of utterances that are in the gold summary, so we kind of say, let's |
---|
0:13:25 | cut off the ranking at the point where |
---|
0:13:27 | the length of the extractive summary is the same |
---|
0:13:30 | as what we expect |
---|
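All three baselines have off-the-shelf implementations, for instance in the `sumy` package; this is a sketch of the length-matched procedure just described (using sumy is my choice for illustration, not necessarily what the authors ran; the dialogue text is a stand-in):

```python
# requires: pip install sumy, plus the nltk punkt data for the tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.sum_basic import SumBasicSummarizer
from sumy.summarizers.kl import KLSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer

def baseline_extract(dialogue_text, gold_length, summarizer_cls):
    """Rank the dialogue's sentences and cut the ranking off at the length
    of the gold extractive summary, so the outputs are length-matched."""
    parser = PlaintextParser.from_string(dialogue_text, Tokenizer("english"))
    return summarizer_cls()(parser.document, gold_length)

dialogue = "Gun laws only affect the law-abiding. Criminals ignore them anyway. ..."
for cls in (SumBasicSummarizer, KLSummarizer, LexRankSummarizer):
    print(cls.__name__, baseline_extract(dialogue, 1, cls))
```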
0:13:32 | we have a bunch of different models |
---|
0:13:35 | we tried support vector machines with the linear kernel from scikit-learn; we use cross |
---|
0:13:40 | validation |
---|
0:13:41 | for tuning the parameters, and then we also tried a combination |
---|
0:13:45 | of a bidirectional LSTM with a convolutional neural net, the biLSTM-CNN |
---|
0:13:50 | and we split our data into training and test |
---|
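A minimal scikit-learn sketch of that SVM setup (linear kernel, cross-validated tuning; the synthetic features and the parameter grid here are stand-ins, not the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# stand-in for the real utterance feature vectors and importance labels
X, y = make_classification(n_samples=500, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# linear-kernel SVM with C tuned by cross-validation on the training split
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1, 10]},
                    cv=5, scoring="f1_weighted")
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```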
0:13:53 | for features, we have hundreds |
---|
0:13:55 | two different kinds of word embeddings |
---|
0:13:57 | Google word2vec and GloVe, and then we have some other things that we |
---|
0:14:02 | think are more linguistically motivated, that we expected to |
---|
0:14:06 | possibly help |
---|
0:14:08 | so we have readability scores: we would expect that utterances that are more readable would be |
---|
0:14:12 | better, that they'd be more important |
---|
0:14:15 | we thought sentiment might be important; we thought the position, sentence position in the summary |
---|
0:14:20 | might be important, like the first sentences of the summary might be more important, or the |
---|
0:14:24 | first sentences in the dialogue |
---|
0:14:26 | and then we have LIWC, Linguistic Inquiry and Word Count, which gives us a lot |
---|
0:14:30 | of lexical categories, with three different representations of the context: one in terms of |
---|
0:14:35 | LIWC, one in terms of the dialogue act classification of the previous utterances, the previous |
---|
0:14:41 | two utterances in the dialogue |
---|
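A sketch of what one utterance's feature vector might look like; textstat for readability and VADER for sentiment are my library choices, and since LIWC itself is a licensed lexicon, a tiny category-count stand-in is shown:

```python
import numpy as np
import textstat
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sentiment = SentimentIntensityAnalyzer()

# stand-in for LIWC: a few category lexicons mapping category -> word set
LEXICON = {"negate": {"no", "not", "never"}, "certain": {"always", "proof"}}

def category_counts(tokens):
    return [sum(t in words for t in tokens) for words in LEXICON.values()]

def utterance_features(utt, prev_utt, position, dialogue_len):
    """One row of the feature matrix for the classifiers above."""
    return np.array(
        [textstat.flesch_reading_ease(utt),                # readability
         sentiment.polarity_scores(utt)["compound"],       # sentiment
         position / dialogue_len]                          # relative position
        + category_counts(utt.lower().split())             # lexical categories
        + category_counts(prev_utt.lower().split())        # context: previous turn
    )

print(utterance_features("Criminals never obey gun laws.", "How are you?", 3, 20))
```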
0:14:43 | and then we ran the Stanford coreference engine |
---|
0:14:45 | which |
---|
0:14:46 | i expected to not produce anything, so that's a little foreshadowing: it works, it actually |
---|
0:14:52 | helps, amazingly |
---|
0:14:55 | and these are our results |
---|
0:14:57 | so LexRank was our very best baseline; i'm not going to tell you what the other |
---|
0:15:01 | baselines were |
---|
0:15:03 | and so for LexRank we're getting a weighted f-score on the test set |
---|
0:15:08 | in the upper fifties |
---|
0:15:10 | then we |
---|
0:15:12 | have, as |
---|
0:15:14 | our very best model, the SVM using features. just with word embeddings it |
---|
0:15:20 | does not do as well, but if we put in all these linguistic features, we see |
---|
0:15:24 | that |
---|
0:15:24 | for both the gun control and the abortion topics, the Stanford coreference engine |
---|
0:15:31 | applied to these very noisy dialogues actually improves performance, by having a representation of |
---|
0:15:36 | the context |
---|
0:15:37 | and we get, you know, we get better results for gun control |
---|
0:15:41 | than we do for gay marriage and abortion, and we had that result repeatedly, over |
---|
0:15:46 | and over and over again, and we think that the reason is that the same |
---|
0:15:50 | arguments get repeated in gun control |
---|
0:15:53 | and that's not the case |
---|
0:15:55 | for the other topics |
---|
0:15:59 | so the CNN with the biLSTM, with just the word embeddings, gets scores |
---|
0:16:03 | that are kind of in the |
---|
0:16:05 | in the sixties |
---|
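For the CNN plus biLSTM model she mentions, a minimal Keras sketch over word-embedding input (the layer sizes and exact wiring here are assumptions, not the paper's reported architecture):

```python
from tensorflow.keras import layers, models

MAX_LEN, VOCAB, EMB_DIM = 80, 20000, 300  # assumed sizes

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB, EMB_DIM),          # could be initialized from word2vec/GloVe
    layers.Conv1D(128, 5, activation="relu"),  # convolutional n-gram features
    layers.MaxPooling1D(2),
    layers.Bidirectional(layers.LSTM(64)),     # sequence context in both directions
    layers.Dense(1, activation="sigmoid"),     # binary importance decision
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```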
0:16:06 | and then we get our best model using |
---|
0:16:11 | that one along with features. and for gun control, what this one shows here |
---|
0:16:15 | so let me remind you what these features are |
---|
0:16:17 | so LCP is LIWC |
---|
0:16:20 | with the context representation, which is also LIWC; R is the readability; DAC |
---|
0:16:25 | is the dialogue act score; and then the coref |
---|
0:16:29 | so for gun control having three different representations of context |
---|
0:16:35 | gives us the best model |
---|
0:16:37 | and for both gay marriage and abortion as well, just having the LIWC |
---|
0:16:42 | categories of the previous utterance also gives us good performance |
---|
0:16:47 | and so i think it's interesting: we have a pretty simple representation of context, it's not a |
---|
0:16:53 | sequential model, but we do have something that shows that the context helps |
---|
0:16:59 | one minute |
---|
0:17:00 | okay |
---|
0:17:01 | so why didn't LexRank work very well? |
---|
0:17:04 | it didn't work very well because of all the repetition in dialogue |
---|
0:17:08 | so the assumption of LexRank, for, like, newspaper corpora, is that if something gets repeated |
---|
0:17:13 | it's important |
---|
0:17:14 | but as you might infer from, like, the previous speaker talking about alignment, there's lots of |
---|
0:17:19 | repetition in conversation that doesn't indicate that the information is actually important, and it's based |
---|
0:17:25 | on lexical repetition, so it doesn't really help. what's interesting about |
---|
0:17:30 | sentiment is that something being positive sentiment actually turns out to be a very good |
---|
0:17:34 | predictor that it is not important |
---|
0:17:36 | and it's not necessarily for the reason that you think it would be: it's because |
---|
0:17:40 | sentiment classifiers think that anything that's conversational |
---|
0:17:46 | is positive sentiment |
---|
0:17:48 | so it just rules out anything like "how are you today" |
---|
0:17:51 | you know, that kind of thing, right, it just |
---|
0:17:55 | rules out a lot of stuff that's just purely conversational |
---|
0:17:59 | and that's why |
---|
0:18:00 | sentiment helps |
---|
0:18:02 | and then for the LIWC categories, we get some, you know, some LIWC categories that are different |
---|
0:18:07 | for each topic, which shows that |
---|
0:18:09 | some of the stuff that we're learning through LIWC |
---|
0:18:11 | is actually topic-specific |
---|
0:18:13 | "'cause" it's learning to use particular |
---|
0:18:16 | LIWC categories. okay |
---|
0:18:18 | so we presented a novel method for summarizing argumentative dialogues, showed our results beat several |
---|
0:18:23 | summarization baselines |
---|
0:18:25 | we compared the SVM with the neural deep learning model |
---|
0:18:30 | showed that the linguistic features actually really help |
---|
0:18:34 | that the context-based features |
---|
0:18:36 | improve |
---|
0:18:36 | over the sentence alone |
---|
0:18:38 | and then we want to do more work exploring |
---|
0:18:41 | whether this could be topic-independent. so i do want to point out that our |
---|
0:18:46 | summarization baselines are all topic-independent: they don't need any |
---|
0:18:49 | training |
---|
0:18:50 | okay |
---|
0:18:52 | questions |
---|
0:19:01 | [inaudible audience question] |
---|
0:19:18 | that's a really good point; no, we didn't distinguish between |
---|
0:19:27 | conversations where there was more or less agreement, and we have |
---|
0:19:30 | we haven't looked at that, so i think it would be interesting, because |
---|
0:19:33 | you would think that it would be easier to summarize the conversations |
---|
0:19:37 | where people were more on the same stance side |
---|
0:19:45 | yes |
---|
0:19:47 | [inaudible audience question] |
---|
0:19:59 | it's in the paper |
---|
0:20:04 | it seems like, it seems like they would be pretty |
---|
0:20:09 | can you rephrase that for me? so, for a given model, when you use features, do you use |
---|
0:20:16 | them simultaneously? for our method, no, i don't think we tried that |
---|
0:20:25 | i don't think we did; we tried word2vec |
---|
0:20:27 | embeddings and then weighted GloVe embeddings; we didn't put both in |
---|
0:20:31 | and then we looked at both of those with |
---|
0:20:34 | i mean, in terms of which features make a difference |
---|
0:20:37 | so there's a, there's a whole, in fact probably not all the results are in |
---|
0:20:40 | the paper |
---|
0:20:41 | but there is a pretty decent set of ablation results in the paper about how |
---|
0:20:46 | much each feature contributes |
---|
0:20:50 | david, do you have a quick question |
---|
0:20:54 | sorry |
---|
0:20:57 | wait |
---|
0:21:08 | so, trained on |
---|
0:21:10 | abortion and tested on gun control? |
---|
0:21:14 | so, we have done that; we had a |
---|
0:21:17 | paper |
---|
0:21:20 | a few years back where we did some cross-domain experiments on a subset of this |
---|
0:21:24 | problem |
---|
0:21:25 | which is just |
---|
0:21:26 | trying to identify |
---|
0:21:29 | some |
---|
0:21:30 | sentences which are more likely to be understandable as good arguments out of context. in |
---|
0:21:35 | that |
---|
0:21:36 | that paper, which has first author Swanson, i can tell you about it afterwards; we |
---|
0:21:41 | did some cross domain experiments |
---|
0:21:43 | and of course it doesn't work as well |
---|
0:21:46 | and it is interesting, "'cause" you would think, we had thought, that |
---|
0:21:50 | most of the features that we're using would not be domain |
---|
0:21:53 | specific |
---|
0:21:54 | but every time we do that cross domain thing the results are about ten percent |
---|
0:21:59 | worse |
---|
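The cross-domain setup being discussed is essentially a leave-one-topic-out loop; a sketch with synthetic stand-in data (the real feature matrices would come from the feature pipeline above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# stand-in feature matrices and labels per topic
data = {topic: make_classification(n_samples=300, n_features=50, random_state=i)
        for i, topic in enumerate(["gun control", "abortion", "gay marriage"])}

for held_out in data:
    # train on the other two topics, evaluate on the held-out topic
    train_topics = [t for t in data if t != held_out]
    X_tr = np.vstack([data[t][0] for t in train_topics])
    y_tr = np.concatenate([data[t][1] for t in train_topics])
    clf = SVC(kernel="linear").fit(X_tr, y_tr)
    X_te, y_te = data[held_out]
    print(held_out, round(clf.score(X_te, y_te), 3))
```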
0:22:00 | okay, so the most domain-specific features are the embeddings? |
---|
0:22:05 | the embeddings, and also the LIWC: you know, you give it all the |
---|
0:22:09 | LIWC features, but the ones that the model |
---|
0:22:12 | learns to pay attention to are topic-specific |
---|
0:22:22 | let's thank the speaker again |
---|