0:00:15 | and well you have to name cream right |
---|
0:00:20 | and this is another resource paper like the last one that describes how we |
---|
0:00:23 | built a corpus of human-authored reference summaries for reader comment conversations on online |
---|
0:00:30 | news so let me start off with the motivation for this work |
---|
0:00:34 | so i think most of us are aware of reader comments they appear on a wide |
---|
0:00:38 | range of online news sources some of which are shown down the right |
---|
0:00:42 | and these are multi-way conversations |
---|
0:00:44 | and they contain lots of information a lot of rubbish as well |
---|
0:00:48 | but also a lot of information of potential value to a range of users including |
---|
0:00:51 | not just people reading but people posting comments as well as reading journalists |
---|
0:00:56 | and news editors and analysts and so on |
---|
0:00:59 | however and i'm sure you've noticed this if you've looked at these a major problem is that |
---|
0:01:03 | a news article may quickly attract hundreds even thousands of comments |
---|
0:01:06 | and few readers have the patience to wade through this much |
---|
0:01:10 | so the obvious suggestion is that it would be useful if we could automatically summarize these |
---|
0:01:14 | conversations to give some overview of what's going on in the conversation |
---|
0:01:20 | some work on this has been done already and i'd divide the approaches up |
---|
0:01:24 | into sort of two broad categories what you might call |
---|
0:01:27 | technology-driven approaches that is |
---|
0:01:29 | let's try what we already know how to do and see how well it works |
---|
0:01:32 | so the idea is well we know how to cluster things so let's cluster |
---|
0:01:36 | all these comments topically using something like lda |
---|
0:01:39 | and then let's rank them using some sort of ranking algorithm so rank clusters rank |
---|
0:01:43 | sentences in the clusters and then build an extractive summary |
---|
0:01:46 | from the results |
---|
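The technology-driven baseline just described (cluster comments topically with something like LDA, rank the clusters, then extract representative content) might be sketched roughly as follows. This is an illustrative sketch using scikit-learn, not the speaker's actual system; the function name, parameters, and cluster-ranking choice (cluster size) are assumptions.

```python
# Illustrative sketch of the baseline pipeline described in the talk:
# topically cluster comments with LDA, rank clusters by size, then
# extract one representative comment per top-ranked cluster.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def extractive_summary(comments, n_topics=3, n_clusters=2):
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(comments)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topic = lda.fit_transform(X)           # per-comment topic distribution
    dominant = doc_topic.argmax(axis=1)        # cluster = dominant topic
    sizes = np.bincount(dominant, minlength=n_topics)
    # rank clusters by how many comments they attract, skipping empty ones
    ranked = [t for t in sizes.argsort()[::-1] if sizes[t] > 0][:n_clusters]
    summary = []
    for t in ranked:
        members = np.where(dominant == t)[0]
        # extract the comment most strongly associated with the topic
        best = members[doc_topic[members, t].argmax()]
        summary.append(comments[best])
    return summary
```

As the talk goes on to note, extracts chosen this way tend to lose the argumentative structure of the conversation.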
0:01:48 | and systems of this sort have been built and they generate so-called summaries but |
---|
0:01:52 | in fact if you look at them |
---|
0:01:54 | i don't think they're good summaries |
---|
0:01:55 | they fail to capture the argument-oriented nature of the |
---|
0:01:58 | conversations pretty spectacularly |
---|
0:02:03 | the second set of approaches which haven't really come to fruition yet but are promising might |
---|
0:02:08 | be called argument-driven approaches there's been a lot of work on argument mining in social media |
---|
0:02:12 | and this has resulted in various schemes |
---|
0:02:14 | defining argument elements and relations in argumentative discourse and if such elements and relations |
---|
0:02:20 | could in fact be detected in these comments |
---|
0:02:23 | then they might form a basis for building a summary and a number of people |
---|
0:02:27 | working in this area have cited summarization as a motivation for their work |
---|
0:02:32 | but no one has yet proposed given an analysis of the sort they |
---|
0:02:36 | put forward |
---|
0:02:39 | how to derive a summary from the full set of reader comments |
---|
0:02:44 | well what we thought when we started this project that is the sensei project |
---|
0:02:48 | on which this work is based |
---|
0:02:51 | is that what was really needed is an answer to the underlying fundamental question |
---|
0:02:57 | what should a summary of reader comments be like |
---|
0:02:59 | and it would also be helpful if we had human-generated exemplars |
---|
0:03:03 | for real sets of reader comments |
---|
0:03:06 | and this would allow us both to better select appropriate technologies for reader comment summarization |
---|
0:03:10 | and also |
---|
0:03:12 | to evaluate and develop our systems using these materials |
---|
0:03:17 | okay so that's the introduction and here's the structure for the talk |
---|
0:03:20 | next i'll look at what should a summary of reader comments be like |
---|
0:03:24 | i'll then talk about a method that we developed for building or authoring reader comment |
---|
0:03:28 | summaries |
---|
0:03:29 | then talk about the corpus we built |
---|
0:03:31 | some comments on related work if there's time and conclusions and future work |
---|
0:03:37 | well what should a summary of reader comments be like well i think one can |
---|
0:03:41 | start from some remarks made by karen sparck jones that what |
---|
0:03:45 | a summary should be like depends on the nature of the content that one |
---|
0:03:48 | wants to summarize and the use to which the summaries are to be put |
---|
0:03:52 | so if we look at reader comments and ask what characteristics they have i think |
---|
0:03:56 | one is that comment sets are typically organized into threads based on reply-to structure |
---|
0:04:01 | every comment falls into exactly |
---|
0:04:03 | one thread and either initiates a new thread or |
---|
0:04:06 | replies to exactly one comment earlier in the thread |
---|
0:04:09 | as a consequence these conversations have the formal character of a set of trees |
---|
0:04:14 | where an initial comment is the root of a separate tree |
---|
0:04:17 | and other comments are either intermediate or leaf nodes whose parent is the |
---|
0:04:21 | comment to which they reply |
---|
0:04:23 | now you might naively think that these threads are going to be topically |
---|
0:04:27 | cohesive |
---|
0:04:28 | in practice they rarely are the same topic may be addressed across multiple threads |
---|
0:04:33 | as conversations get long people don't bother reading what's gone before so they start |
---|
0:04:37 | off on the same thing again |
---|
0:04:38 | and a single thread may drift from one topic onto another so there's a many-to-many |
---|
0:04:42 | relation between |
---|
0:04:45 | threads and topics |
---|
0:04:46 | so here's a quick example this is from the guardian which was our data |
---|
0:04:50 | source in this paper |
---|
0:04:53 | it's on the hotly debated issue of |
---|
0:04:56 | when a town council in northern england decided to reduce |
---|
0:05:00 | the frequency of garbage collection to once every three weeks rather than once every two |
---|
0:05:05 | weeks |
---|
0:05:05 | as you can imagine that sparked quite a row |
---|
0:05:08 | and here you can see the composition the original |
---|
0:05:11 | article as it appeared in the guardian a quick summary of the article at the top followed by |
---|
0:05:14 | the detail |
---|
0:05:15 | and then the comments start and they |
---|
0:05:18 | go sort of like this so it starts off with something like |
---|
0:05:21 | i can see how this would attract |
---|
0:05:23 | rats and other vermin |
---|
0:05:25 | i know some difficult decisions had to be made with cuts to funding but this seems like |
---|
0:05:28 | a very poorly thought-out idea |
---|
0:05:30 | and then someone replies |
---|
0:05:32 | plenty of people use compost bins and have no trouble with rats or foxes |
---|
0:05:36 | and so it rolls on like this |
---|
0:05:40 | so our observation having looked at |
---|
0:05:43 | very many of these |
---|
0:05:44 | is that reader comments are primarily though not exclusively argumentative in nature |
---|
0:05:50 | that is readers are making |
---|
0:05:52 | assertions that either express a viewpoint or stance of some sort on an issue |
---|
0:05:58 | raised in the original article or by an earlier comment |
---|
0:06:00 | or provide evidence or grounds for believing a viewpoint or assertion that's already been |
---|
0:06:05 | expressed |
---|
0:06:06 | so in the end we developed a theoretical framework which is reported in |
---|
0:06:11 | a paper at the argument mining workshop in berlin and it works as follows |
---|
0:06:16 | the framework is based on the notion of issue where an issue is a question |
---|
0:06:19 | on which alternative viewpoints are possible so for instance should garbage |
---|
0:06:24 | collection be reduced to once every three weeks |
---|
0:06:26 | which has binary alternatives |
---|
0:06:28 | issues needn't be binary they can be open-ended as well like |
---|
0:06:33 | say what was the best film of two thousand and fifteen |
---|
0:06:37 | it's also worth noting that issues are often implicit that is not directly expressed |
---|
0:06:41 | in the comments |
---|
0:06:42 | so for instance the issue around which this comment set revolves |
---|
0:06:46 | namely |
---|
0:06:46 | will reducing bin collection lead to an increase in vermin is never |
---|
0:06:50 | explicitly mentioned as such as an issue |
---|
0:06:52 | people just weigh in on either side of it and the reader is |
---|
0:06:55 | left to |
---|
0:06:57 | infer that what they're in fact arguing about is this issue will reducing bin collection lead to an |
---|
0:07:01 | increase in vermin |
---|
0:07:04 | so again as i mentioned while comments are primarily argumentative of course there are other |
---|
0:07:09 | things as well for instance they may seek clarification about facts and they may provide background |
---|
0:07:15 | as the |
---|
0:07:17 | previous speaker mentioned they frequently |
---|
0:07:19 | include jokes or |
---|
0:07:20 | sarcasm and one form or another of emotion but often these other things are really |
---|
0:07:25 | there in the service of |
---|
0:07:27 | addressing some viewpoint or taking a stand on a particular issue |
---|
0:07:31 | so sarcasm and emotive terms which occur in this bin collection argument things like |
---|
0:07:36 | lame-brained and crazy and so on indicate a commenter's stance as well as their |
---|
0:07:41 | emotional attitude |
---|
0:07:44 | okay so given that these things are primarily argumentative |
---|
0:07:47 | we felt a useful sort of summary would be a generic informative summary that |
---|
0:07:52 | attempted to give an overview of the arguments in the comments |
---|
0:07:56 | and when we reflected on that and discussed it at some length it seemed that the |
---|
0:07:59 | key things we wanted in this sort of overview summary |
---|
0:08:03 | were that it should articulate the main issues in the comments that is the questions |
---|
0:08:07 | or things that people are worried about and taking sides on |
---|
0:08:09 | and characterize the opinion on the main issues so |
---|
0:08:13 | identifying alternative viewpoints indicating grounds given to support viewpoints |
---|
0:08:17 | aggregating across comments if the same opinion is expressed multiple times what proportion of the |
---|
0:08:22 | commenters are on one side or another of an argument |
---|
0:08:25 | and indicating whether there's consensus or disagreement across comments |
---|
0:08:29 | we then put this proposal among several other proposals for |
---|
0:08:35 | summary types to |
---|
0:08:40 | a set of respondents via a questionnaire and we got very positive |
---|
0:08:44 | feedback on this summary type these respondents included not just |
---|
0:08:48 | authors and readers of reader comments but journalists and news editors as well |
---|
0:08:52 | and so based on that we developed a set of guidelines for authoring these |
---|
0:08:55 | summaries |
---|
0:08:57 | and |
---|
0:08:58 | we tried not to make them too |
---|
0:09:00 | prescriptive in the sense of giving someone a theory of argumentation and saying you must build |
---|
0:09:04 | a summary in accordance with this theory rather |
---|
0:09:07 | we introduced these ideas of |
---|
0:09:09 | identifying issues and characterizing opinion and then let them |
---|
0:09:13 | more or less follow their own intuitions as to what they felt |
---|
0:09:17 | was the best way to summarize |
---|
0:09:20 | okay so on to the method then |
---|
0:09:22 | as you will have realised already |
---|
0:09:25 | if you've been thinking about this since i started speaking |
---|
0:09:28 | writing summaries of large numbers of reader comments is very |
---|
0:09:31 | hard |
---|
0:09:32 | when we first started on this problem we had no idea how to go about it |
---|
0:09:35 | we sat down and read |
---|
0:09:37 | a hundred-odd comments and thought |
---|
0:09:38 | how on earth do we summarize this |
---|
0:09:41 | so it's clear you need to break it down into some multiple-stage process and |
---|
0:09:44 | build tools to support that process |
---|
0:09:46 | and that's what we've done |
---|
0:09:48 | we've boiled it down to a four-stage process though really only the first three stages |
---|
0:09:52 | have to do with summary |
---|
0:09:53 | authoring |
---|
0:09:54 | and the last stage is something extra which i'll come back to |
---|
0:09:57 | so the first stage is what we call comment labelling in this stage |
---|
0:10:02 | annotators go through the conversation comment by comment and write a brief label |
---|
0:10:07 | or |
---|
0:10:08 | if you like mini summary which tries to capture the |
---|
0:10:10 | essential the main or central point the person's making in that comment |
---|
0:10:15 | there are some additional things the |
---|
0:10:19 | annotators have to attend to there are a few examples up there at the top one of |
---|
0:10:23 | the paradoxes of this is that the authors have to bear in |
---|
0:10:26 | mind they may look at these labels out of context later |
---|
0:10:30 | and so they need to write enough that they can understand them without having to go back |
---|
0:10:32 | and look at the whole conversation in some cases |
---|
0:10:37 | anaphora will be expanded in the label making the label paradoxically longer than |
---|
0:10:41 | the comment |
---|
0:10:41 | but this lets the labels be looked at independently later on |
---|
0:10:46 | and this is the interface we built for this the conversation |
---|
0:10:50 | is in the pane on the left in the green circle |
---|
0:10:53 | it's pre-populated from the conversation automatically and then the annotators fill in their labels |
---|
0:10:59 | on the right here there's a conversation about |
---|
0:11:03 | network rail being fined for late-running trains in the u k |
---|
0:11:06 | and the annotators are writing short labels like |
---|
0:11:10 | ones about rail ticket prices seeming high |
---|
0:11:14 | or about who should run or |
---|
0:11:18 | operate the trains |
---|
0:11:20 | and so on these are summaries of a comment so you can see they |
---|
0:11:23 | are much shorter |
---|
0:11:25 | the second stage then is to group these labels together topically |
---|
0:11:30 | okay so annotators group |
---|
0:11:32 | the labels together |
---|
0:11:33 | placing those which are topically similar into the same group |
---|
0:11:37 | and then assign a group label that describes the common theme of the group |
---|
0:11:41 | and we allow them one level of subgrouping |
---|
0:11:44 | since some people in particular found it much easier to |
---|
0:11:48 | roughly group things first and then as the conversation's structure became clearer |
---|
0:11:52 | to subgroup things a bit but we didn't want the subgrouping |
---|
0:11:55 | to become arbitrarily deep |
---|
0:11:58 | going through this exercise of grouping then |
---|
0:12:02 | leaves the annotators better placed to make sense of the raw content of |
---|
0:12:05 | the comments before they come to writing a summary |
---|
0:12:07 | and again there's an interface which looks like this |
---|
0:12:10 | so at first they just get all the labels and then they create |
---|
0:12:13 | groups by pressing a button to add a new group and a group label |
---|
0:12:16 | so you end up with something where you've got a group label then |
---|
0:12:19 | underneath it the labels or mini summaries of the comments then the |
---|
0:12:24 | next group and so on |
---|
0:12:26 | the annotators can go back to the previous screen to see the comments in full |
---|
0:12:31 | text if they wish as well |
---|
0:12:34 | the third stage is generating the summary |
---|
0:12:37 | so we asked annotators to write two summaries the one to do first is an |
---|
0:12:41 | unconstrained one where we said don't worry about the length too much |
---|
0:12:44 | just write a summary |
---|
0:12:46 | and then the second one is constrained where we said no less |
---|
0:12:49 | than one hundred and fifty and no more than two hundred and fifty words |
---|
0:12:52 | and they do that with the first unconstrained summary available to them as a reference |
---|
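The length condition on the second summary amounts to a trivial word-window check; a minimal sketch, with the 150-250 bounds taken from the talk and the function name my own:

```python
# Check the constrained-summary condition: between 150 and 250 words
# (bounds as stated in the talk), counting whitespace-separated tokens.
def within_constraint(summary, lo=150, hi=250):
    return lo <= len(summary.split()) <= hi
```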
0:12:57 | so |
---|
0:12:59 | further analysis obviously takes place as the annotators go through this stage |
---|
0:13:03 | taking the group labels they've developed and turning them into summary |
---|
0:13:08 | sentences |
---|
0:13:10 | we encouraged annotators to use phrases like |
---|
0:13:12 | many several few commenters observed that |
---|
0:13:15 | opinion was divided on the consensus was and so on |
---|
0:13:19 | to try to capture the aggregation or abstraction over a number of separate comments |
---|
0:13:26 | so again there's an interface for this on the left in the |
---|
0:13:30 | green circle you see the output of the previous stage stage two and they do the writing |
---|
0:13:33 | on the right |
---|
0:13:34 | they author the summary with a |
---|
0:13:36 | word count at the bottom which dynamically changes so the writer can |
---|
0:13:41 | see how long their summary is |
---|
0:13:43 | okay so that completes the summary writing the fourth stage is a back-linking stage which |
---|
0:13:48 | isn't strictly necessary for creating the summaries but is very useful |
---|
0:13:52 | as a resource for further analysis and |
---|
0:13:56 | algorithm development as you'll see later so we asked the authors to link each of |
---|
0:14:01 | the sentences in the constrained summary |
---|
0:14:03 | to the one or more groups that informed the creation of that sentence |
---|
0:14:07 | okay so really they're linking sentences to groups of labels but since the labels themselves |
---|
0:14:12 | have an associated |
---|
0:14:13 | comment id we can actually link directly back from the summary sentences to the source |
---|
0:14:17 | comments that support them |
---|
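The back-linking just described induces a simple transitive structure: summary sentence to groups, groups to labels, labels to comment IDs. A minimal sketch of that data model, with all class and function names hypothetical rather than taken from the actual tools:

```python
# Hypothetical data model for the back-linking stage: labels keep their
# source comment id, groups collect labels, and a summary sentence
# linked to groups resolves transitively to its supporting comments.
from dataclasses import dataclass, field

@dataclass
class Label:
    comment_id: int   # id of the comment this label/mini-summary describes
    text: str

@dataclass
class Group:
    theme: str                                  # the group label
    labels: list = field(default_factory=list)  # member labels

def supporting_comments(sentence_groups):
    """Comment ids supporting one summary sentence, via its linked groups."""
    return sorted({lab.comment_id for g in sentence_groups for lab in g.labels})
```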
0:14:20 | and there's an interface again i won't look at it in detail here effectively each summary sentence is |
---|
0:14:25 | presented in turn |
---|
0:14:27 | and then the |
---|
0:14:28 | annotator can select |
---|
0:14:30 | which of the groups informed the construction of that sentence and all that's recorded |
---|
0:14:36 | okay so coming on to the corpus |
---|
0:14:38 | there were |
---|
0:14:39 | fifteen annotators who carried out the summary writing task mostly |
---|
0:14:45 | students with varying degrees of expertise in language and writing |
---|
0:14:50 | the majority were native english speakers and they all had good english writing skills |
---|
0:14:55 | each of them was given a training session and written guidelines were produced as well |
---|
0:14:59 | and the data source was |
---|
0:15:02 | about three and a half thousand guardian articles with associated comment sets published in june and july |
---|
0:15:07 | two thousand and fourteen |
---|
0:15:08 | we then selected a small subset of that |
---|
0:15:11 | in fact eighteen articles |
---|
0:15:14 | in the domains listed here sport health et cetera |
---|
0:15:18 | and from each of these we took approximately the first hundred comments from each |
---|
0:15:21 | full comment set |
---|
0:15:23 | there is more detail of precisely how this was done in the paper |
---|
0:15:26 | so here you see a summary of the corpus with figures for the underlying corpus at the top |
---|
0:15:31 | in terms of article length comment counts and so on and so forth |
---|
0:15:35 | but overall there are eighteen articles and |
---|
0:15:38 | the |
---|
0:15:40 | total number of |
---|
0:15:41 | comments across the comment sets runs to around seventeen hundred with almost ninety thousand words in |
---|
0:15:46 | total |
---|
0:15:48 | now the annotation characteristics |
---|
0:15:51 | of the eighteen articles plus comment sets fifteen were doubly annotated and three |
---|
0:15:56 | were triply annotated |
---|
0:15:57 | and even with the tools you can see the annotators took three and |
---|
0:16:00 | a half to six hours to complete the task for one article plus comments |
---|
0:16:05 | so this is a non-trivial undertaking you |
---|
0:16:07 | can't carry it out without some serious |
---|
0:16:10 | commitment |
---|
0:16:11 | but we're pleased with the results there are thirty-nine |
---|
0:16:15 | annotations each of these containing two summaries |
---|
0:16:18 | with each of the summary sentences linked to one or more groups of comments all of |
---|
0:16:21 | this is in the corpus which is now available for download |
---|
0:16:26 | there are some statistics given in the paper which i won't go into in |
---|
0:16:29 | detail here about the numbers of annotations so |
---|
0:16:35 | here's just a bit of qualitative analysis and quantitative analysis |
---|
0:16:40 | before i turn to related work and conclusions looking over the annotations one of the striking |
---|
0:16:45 | things is that people group things |
---|
0:16:48 | in different sorts of ways in particular i guess this is a famous or |
---|
0:16:52 | infamous finding |
---|
0:16:55 | so on average there were something like nine groups |
---|
0:16:57 | per annotation |
---|
0:17:01 | across all annotations the average number of groups per annotation was nine ranging from four |
---|
0:17:06 | up to fourteen point five |
---|
0:17:08 | for subgroups the average per annotation set is five |
---|
0:17:12 | so most annotators used the subgroup option at least once |
---|
0:17:16 | but in fact there's quite a divide between those who used subgroups |
---|
0:17:20 | quite frequently and those who only used them rarely |
---|
0:17:26 | and pleasingly in terms of the aims we initially set out for the |
---|
0:17:31 | target summaries all of them contain |
---|
0:17:34 | sentences reporting different views on issues |
---|
0:17:37 | and they frequently picked out points of contention |
---|
0:17:40 | provided examples of the reasons people gave in support of viewpoints |
---|
0:17:44 | and frequently indicated the proportion of |
---|
0:17:47 | commenters holding particular views and so |
---|
0:17:50 | in these respects we think they do what we wanted them to do |
---|
0:17:54 | quite well |
---|
0:17:55 | a couple of examples here i've colour-coded this one with |
---|
0:17:58 | red highlighting the parts that are |
---|
0:18:01 | expressing sort of aggregation |
---|
0:18:03 | and green identifying some of the issues that are more explicitly stated in the summaries |
---|
0:18:09 | i've got another one that i'll skip over |
---|
0:18:11 | so quite healthy-looking |
---|
0:18:14 | summaries of the sort we wanted |
---|
0:18:16 | and we showed these |
---|
0:18:17 | summaries of comment sets |
---|
0:18:19 | we actually showed these to various people in particular the guardian themselves and they were |
---|
0:18:24 | very impressed if you could do this automatically they said |
---|
0:18:26 | we'd be very happy |
---|
0:18:29 | we also quite quickly looked at |
---|
0:18:34 | trying to determine how similar the summaries were using a method of the sort |
---|
0:18:38 | used in work from two thousand and one |
---|
0:18:40 | where you compare the summaries pairwise you look at each sentence in |
---|
0:18:45 | a summary a to see whether all its content is covered in summary b and then |
---|
0:18:48 | you do the reverse |
---|
0:18:51 | using a sort of likert-scale system |
---|
0:18:53 | to see how much commonality there is |
---|
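The pairwise comparison just described aggregates naturally: each sentence of summary A gets a Likert score for how well its content is covered in summary B, then the reverse, and the two directions are combined. A sketch under assumptions (a 1-5 scale and simple averaging are my illustrative choices, not necessarily the exact scheme used):

```python
# Aggregate the two directions of the pairwise coverage judgement:
# each list holds per-sentence Likert scores (assumed 1-5) for how well
# one summary's sentences are covered by the other summary.
def coverage_score(scores):
    return sum(scores) / len(scores)      # mean coverage in one direction

def summary_similarity(a_in_b, b_in_a):
    """Symmetric similarity: mean of the two directional coverage scores."""
    return (coverage_score(a_in_b) + coverage_score(b_in_a)) / 2
```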
0:18:55 | as i'm running out of time i'll skip this very quickly but |
---|
0:18:58 | essentially we determined that in the main |
---|
0:19:01 | the summaries are quite similar there isn't a problem with |
---|
0:19:05 | annotators |
---|
0:19:07 | extracting wildly different reference summaries |
---|
0:19:10 | they are relatively similar and there is a high level of agreement between the judges |
---|
0:19:13 | making the similarity judgements |
---|
0:19:16 | i've only got a very short time left so |
---|
0:19:19 | on the related work which is covered |
---|
0:19:22 | in the paper all i'll say at a high level is that there are three sorts of |
---|
0:19:26 | things |
---|
0:19:27 | there's sentence assessment which is an approach others have used for building resources |
---|
0:19:31 | for evaluating extractive summaries |
---|
0:19:34 | including from |
---|
0:19:36 | reader comments which we |
---|
0:19:38 | don't necessarily think is the right way to go essentially |
---|
0:19:42 | there's work on the ami corpus i won't give a detailed comparison here but you can read that in |
---|
0:19:46 | the paper |
---|
0:19:47 | essentially what they do is similar but it differs in several key ways perhaps |
---|
0:19:51 | most importantly in that they're summarizing meeting recordings which are much more |
---|
0:19:56 | of a fixed domain you can anticipate the sorts of things that are going to |
---|
0:20:00 | emerge in a meeting whereas you can't in reader comments |
---|
0:20:03 | and finally |
---|
0:20:04 | there's some work on summarizing arguments |
---|
0:20:08 | across conversations but where the focus of the work is really on trying to summarize |
---|
0:20:12 | an argument |
---|
0:20:14 | on something like gun control or |
---|
0:20:16 | gay marriage across a whole set of different online conversations rather than trying to summarize |
---|
0:20:21 | all |
---|
0:20:22 | the comments in a single conversation which may be about many different topics |
---|
0:20:27 | so concluding then |
---|
0:20:30 | we've proposed a type of overview summary that captures key content |
---|
0:20:34 | of these large-scale multiparty argument-oriented conversations developed a method by which humans |
---|
0:20:39 | can author such summaries |
---|
0:20:41 | and used the method to build the first publicly available corpus of reader comments paired with |
---|
0:20:45 | summaries and other annotation |
---|
0:20:47 | we think the summaries produced are pretty good with respect to the aims we set out |
---|
0:20:52 | and we've also already been able to use the corpus for a whole set |
---|
0:20:55 | of things for instance we've used the groupings to evaluate clustering algorithms |
---|
0:20:59 | we've used the back-linked sentences to inform an unsupervised cluster |
---|
0:21:03 | labelling algorithm |
---|
0:21:05 | and we've |
---|
0:21:07 | used the summaries to inform assessors in a task-based system evaluation |
---|
0:21:11 | and just very quickly future work well obviously the corpus is of limited size and we'd like |
---|
0:21:16 | to make it bigger |
---|
0:21:17 | scalability we still have to prove that the method scales to a thousand comments from say a |
---|
0:21:22 | hundred |
---|
0:21:23 | we think it will but that's just a belief we'd have to investigate |
---|
0:21:27 | this and also we'd like to see whether we can think about some ways of |
---|
0:21:30 | maybe crowdsourcing smaller amounts of the task or sampling the comments |
---|
0:21:35 | there are more questions to do with groups and subgroups and finally there's evaluation how do you |
---|
0:21:40 | evaluate against these things |
---|
0:21:42 | one last point is rouge an appropriate method |
---|
0:21:45 | that has to be investigated and if it's not how should we do it |
---|
0:21:50 | so to finish i'd like to acknowledge the european community for funding this |
---|
0:21:54 | work under the |
---|
0:21:55 | sensei project the guardian for letting us use the materials and redistribute them |
---|
0:21:59 | our annotators for their hard work and the reviewers for helpful comments |
---|
0:22:04 | happy to take questions and if you would like to download the corpus it's available |
---|
0:22:20 | yes the one at the back |
---|
0:22:45 | well |
---|
0:22:48 | if you have so |
---|
0:22:50 | that's an interesting question thank you we have a |
---|
0:22:54 | system that does clustering with several clustering algorithms including lda |
---|
0:22:58 | and we put all the comments in particular clusters together |
---|
0:23:02 | and when people look at the clusters what we usually see |
---|
0:23:05 | is |
---|
0:23:06 | that some of the argument structure is |
---|
0:23:09 | lost and people actually don't like having these clusters put in front of them |
---|
0:23:13 | users say they want to go back to see the original context because they can |
---|
0:23:16 | really only make sense of the comments |
---|
0:23:19 | in the dialogic context where there is an argument pulled out of that and |
---|
0:23:22 | clustered together they often don't make sense |
---|
0:23:25 | so it's an interesting idea but i don't think it's going to help people speed up |
---|
0:23:29 | in doing the task i think they |
---|
0:23:31 | need to do the grouping on their own to really understand the comments |
---|
0:23:35 | though |
---|
0:23:36 | it might be interesting to see the extent to which |
---|
0:23:40 | well we have done formal evaluation using the standard sort of |
---|
0:23:45 | measures for evaluating clustering of the machine-generated clusters and the scores aren't |
---|
0:23:50 | terribly good it would be more interesting to |
---|
0:23:54 | do some qualitative analysis on that and look at |
---|
0:23:56 | the sorts of things that the |
---|
0:23:59 | algorithms are putting into the clusters that humans are excluding but |
---|
0:24:02 | essentially i don't think it would help in summary writing |
---|
0:24:05 | though it could help in |
---|
0:24:08 | algorithm development obviously which is also important |
---|
0:24:31 | i think the suggestion there |
---|
0:24:33 | so let me repeat the question i think the |
---|
0:24:36 | suggestion was that we think about |
---|
0:24:39 | i guess a sort of active learning approach or something like this where |
---|
0:24:42 | you annotate something the system uses that to annotate more comments somehow and hopefully |
---|
0:24:46 | this speeds up the annotation is that correct |
---|
0:24:48 | so |
---|
0:24:49 | it is a good idea and we have thought about doing things like that |
---|
0:24:52 | but we haven't got as far as trying it in practice to see how well it would really |
---|
0:24:55 | work thanks |
---|
0:25:01 | then |
---|
0:25:43 | which ones |
---|
0:25:47 | so this was an after-the-fact assessment of |
---|
0:25:53 | what was going on |
---|
0:25:54 | it wasn't part of the summary creation |
---|
0:26:32 | well |
---|
0:26:36 | well we don't have to i mean |
---|
0:26:39 | the resource actually has multiple different reference summaries the way a lot of reference |
---|
0:26:43 | summarization datasets do |
---|
0:26:44 | and then we just came back afterwards and said as a matter of interest how similar |
---|
0:26:48 | are they to each other |
---|
0:26:50 | so it's not that to produce the resource we did that that was |
---|
0:26:53 | actually part of analysing it afterwards to see the extent to which |
---|
0:26:56 | these things are similar |
---|
0:27:12 | yes |
---|
0:27:13 | so it's a bit like |
---|
0:27:14 | so i guess we could then |
---|
0:27:16 | it's a bit like what people call reconciliation where you have multiple annotators do something |
---|
0:27:21 | and you try to process that to propose a single gold standard |
---|
0:27:24 | so we could in fact do another stage now |
---|
0:27:26 | taking each of these multiple annotations doing the reconciliation and coming up and saying |
---|
0:27:30 | well this is |
---|
0:27:31 | the reconciled set the perfect summary if you like of the set |
---|
0:27:36 | of annotations |
---|
0:27:39 | yes i take that point i guess it's a sort of larger point |
---|
0:27:42 | the point is i mean |
---|
0:27:43 | we only had the resources to do this much but there's lots more you could do |
---|
0:27:48 | and in fact if somebody wanted to do that on top of what we're releasing that |
---|
0:27:50 | would be great |
---|
0:27:52 | okay |
---|
0:27:55 | okay let's thank the speaker |
---|