and well you have to name cream right
and this is another resources paper like the last one that describes how we
built a corpus of human authored reference summaries for reader comment conversations in online
news so i'll start off with a bit of a motivational pitch for this work
so i think most of us know or are aware of reader comments they're there on a wide
range of online news sources some of which are shown down the right
and these are multi way conversations
and they've got lots of information of potential value a lot of rubbish as well
but a lot of information of potential value to a range of users including
not just people simply reading but people posting comments as well as reading and journalists
and news editors and so on
however i'm sure you've noticed this if you've looked at these a major problem is that
a news article may quickly attract hundreds even thousands of comments
and few readers have the patience to wade through this much
so an obvious question seems to be can we automatically summarize these
conversations to gain some sort of overview of what's going on in the conversation
there's been a fair bit of work done on this already and i've divided the approaches up
into sort of two broad categories one you might call
technology driven approaches that is
let's try what we already know how to do and see how well it works
so the idea is well we know how to cluster things by topic so let's cluster
all these comments topically using something like lda
and then let's rank them using some sort of ranking algorithm so rank clusters rank
sentences in the clusters and then let's build an extractive summary
from the results
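The cluster, rank, extract pipeline just described can be sketched as follows. This is only an illustration under stated assumptions, not the systems the talk refers to: the topic assignment is passed in as a ready-made mapping (standing in for the output of something like LDA), clusters are ranked simply by size, whole comments rather than sentences are ranked by average in-cluster word frequency, and the function name and example texts are invented.

```python
from collections import Counter, defaultdict


def extractive_summary(comments, topic_of, n_clusters=2):
    """comments: {comment_id: text}; topic_of: {comment_id: topic label}.

    topic_of is assumed to come from an external topic model such as LDA.
    """
    # 1. Cluster: group comment ids by their externally assigned topic.
    clusters = defaultdict(list)
    for cid in comments:
        clusters[topic_of[cid]].append(cid)

    # 2. Rank clusters: here simply by size, largest first (ties by label).
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    summary = []
    for _topic, ids in ranked[:n_clusters]:
        # 3. Rank comments in the cluster by average in-cluster word frequency.
        freq = Counter(w for cid in ids for w in comments[cid].lower().split())

        def score(cid):
            words = comments[cid].lower().split()
            return sum(freq[w] for w in words) / max(len(words), 1)

        # 4. Extract the top-scoring comment into the summary.
        summary.append(comments[max(ids, key=score)])
    return summary


# Invented example data, for illustration only.
comments = {
    "c1": "fewer collections will attract rats",
    "c2": "rats and foxes will love fewer collections",
    "c3": "the council must save money",
}
topic_of = {"c1": "vermin", "c2": "vermin", "c3": "cost"}
summary = extractive_summary(comments, topic_of)
```

Nothing in this pipeline knows who is replying to whom or which side of a question a comment takes, so the output is a list of representative comments rather than an account of an argument.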
and in a sense this is what we know how to do and it generates so-called summaries but
in fact if you look at them
i don't think they're very good summaries
they fail to capture the argument oriented nature of the
conversations pretty spectacularly
the second set of approaches which haven't really come to fruition but are promising might
be called argument theory driven approaches there's a lot of work on argument mining in social media
this results in various schemes
defining argument elements and relations in argumentative discourse and such elements and relations
could in fact be detected in these comments
and they might form a basis for building a summary and a number of people
working in this area have cited summarization as a motivation for their work
but no one has yet proposed how given an analysis of this sort
one could actually
derive a summary from the full reader comment set
well what we thought when we started this project that's the sensei project
on which this work is based
was that really what was needed is an answer to the underlying fundamental question
what should a summary of reader comments be like
and it would also be helpful if we had some human generated exemplars
for real sets of reader comments
and this would allow us both to better select appropriate technologies for reader comment summarization
and also
to evaluate and develop our systems using these materials
okay so that's the introduction and here's the structure for the talk so
next i'll look at the question what should a summary of reader comments be like
i'll then talk about a method that we developed for building or authoring reader comment
summaries
and talk about the corpus we've built
some comments on related work if there's time and conclusions and future work
well what should the summary of reader comments be like well i think one can
start from i think some remarks made by karen spärck jones that really what
a summary should be like depends on the nature of the content that one
wants to summarize and the use to which the summary is to be put
so if we look at reader comments to see what characteristics they have one of the first things
one notices is that comment sets are typically organized into threads based on reply to structure
every comment falls into exactly
one thread and either initiates a new thread or
replies to exactly one comment earlier in the thread
as a consequence these conversations have the formal character of a set of trees
where each thread initial comment is the root of a separate tree
and all other comments are either intermediate or leaf nodes whose parent is the
comment to which they reply
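The thread structure described here, a set of trees in which every comment either starts a thread or replies to exactly one earlier comment, can be written down as a small data structure. The sketch below is illustrative only: the `Comment` class, its field names and the example texts are assumptions, not the corpus format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    comment_id: str
    parent_id: Optional[str]  # None when the comment starts a new thread
    text: str


def build_threads(comments):
    """Return (roots, children): thread-initial ids and a parent -> replies map."""
    children = defaultdict(list)
    roots = []
    for c in comments:
        if c.parent_id is None:
            roots.append(c.comment_id)  # root of a separate tree
        else:
            children[c.parent_id].append(c.comment_id)
    return roots, children


def thread_of(comment_id, parents):
    """Follow reply-to links upward until the thread-initial comment is reached."""
    while parents[comment_id] is not None:
        comment_id = parents[comment_id]
    return comment_id


# Invented example comments, for illustration only.
comments = [
    Comment("c1", None, "This will attract vermin."),
    Comment("c2", "c1", "Not if people use compost bins."),
    Comment("c3", None, "The council has to save money somewhere."),
    Comment("c4", "c2", "Compost bins do not take everything."),
]
roots, children = build_threads(comments)
parents = {c.comment_id: c.parent_id for c in comments}
```

Because each comment has at most one parent, every comment belongs to exactly one tree, which is what makes the conversation a forest rather than a general graph.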
now you might naively think that these threads are gonna be topically
cohesive
in practice they rarely are the same topic may be addressed across multiple threads
as the conversations get long people don't bother reading what's gone before so they start
off the same thing again
and a single thread may drift from one topic on to another so there's a many to many
relation between
threads and topics
so here's a quick example this is from the guardian as is most of our data
our source is the guardian newspaper in the u k
it's on the hotly debated issue of
when bury town council in northern england decided to reduce
rubbish or garbage collection to once every three weeks rather than once every two
weeks
as you can imagine that sparked an uproar
and here of course you can see the original
article as it appeared in the guardian a quick summary of the article at the top followed by
the detail
and then the comments start underneath
and they go sort of like this so it starts off with someone saying
i can see how this would attract
rats and other vermin
i know some difficult decisions have to be made with cuts to funding but this seems like
a very poorly thought out idea
and then someone replies
only people use compost bins and have no trouble with rodents or foxes
and so on it rolls on like this
so our observation having looked at a lot
very many of these
is that reader comments are primarily though not exclusively argumentative in nature
with readers making
assertions that either express a viewpoint or stance on an issue
raised in the original article or by an earlier comment
or provide evidence or grounds for believing a viewpoint or assertion that's already been
expressed
so in our approach we've developed a theoretical framework which is reported in
a paper at the argument mining workshop in berlin a few weeks ago
essentially the framework is based on the notion of issue where an issue is a question
on which alternative viewpoints are possible so for instance should bin collection
be reduced to once every three weeks
which is a binary alternative
issues needn't be binary they can be open-ended as well like
an issue such as what was the best film of two thousand and fifteen
it's also worth noting that issues are often implicit that is they're not directly expressed
in the comments
and so for instance this issue which unfolds in the comment set i just referred
to
will reducing bin collection lead to an increase in vermin is never
explicitly mentioned as such as an issue
people just weigh in on either side of it and the reader is
left to
infer that what they're arguing about is this issue will reducing bin collection lead to an
increase in vermin
so again as i mentioned while comments are primarily argumentative of course they do other
things as well for instance they may seek clarification about facts and they may provide background
as
earlier speakers mentioned of course they frequently
include jokes or
sarcasm or one form or another of emotion but often these other things are really
there in the service of
addressing some viewpoint or taking a stand on a particular issue
so sarcasm or emotive terms which occur in this bury bin collection argument things like
lame brained and crazy and so on indicate the commenter's stance as well as their
emotional attitude
okay so given that these things are primarily argumentative
we thought a useful sort of summary would be a generic informative summary that
attempted to give an overview of the arguments in the comments
and when we reflected on that and discussed it at some length it seemed that the
key things we wanted in this sort of overview summary
were to identify and articulate the main issues in the comments that is the questions
or things that people are arguing about that they're taking sides on
and to characterize the opinion on the main issues so
identifying alternative viewpoints indicating grounds given to support viewpoints
aggregating so if the same opinion is expressed multiple times what proportion of
the commenters are on one side or another of the argument
and indicating whether there's consensus or disagreement amongst the commenters
we then put this proposal forward among several other proposals for
summary types to
a set of you know sort of respondents in a questionnaire and we got very positive
feedback on this summary type and these respondents included not just
authors and readers of comments but journalists and news editors as well
and so based on that we developed a set of guidelines for authoring these
summaries
and
we tried not to make them too
prescriptive in the sense that we didn't give someone a theory of argumentation and say you must build
a summary in accordance with this theory rather
we introduced these ideas of
identifying issues and characterizing opinion and then let them
more or less follow their own intuitions as to what they thought
looked like the best way to summarize
okay so on to the method then
so as you'll know if you've thought about it already
since i started speaking or if you've tried it yourselves
you realise very quickly that writing summaries of large numbers of reader comments is very
hard
so when we first started on this problem we had no idea how we'd go about it
and we just sat and read
a hundred and twenty comments and thought
how on earth can we summarize this
so it's clear you need to break it down into some multiple stage process and it's
valuable to have tools to support the process
and that's what we've done
so we've broken it down into a four stage process though really only the first three stages
have to do with summary
authoring
and the last stage is something extra which we'll come back to
so the first stage is what we call comment labeling and in this stage
annotators go through the conversation comment by comment and write a brief label
or
if you like mini summary which tries to capture the
essential or main central point the person's making in that comment
and there are some additional things which you can read in the paper about what
else annotators are asked to do there are a few examples up there at the top one of the
paradoxes of this is that the authors have to bear in
mind they may look at these out of context later
and so they need to write enough that they can understand without having to go back
and look at the whole of the conversation so in some cases
anaphora will be expanded in the label making the label paradoxically longer than
the comment
as this label needs to be looked at actually independently later on
and then this is the interface we built for this
there's a part on the left in the green circle that
is pre populated from the conversation automatically and then the annotators fill in their labels
on the right here's a conversation about
network rail being fined for late running trains in the u k
and the annotators are then writing short labels like
network rail ticket prices the commenter's implying seem high
someone saying that network rail doesn't set fares
or operate trains
and so on these are summaries of the comments so you see them but mostly
they're much shorter
the second stage then is to group these labels together topically
okay so annotators group
labels together
by putting those which are similar or related in the same group
and then assign a group label that describes the common theme of the group
and we allow them one level of subgrouping
since some people in particular found it much easier to
roughly group things and then as the conversation unfolded realise there was more structure and they
wanted to subgroup things a bit but we didn't want them to do arbitrarily deep
subgrouping
and so going through this exercise of grouping then
allows the annotators to be better placed to make sense of the broad content of
the comments before they come to writing a summary
and again there's an interface that looks like this
and so first they just get all the labels and then they can create
groups by pressing a button to add a new group and a group label
so you end up with something where you've got a group label and then
the labels or mini summaries of the comments underneath it and then the
next group and so on
the annotators can go back to the previous screen with all the comments and the full
text if they wish as well
the third stage is generating the summary
so we asked annotators to write two summaries one which they do first is an
unconstrained one where we said don't worry about the length too much
just write a summary
and then the second one is constrained where we said no less than
a hundred and fifty no more than two hundred fifty words
and they do that with the first unconstrained summary available as a reference
so
further analysis obviously takes place as the annotators go through that stage
they may have developed a group label further in turning it into a summary
sentence and so on
we encouraged annotators to use phrases like
many several few commenters said that
opinion was divided on the consensus was and so on
to try to capture the aggregation or to abstract over a number of separate comments
so again there's an interface for this on the left in the
green circle you see the output of the previous stage stage two the grouping
on the right
they author the summary with a
word counter at the bottom which dynamically changes as they write so they can
see how long their summary is
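The length check behind a word counter like this can be sketched in a few lines. The 150 and 250 word limits are the ones given for the constrained summary; the whitespace tokenisation and the function names are assumptions, since the talk does not say how the tool counts words.

```python
MIN_WORDS, MAX_WORDS = 150, 250  # limits for the constrained summary


def word_count(text):
    # Assumption: words are whitespace-separated tokens.
    return len(text.split())


def length_status(text):
    """Return (count, status) so an interface can update as the author types."""
    n = word_count(text)
    if n < MIN_WORDS:
        return n, "too short"
    if n > MAX_WORDS:
        return n, "too long"
    return n, "ok"
```

Calling `length_status` on every edit is enough to drive a counter that tells the author how far they are from either limit.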
okay so that completes the summary writing and the fourth stage is a back linking stage which
isn't strictly necessary for creating the summaries but is very useful
as a resource for further
algorithm development as you'll see later so we asked the authors to link
the sentences in the constrained length summary
to one or more groups that informed the creation of that sentence
okay so really they're linking summary sentences to groups of labels but since the labels themselves
have an associated
comment id we can actually link directly back from the summary sentences to the source
comments that support them
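The chain from summary sentence to group to label to comment id can be pictured with a few record types. This is a hypothetical sketch of the linkage only: the class names, field names and example texts are invented and do not reflect the released corpus format.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    comment_id: str  # id of the source comment this label summarises
    text: str


@dataclass
class Group:
    group_label: str
    labels: list = field(default_factory=list)


@dataclass
class SummarySentence:
    text: str
    groups: list = field(default_factory=list)  # groups selected in stage four


def supporting_comments(sentence):
    """Follow sentence -> groups -> labels -> comment ids, keeping first-seen order."""
    seen = []
    for group in sentence.groups:
        for label in group.labels:
            if label.comment_id not in seen:  # guard against repeated ids
                seen.append(label.comment_id)
    return seen


# Invented example, for illustration only.
prices = Group("ticket prices", [Label("c1", "fares are too high"),
                                 Label("c4", "fares rise every year")])
fines = Group("fines", [Label("c2", "fines just move public money around")])
sentence = SummarySentence(
    "Many commenters complained about fares, and some questioned the fines.",
    groups=[prices, fines],
)
```

With links of this kind, a sentence such as the one above can be traced to the comments behind it, here `supporting_comments(sentence)`, without the summary author ever linking to individual comments.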
and there's an interface again i won't go into detail here but effectively each summary sentence is
presented in turn
and then the
annotator can select
which of the groups informed the construction of that sentence and that's recorded
okay so coming on to the corpus
so there were
fifteen annotators who carried out the summary writing task mostly
final year journalism students graduates with expertise in language and writing and academics
the majority were native english speakers and they all had very good english writing skills
and they were each given a training session and there were guidelines produced as well
and the data source was
about three and a half thousand guardian articles with associated comment sets published in june and july
two thousand fourteen
then we selected a small subset of that
in fact eighteen articles
in these domains listed here politics sport health et cetera
and from each of these we selected approximately the first hundred comments from each
full comment set
there is more detail on precisely how this was done in the paper
so you see a summary of the characteristics of the underlying corpus at the top
in terms of article length comment length and so on and so forth
but overall there are eighteen articles and
full of
the number of
common set total comments is close to seven files and almost ninety thousand words in
total
and then the annotation characteristics so
of the eighteen articles plus comment sets fifteen were doubly annotated and three
were triply annotated
and even with the tools you can see it took the annotators three and
a half to six hours to complete the task for one article plus comments
so this is a non trivial undertaking and i wouldn't
encourage you to try it without some serious
commitment
but we're pleased with the results so there are thirty nine annotation sets and each of these
annotation sets consists of summaries
with each of the summary sentences linked to one or more groups of comments so all of
this is in the corpus which is now available for download
and i've got some statistics drawn from the paper which i won't go into in
detail about the numbers here of annotations so
then just a bit of quantitative and qualitative analysis
before i turn to related work and conclusions so looking over this one of the striking
things is that people group things
in different sorts of ways in particular i guess this is the famous
lumpers versus splitters divide that we're finding
and so on average there was something like nine
across the whole annotation
all annotations the average number of groups per annotation set was nine range from four
not able to fourteen point five
for subgroups the average per annotation set is five
so most annotators used the subgroup option at least once
but in fact there's quite a divide between those who used subgroups
quite frequently and those who only used them rarely
and so pleasingly from our perspective given what we'd initially proposed for the
target summaries all of them contain
sentences reporting different views on issues
and they frequently picked out points of contention
and provided examples of the reasons people gave in support of viewpoints
they frequently indicated the proportion or amount of
commenters talking about the views and so on
so the authors we think did what we wanted them to do
quite well
a couple of examples here i've colour coded this one with
red highlighting the phrases that are
expressing sort of aggregation
and green identifying some of the issues that are more explicitly stated in the summaries
i've got another one but i'll skip over that
so quite healthy looking
summaries of this sort
and we showed these
we actually showed these to various people in particular the guardian themselves and they were
very impressed and said if you could do this automatically now
we'd be very happy
so we also quickly looked at
trying to determine how similar the summaries were using a method of the sort
that used in back and two thousand one
where you compare the content so you look for each sentence in
summary a to see whether its content is covered in summary b and then
you do the reverse
using a sort of likert scale system
to see what the commonality is
and as i'm running out of time i'll skip this very quickly but
essentially we determined that the answer is affirmative
that is the summaries are quite similar
i mean one expects different reference summaries to differ but
they are relatively similar and there is a high level of agreement between the judges in
making the judgement of similarity
well i've only got a very short time left so
i'll be brief the related work is covered
in the paper suffice to say at a high level think of three sorts of
things
sentence assessment which is an approach that others have used for building resources
for evaluating extractive summaries
i don't from real of
reader comments which we used
necessarily i think is the one way to gel essentially
work on the ami corpus i won't give a detailed comparison here but you can read that in
the paper
essentially what we do is similar to that but differs in several key ways perhaps
most importantly that they're summarizing meeting records which are much more
of a fixed domain and you can anticipate the sorts of things that are gonna
emerge in a meeting whereas you can't in reader comments
and finally
some work by misra et al on summarizing arguments
across conversations but where the focus of the work is really on trying to summarise
an argument
so something like gun control or
gay marriage across a whole set of different online conversations rather than trying to summarize all
the
comments in a single conversation which may be about different topics
so to conclude then we've
proposed a form of overview summary that captures key content
of these multiparty argument oriented conversations developed a method to help humans
author such things
and used the method to build the first publicly available corpus of reader comments annotated
with summaries and other information
we think the summaries produced are pretty good as i've just commented
and we've already been able to use the corpus for a whole set
of things for instance we've used the grouping to evaluate clustering algorithms
we've used the back links to inform an unsupervised cluster
labeling algorithm
and we've
used the summaries to inform assessors in task based system evaluation
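One way human groupings can serve as a reference for a clustering algorithm is a pairwise comparison: for every pair of comments, check whether the system and the annotator agree that the pair belongs together. The talk does not say which clustering metrics were used, so the pairwise precision, recall and F1 below are an assumed stand-in, and the flat group assignments ignore the subgroup level for simplicity.

```python
from itertools import combinations


def same_group_pairs(assignment):
    """assignment: {item_id: group_id}. Return the item pairs that share a group."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }


def pairwise_scores(human, system):
    """Pairwise precision, recall and F1 of system clusters against human groups."""
    gold, pred = same_group_pairs(human), same_group_pairs(system)
    agreed = len(gold & pred)
    precision = agreed / len(pred) if pred else 0.0
    recall = agreed / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: four comments, grouped by a human and by a system.
human = {"c1": "A", "c2": "A", "c3": "B", "c4": "B"}
system = {"c1": "x", "c2": "x", "c3": "x", "c4": "y"}
precision, recall, f1 = pairwise_scores(human, system)
```

Since different annotators group the same comments differently, a system would in practice be scored against each available annotation rather than a single gold grouping.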
and just very quickly future work well obviously the corpus is of limited size and we'd like
to make it bigger
scalability we still have to prove that the method scales to a thousand comments from say a
hundred
we think it will but that's something we have to investigate
and also we'd like to see whether we can think about some ways of
maybe crowdsourcing smaller parts of the task or simplifying it altogether
there are more questions to do with groups and subgroups and finally there's evaluation how do you
evaluate against these things
that's my last point so is rouge an appropriate method
that has to be investigated and if not how would we do it
so to finish i would like to acknowledge the european community for funding this
work under the
sensei project the guardian for letting us use the materials and redistribute them
our annotators for their hard work and the reviewers here for helpful comments
i'll take questions and if you would like to download the corpus it is available
yes and the back
well
if you have so
that's an interesting question thank you we have a
system that does clustering with several clustering algorithms including lda
and we put all the comments in particular clusters together
and people look at the clusters and what they usually say
is
that when they're given the clusters some of the argumentative structure is
lost and people actually don't like having these clusters put in front of them
users say they want to go back to see the original context 'cause they can
really only make sense of the comments
in the dialogic context where there's an argument going back and forth they don't make
sense pulled out on their own and clustered together
so it's an interesting idea but i don't think it's gonna help people speed up
in doing the task i think they
i need to do the grouping on their to be intra one idea you comments
just
maybe it would be interesting to see the extent to which the
well we have done formal evaluation using the standard sort of
metrics for evaluating clustering of the machine generated clusters and the scores are
up to get a good will be more interesting to see use actually to do
some analysis on that and look at
the sorts of things that the
algorithm is putting into the clusters that humans are excluding but
essentially i don't think that would help in summary writing
it could help
obviously in algorithm development which is also important
i think
sorry just for the record the question i think was or the
suggestion was that we think about
i guess a sort of active learning approach or something like this where
you annotate something and the system uses that to annotate more comments and somehow hopefully
speeds up the annotation is that correct
so
it is a good idea we have thought about doing things like that
but we haven't actually tried them in practice to see how well they really
work thanks
then
which ones
so this is an after-the-fact assessment of
what was going on
it wasn't part of the summary creation
well we're where we want
we well we don't have to i mean we
the resource actually has multiple different reference summaries the way a lot of reference
summary data sets do
and then we just came back afterwards and asked as a matter of interest how similar they are
to each other
so it wasn't part of producing the resource that we did that that's
actually part of analysing it afterwards to see the extent to which
these things are similar
yes
so it's like
so i guess we could then
it's a bit like what people call sort of reconciliation where you have multiple annotators do something and
you try to reconcile that to produce a single gold standard
so we could in fact do another stage now
and take each of these multiple things and do the reconciliation and come up and say
well this is
this is the reconciled set the perfect summary if you like of the set
of
yes a bit like pyramid i guess sort of
i guess i mean
actually we ran out of resources to do this but there's lots more you could do
and in fact if somebody wants to do that on top of what we're releasing that
would be great
i wonder
okay let's thank robert