Speech Transcript - The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News

and well you have to name cream right

and this is another resources paper like the last one that describes how we

built a corpus of human authored reference summaries for reader comment conversations in online

news so i'll start off with a pitch for this work

so i think most of us know or are aware of reader comments on a wide

range of online news sources some of which are shown down the right

and these are multi way conversations

and they've got lots of information of potential value a lot of rubbish as well

as a lot of information of potential value to a range of users including

not just people typically reading but people posting comments as well as reading to journalists

and news editors and so on

however i'm sure you've noticed this if you've looked at these a major problem is that

a news article may quickly attract hundreds even thousands of comments

few readers have the patience to wade through this much

so the obvious question seems to be can we automatically summarize these

conversations to gain some sort of overview of what's going on in the conversation

there's been some work done on this already and i've divided the approaches up

into sort of two broad categories one you might call

technology driven approaches that is

let's try what we already know how to do and see how well it works

so the idea is well we know how to topic cluster things so let's cluster

all these comments topically using something like lda

and then let's rank them using some sort of ranking algorithm so rank clusters rank

sentences in the clusters and then let's build an extractive summary

from the results

and so this is something we know how to do and it generates so-called summaries but

in fact if you look at them

they aren't very good summaries

they fail to capture the argument oriented nature of the

conversations pretty spectacularly

the other set of approaches which haven't really come to fruition but are promising might

be called argument theory driven approaches there's a lot of work on argument mining in social media

this has resulted in various schemes

defining argument elements and relations in argumentative discourse and such elements and relations

could in fact be detected in these comments

and they might form a basis for building a summary and a number of people

working in this area have cited summarization as a motivation for their work

but no one has yet proposed how given an analysis of this sort

one could actually

derive a summary from the full reader comment set

well what we thought when we started this project that's the sensei project

on which this work is based

was that really what was needed is an answer to the underlying fundamental question

what should a summary of reader comments be like

and it would also be helpful if we had some human generated exemplars

for real sets of reader comments

and this would allow us both to better select appropriate technologies for reader comment summarization

and also

to evaluate and develop our systems using these materials

okay so that's the introduction and here's the structure for the talk so

next i'll look at what should a summary of reader comments be like

i'll then talk about a method that we developed for building or authoring reader comment

summaries

and talk about the corpus we built

some comments on related work if there's time and conclusions and future work

so what should a summary of reader comments be like well i think one can

start from some remarks made by karen sparck jones that really what

a summary should be like depends on the nature of the content that one

wants to summarize and the use to which the summary is to be put

so if we look at reader comments to see what characteristics they have one thing

one notes is that comment sets are typically organized into threads based on reply to structure

every comment falls into exactly

one thread and either initiates a new thread or

replies to exactly one comment earlier in the thread

as a consequence these conversations have the formal character of a set of trees

where each initial comment is the root of a separate tree

and all other comments are either intermediate or leaf nodes whose parent is the

comment to which they reply
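
The reply-to structure described here (every comment either starts a thread or replies to exactly one earlier comment) can be sketched in a few lines of Python. This is only an illustrative sketch: the `(comment_id, reply_to)` record shape and the sample ids are assumptions made for the example, not the format of the Guardian data or of the corpus.

```python
from collections import defaultdict

# Hypothetical comment records: (comment_id, reply_to).
# reply_to is None when the comment initiates a new thread.
comments = [
    ("c1", None),
    ("c2", "c1"),
    ("c3", "c1"),
    ("c4", None),
    ("c5", "c3"),
]

def build_forest(comments):
    """Return (roots, children) for a set of threaded comments."""
    children = defaultdict(list)
    roots = []
    for comment_id, reply_to in comments:
        if reply_to is None:
            roots.append(comment_id)               # initial comment: root of its own tree
        else:
            children[reply_to].append(comment_id)  # every reply has exactly one parent
    return roots, dict(children)

def thread_of(comment_id, parent):
    """Follow parent links upward until reaching the root that identifies the thread."""
    while parent.get(comment_id) is not None:
        comment_id = parent[comment_id]
    return comment_id

roots, children = build_forest(comments)
parent = dict(comments)  # comment_id -> id of the comment it replies to (or None)
```

Because each comment has at most one parent, the result is a forest: one tree per initial comment, with every reply reachable from exactly one root.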

now you might naively think that these threads are going to be topically

cohesive

in practice they rarely are the same topic may be addressed across multiple threads

as the conversation gets long people don't bother reading what's gone before so they start

off the same thing again

and a single thread may drift from one topic onto another so there's a many to many

relation between

threads and topics

so here's a quick example this is from the guardian which is one of our data

sources the guardian newspaper in the uk

it's on the hotly debated issue of

when bury town council in northern england decided to reduce

rubbish bin garbage collections to once every three weeks rather than once every two

weeks

as you can imagine that sparked an uproar right

and there are of course two components the original

article as it appeared in the guardian a quick summary of the article at the top followed by

the detail

and then the comments start underneath

and well it's sort of like this so it starts off with someone saying

i can see how it would attract

rats and other vermin

i know some difficult decisions have to be made with cuts to funding but this seems like

very poorly funded idea

and then someone replies

only people use compost bins and have no trouble with rats or foxes

and so on it rolls on like this

so our observation having looked at a lot

very many of these

is that reader comments are primarily though not exclusively argumentative in nature

with readers making

assertions that either express a viewpoint or stance on an issue

raised in the original article or by an earlier comment

or provide evidence or grounds for believing a viewpoint or assertion that's already been

expressed

so in approaching this we've developed a theoretical framework which is reported in a

paper at the argument mining workshop in berlin as well

essentially the framework is based on the notion of issue where an issue is a question

on which alternative viewpoints are possible so for instance should bin collection

be reduced to once every three weeks

which is a binary alternative

they needn't be binary they can be open-ended as well like

an issue such as what was the best film of two thousand and fifteen

it's also worth noting that issues are often implicit that is they're not directly expressed

in the comments

and so for instance this issue which unfolds in the comment set i referred to

will reducing bin collection lead to an increase in vermin is never

explicitly mentioned as such as an issue

people just weigh in on either side of it and the reader is

left to

infer the fact that what they're arguing about is this issue will reducing bin collection lead to an

increase in vermin

so again as i mentioned while comments are primarily argumentative of course they do other

things as well for instance they may seek clarification about facts and they may provide background

as the

previous speaker mentioned of course they frequently

include jokes or

sarcasm or one form or another of emotion but often these other things are really

they're in the service of some

expressing some viewpoint or taking a stand on a particular issue

so sarcasm or emotive terms which occur in this bury bin collection argument things like

lame brained and crazy and so on indicate the commenter's stance as well as their

emotional attitude

okay so given that these things are primarily argumentative

we thought a useful sort of summary would be a generic informative summary that

attempted to give an overview of the arguments in the comments

and when we reflected on that and discussed it at some length it seemed that the

key things we wanted in a sort of overview summary

were to identify and articulate the main issues in the comments that is the questions

or things that people are arguing about that they're taking sides on

and to characterize the opinion on the main issues so

identifying alternative viewpoints indicating grounds given to support viewpoints

aggregating across comments so if the same opinion is expressed multiple times what proportion of

the comments is on one side or another of the argument

and indicating whether there's consensus or disagreement amongst commenters

we then put this proposal forward among several other proposals for

summary types to

a set of respondents with a questionnaire and we got very positive

feedback on this summary type and these respondents included not just

authors and readers of comments but journalists and news editors as well

and so based on that we developed a set of guidelines for authoring these

summaries

and

we tried not to make them too

prescriptive in the sense that we didn't give someone a theory of argumentation and say you must build

a summary in accordance with this theory rather

we introduced these ideas of

identifying issues and characterizing opinion and then let them

more or less follow their own instincts as to what intuitively looked

like the best way to summarize

okay so on to the method then

so if you've thought about it already

since i started speaking or if you've tried it yourselves

you realise very quickly that writing summaries of large numbers of reader comments is very

hard

so when we first started on this problem we had no idea how we'd go about it

and we just sat and read

a hundred or so comments and thought

well how can we summarize this

so it's clear you need to break it down into some multiple stage process and

build tools to support the process

and that's what we've done

so we've broken it down into a four stage process though really only the first three stages

have to do with summary

authoring

and the last stage is something extra which we'll come back to

so the first stage is what we call comment labeling and in this stage

annotators go through the conversation comment by comment and write a brief label

if you like a mini summary which tries to capture the

essential or main central point the person's making in that comment

and there are some additional things you can read in the paper about what

else annotators were asked to do there are a few examples up at the top one of the

paradoxes of this is that the annotators also have to bear in

mind they may look at these out of context later

and so they need to write enough that they can understand without having to go back

and look at the whole conversation so in some cases

anaphora will be expanded in the label making the label paradoxically longer than

the comment

but this lets them be looked at independently later on

and this is the interface we built for this

it has two parts the left one in the green circle

is pre populated from the conversation automatically and then the annotators fill in their labels

on the right here's a conversation about

network rail being fined for late running trains in the u k

and various right cheaper than writing short a labels like

and that for real ticket prices the comments applying would seem high

some not saying that were rounded mozart's fares are

or operate trains

and so on these are summaries of a common so you see them but must

is as the much or

the second stage then is to group these labels together topically

okay so annotators group

labels together

by putting those which are similar or related in the same group

and then assign a group label that describes the common theme of the group

and we allow them one level of subgrouping

since some people in particular found it much easier to

roughly group things and then as the conversation evolved realise there's a more structured element

and subgroup things a bit but we didn't want arbitrary

subgrouping

and so going through the process of the grouping then

allows the annotators to be better placed to make sense of the broad content of

the comments before they come to writing a summary

and again there's an interface which looks like this

so first they just get all the labels and then they can add

groups by pressing a button to add a new group and a group label

so you end up with something where you've got a group label and then

the labels or mini summaries of the comments underneath it and then the

next group and so on

the annotators can go back to the previous screen of all the comments and the full

text if they wish as well

the third stage is generating the summary

so we asked annotators to write two summaries one which they do first is an

unconstrained one where we said don't worry about the length too much

just write a summary

and then the second one is constrained where we said no less than

a hundred and fifty and no more than two hundred and fifty words

and they do that with the first unconstrained summary available as a reference

further analysis obviously takes place as the annotators go through that stage

they may have developed a group label further turning it into a summary

sentence and so on

we encourage annotators to use phrases like

many several few commenters said

opinion was divided on the consensus was and so on

to try to capture the aggregation or abstraction over a number of separate comments

so again there's an interface for this on the left in the

green circle you see the previous stage stage two output which they're working from

and on the right

they author the summary with a

word count right at the bottom which dynamically changes as they write so they can

see how long their summary is
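
A live word counter of the kind described can be approximated very simply. The sketch below is an assumption about how such a check might work, not the SENSEI tool's actual code: it counts whitespace-separated tokens, and it treats the hundred and fifty and two hundred and fifty word figures mentioned in the talk as lower and upper bounds (the function and parameter names are made up for the example).

```python
def word_count(summary):
    """Count whitespace-separated words, as a simple counter might."""
    return len(summary.split())

def within_limits(summary, min_words=150, max_words=250):
    """True if the summary length falls inside the assumed word limits."""
    return min_words <= word_count(summary) <= max_words

draft = "opinion was divided on whether fewer bin collections would attract vermin"
status = "ok" if within_limits(draft) else f"{word_count(draft)} words, outside limits"
```

A real interface would recompute the count on every keystroke; the point here is only that the constraint reduces to a single length check on the draft text.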

okay so that completes the summary writing and the fourth stage is a back linking stage which

isn't strictly necessary for creating the summaries but is very useful

as a resource for further

algorithm development as you'll see later so we asked the authors to link each of the

sentences in the constrained length summary

to one or more groups that informed the creation of that sentence

okay so really they're linking sentences to groups of labels but since the labels themselves

have an associated

comment id we can actually link directly back from the summary sentences to the source

comments that support them
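
The chain from summary sentence to group to label to comment id can be sketched as a pair of lookups. The dictionaries below are hypothetical stand-ins invented for illustration (the group names, ids and layout are assumptions); they do not reproduce the released corpus format.

```python
# Hypothetical annotation records for one comment set.
labels = {       # label id -> id of the comment the label summarises
    "l1": "c1",
    "l2": "c2",
    "l3": "c5",
    "l4": "c7",
}
groups = {       # group label -> ids of the labels placed in that group
    "vermin risk": ["l1", "l2"],
    "cost savings": ["l3"],
    "recycling": ["l4"],
}
back_links = {   # summary sentence index -> groups that informed the sentence
    0: ["vermin risk", "cost savings"],
    1: ["recycling"],
}

def source_comments(sentence_idx):
    """Resolve sentence -> groups -> labels -> comment ids, keeping first-seen order."""
    comment_ids = []
    for group in back_links[sentence_idx]:
        for label_id in groups[group]:
            comment_id = labels[label_id]
            if comment_id not in comment_ids:
                comment_ids.append(comment_id)
    return comment_ids
```

Under these assumptions `source_comments(0)` gathers the comments behind both groups linked to the first sentence, which is the kind of sentence-to-source alignment the back links make available.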

and there's an interface again i won't go into detail here but effectively each summary sentence is

presented

and then the

annotator can select

which of the groups informed the construction of that sentence and that's recorded

okay so coming onto the corpus

so there were

fifteen annotators who carried out the summary writing task mostly

final year journalism students graduates with expertise in language and writing and academics

the majority were native english speakers and they all had good english writing skills

they were each given a training session and there were guidelines produced as well

and the data source was

about three and a half thousand guardian articles and associated comment sets published in june and july

two thousand and fourteen

then we selected a small subset of that

in fact eighteen articles

in these domains listed here politics sport health et cetera

from each of these we selected approximately the first hundred comments from each

full comment set

there's more detail on precisely how this was done in the paper

so you see a summary of the underlying corpus at the top

in terms of article length comment length and so on and so forth

but overall me this there's eighteen articles but

full of

the number of

common set total comments is close to seven files and almost ninety thousand words in

total

and on to the annotation characteristics so

of the eighteen articles plus comment sets fifteen were doubly annotated three

were triply annotated

and even with the tools you can see it took the annotators three and a

half to six hours to complete the task for one article plus comments

so this is a non trivial undertaking

i wouldn't encourage it without some serious

commitment

but we're pleased with the results there are thirty nine annotations and each of these

thirty nine annotations has two summaries

each of the summary sentences is linked to one or more groups of comments so all of

this is in the corpus which is now available for download
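
The figure of thirty nine annotations follows directly from the annotation scheme: fifteen articles annotated twice plus three annotated three times. A small check, with the summary total derived on the assumption that every annotation contains one unconstrained and one constrained summary as described in the method:

```python
doubly_annotated = 15   # articles annotated by two annotators
triply_annotated = 3    # articles annotated by three annotators

articles = doubly_annotated + triply_annotated
annotation_sets = 2 * doubly_annotated + 3 * triply_annotated

# Assumption: each annotation set holds two summaries
# (unconstrained + constrained), so the total below is derived, not quoted.
summaries = 2 * annotation_sets
```

Here `articles` comes out as 18 and `annotation_sets` as 39, matching the numbers given in the talk.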

and i gonna some statistics don't for the paper which why we're going to in

detail about the numbers here of annotations so

they just a bit of qualitative analysis of the quantitative analysis

before i turn to related work and conclusions so looking over them one of the most striking

things is that people group things

in different sorts of ways in particular i guess this is the famous

lumpers versus splitters divide that we're finding

and so on average there was something like nine

across the whole annotation

all annotations the average number of groups per annotation set was nine ranging from four

not able to fourteen point five

for subgroups the average per annotation set is five

so most annotators use the subgroup option at least once

but in fact there's quite a divide between those who used subgroups

quite frequently and those who only used them rarely

and pleasingly from our point of view given what we specified initially for the

target summaries all of them contain

sentences reporting different views on issues

they frequently picked out points of contention

and provided examples of the reasons people gave in support of viewpoints

they frequently indicated the proportion or amount of

commenters talking about the views and so on

so overall we think they did what we wanted them to do

quite well

a couple of examples here i've colour coded this one with

red highlighting the comments that are

expressing sort of aggregation

and green identifying some of the issues that are more explicitly stated in the summaries

i've got another one but i'll skip over that

so quite healthy looking

summaries of this sort

i and we show these

that's if a common so we

we actually showed these to various people in particular the guardian themselves and they were

very impressed and said if you could do this automatically now

we'd be very happy

so we also quickly looked at

trying to determine how similar the summaries were using a method of the sort

that was used in duc two thousand one

where you compare the content so you look for each sentence in

summary a to see whether its content is covered in summary b and then

you do the reverse

using a sort of likert scale system

to see how much commonality there is
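
One way to turn such sentence-level judgements into a single figure is to average them in each direction and then average the two directions. This is a sketch under stated assumptions, not the procedure reported in the paper: the 1 to 5 scale, the use of a plain mean, and the example ratings are all invented for illustration.

```python
from statistics import mean

def directional_coverage(ratings):
    """Mean judged coverage of one summary's sentences by the other summary."""
    return mean(ratings)

def symmetric_similarity(a_in_b, b_in_a):
    """Average the two directions so neither summary is privileged."""
    return (directional_coverage(a_in_b) + directional_coverage(b_in_a)) / 2

# Hypothetical judge ratings on an assumed 1-5 scale, one per sentence:
a_in_b = [5, 4, 2, 4]   # how well each sentence of summary a is covered by summary b
b_in_a = [4, 4, 3]      # how well each sentence of summary b is covered by summary a

score = symmetric_similarity(a_in_b, b_in_a)
```

Comparing in both directions matters because one summary can cover everything in a shorter one while containing extra material the shorter one lacks.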

and as i'm running out of time i'll skip this very quickly but

essentially we determined that the answer is affirmative

of that's in the summers are quite similar you're not there is a problem with

or not

i in a one extracts different reference summaries

they are relatively similar and there is a high level of agreement between the judges in

making the judgement of similarity

well i've only got a very short time left so

related work is covered

in the paper suffice to say at a high level think of three sorts of

things

a sentence assessment which is a approach to than others of user building resource that's

for evaluating extractive summaries

i don't from real of

reader comments which we used

necessarily i think is the one way to gel essentially

work on the ami corpus which i won't give a detailed comparison of here but you can read that in

the paper

essentially what we do is similar to that but different in several key ways perhaps

most importantly that they're summarizing meeting records which are much more

in a fixed domain and you can anticipate the sorts of things that are going to

emerge in a meeting whereas you can't in reader comments

and finally

some work by misra et al on summarizing arguments

across conversations but where the focus of the work is really on trying to summarise

an argument

so it's something like gun control or

gay marriage across a whole set of different online conversations rather than trying to summarize

all comments in a single conversation which may be about different topics

so in conclusion then we've

proposed a type of overview summary that captures key content

of these multiparty argument oriented conversations developed a method to help humans

author such things

and used the method to build the first publicly available corpus of reader comments annotated with

summaries and other information

we think the summaries produced are pretty good

and we've also already been able to use the corpus for a whole set

of things for instance we've used the grouping to evaluate clustering algorithms

we've used the back links to inform a supervised cluster

labeling algorithm

and we've

used the summaries to inform assessors in a task based system evaluation

and just very quickly future work well obviously the corpus is of limited size we'd like

to make it bigger

scalability we still have to prove that it scales to a thousand comments from say a

hundred

we think it will but that's something we have to investigate

and also we'd like to see whether we can think about some ways of

maybe crowdsourcing smaller amounts and assembling them together

there are more questions to do with groups and subgroups and finally there's evaluation how do you

evaluate against these things

that's my last point so is rouge an appropriate method

that is to be investigated and if not how would we do it

so just to finish i'd like to acknowledge the european community for funding this

work under the

sensei project the guardian for letting us use the materials and redistribute them

our annotators for their hard work and the reviewers here for helpful comments

any questions and if you would like to download the corpus it's available

yes and the back

well

if you have so

that's an interesting question thank you we have a

system that does clustering with several clustering algorithms including lda

and we put all the comments in particular clusters together

and people look at the clusters and what they usually say

is

that in the clusters some of the argumentative structure is

lost and people actually don't like having these clusters put in front of them

users say they want to go back to see the original context because they can

really only make sense of the comments

in the dialogic context where there's an argument for or against they don't make

sense pulled out on their own and clustered together

so it's an interesting idea but i don't think it's gonna help people speed up

in doing the task i think they

i need to do the grouping on their to be intra one idea you comments

just

maybe think of the be interesting to see the extent to which the

well we had done formal evaluation using the standard sort of

pages for evaluating clustering of the machine gender clusters in one's of the scores are

up to get a good will be more interesting to see use actually to do

something analysis on that a look at how

the sorts of things that are that the

algorithms putting into the clusters that humans are excluding so but

essentially i don't think what happens in summary writing

that it could help in

obviously an algorithm development which is also important

i think is that there is

the sre some record a question what think we're hugger what it was or the

suggestion was that we think about

i guess a sort of active learning approach or something like this where

you annotate something the system uses that to annotate more comments somehow hopefully

speeding up the annotation is that correct

so we don't like

so it's a good idea we have thought about doing things like that

but we have not got round to trying them in practice to see how well they would really

work thanks

then

which ones

so this is an after-the-fact assessment of

what was going on

it wasn't part of the summary creation this was

well we're where we want

we well we don't have to i mean we

with the results actually has multiple different reference summaries the way a lot of reference

summaries sensor data

and then we just came back afterwards and said let's out of interest assess their similarity

to each other

so it's not part of producing the resource that we did that that's

actually part of analysing it afterwards to see the extent to which

these things are similar

yes

so it's like

so i guess we could then

it's sort of like what people call reconciliation where you have multiple annotators do something

and you try to reconcile that to produce a single gold standard

so we could in fact do another stage now

and for each of these take the multiple things and do the reconciliation and come up and say

well this is

this is the reconciled set the perfect summary if you like of the set

yes i like permit i guess a sort from a larger no

i got is i mean

actually we ran out of resources to do this but there's lots more you could do

in fact if somebody wanted to do that on top of what we're releasing that

would be great

i wonder

okay this like robert

Oral Session 2: Corpus creation

Emma Barker, Monica Lestari Paramita, Ahmet Aker, Emina Kurtic, Mark Hepple and Robert Gaizauskas