This is the work of my PhD student, who currently can't leave the United States, so I'm presenting our work for her while she's finishing her PhD. Not a very good situation. Alright.
So, the motivation for this work is that narrative structures occur across all kinds of natural language genres: you see them in restaurant reviews, you see them in newspapers. This seems to be because the way humans conceptualize the world is in terms of narrative structure. So it fits in very well with the talk this morning about how people are always trying to look for coherence; a lot of that coherence can be framed as narrative structure. And we argue that narrative understanding requires modeling the goals of the protagonist and tracking the outcomes of those goals: whether the goals are being fulfilled or thwarted.
First-person social media stories are actually full of these expressions of desires and outcome descriptions. For example, here's something from a blog site like LiveJournal, which is very similar to where most of our data comes from. Somebody is telling a story about something that happened at a concert: I dropped something, it was dark, I could hardly look for it with my cellphone; we spoke a little bit, but it was loud so we couldn't really talk; "I had hoped to ask him to join me for a drink or something" after the show, but I couldn't bring myself to do such a thing; but he left before the end and I didn't see him after that, so maybe I'll write a missed connections post.
So this sentence here, "I had hoped to ask him to join me for a drink or something", shows an expression of a first-person desire. One of the reasons we're interested in first-person stories is that we don't have to deal with coreference: it's quite easy to track who the protagonist is in a first-person narrative, so we can track the protagonist's goals, which makes the problem a little bit more tractable.
So what we do is identify goal and desire expressions in first-person narratives, like that "had hoped to" in the previous example; we have a bunch more, and I'll tell you more about how we get them. Then what we aim to do is infer from the surrounding text whether or not the desire is fulfilled: we want to actually read the narrative and be able to predict whether the desire is fulfilled or not. In this particular case, the one I showed you, we have the phrase "but he left before the end and I didn't see him after that", which clearly indicates that the desire was unfulfilled. And, as I said, we have a corpus of about nine hundred thousand first-person stories from the blogs domain.
Excuse the slide; I added this one late and didn't get a chance to practise with it. But these first-person narratives are just rife with desire expressions: you can get as many as you could possibly want out of this kind of data, and they come in lots and lots of different forms, like "I wanted to", "I wished to", "I decided to", "I couldn't wait to", "I aimed to", "I arranged to", and "I needed to".
Now, it's true that states can also express desires: if you say something like "I'm hungry", that implies you have a desire to get something to eat. So initially our goal was that we might be able to do something with states as well, but in this paper we decided to restrict ourselves to particular verbs and past-tense expressions. Okay.
So, the related work. The first paper on this, around 2010, was by Goyal, Riloff and Daumé, who were trying to implement a computational model of Wendy Lehnert's plot units for story understanding. One of the main things you do in the plot units model is try to track and identify the affect states of the characters. The dataset they used was Aesop's fables, and they manually annotated the fables themselves to examine the different types of affect expressions in narratives. One of the things they claim in this paper, which is a really interesting paper that you should read, is that affect states are not expressed explicitly, like "the character was happy"; rather, they are implicated, and you derive them by inference by tracking the characters' goals. And they claimed in this seminal paper that, even though it has long been an AI idea that what you want is to extract people's intentions and whether they are being realised or not, in natural language processing we need to do much more work on tracking goals and their outcomes.
There is also a recent paper by Chaturvedi et al., which picks up on this idea of tracking expressions of desire and their outcomes. They did this on two corpora that are very different from ours. They took MCTest, which is a corpus from Microsoft of crowd-sourced stories suitable for a machine reading task, where the stories are supposed to be understandable by seven-year-olds, so you get desire expressions like "Johnny wanted to be on the baseball team. He went to the park and practised every day", that kind of story. They also took passages from Wikipedia and tracked desires there, where you get things like "Lenin wanted to be buried in Moscow, but ...". So their data is very different from ours, and when I first heard a presentation of that paper, I thought that our data, which we had already been working on for several years for narrative understanding, is so much more suitable and so primed for this particular task that we had to try it on our datasets.
So we made a new corpus, which is publicly available; you can download it from our corpus page. It's a really high-quality corpus, and I'm super excited about being able to do more things with it: three thousand five hundred first-person informal narratives with the annotations, which you can download. In this paper I'll talk about how we model the goals and desires, and about some of the classification models we've built (I don't know why my slides are running off the bottom there). We do a feature analysis of which features are actually good for predicting the fulfilment outcome, and we look at the effect of both the prior context and the post context; I can't even read what that last item on the slide is.
This is going to be a problem; I hope it'll be okay. My slides are running off the bottom of the screen.
Okay. We start with a subset of the Spinn3r corpus, which is a publicly available corpus of social media collected for one of the ICWSM tasks, and we restrict ourselves to the subset that comes from traditional journalling sites like LiveJournal. You can get quite clean data by restricting yourself to things in the Spinn3r corpus that come from particular blog websites.
Should we use the PowerPoint? It works fine on my machine. I guess it's not going to work right now, so we can continue. Do you think the PDF would be better? No, you can see it okay? Alright.
So, we have a subset of the Spinn3r corpus, and we have what we claim is a very systematic, linguistically motivated method to identify the goal statements. We collect the context before each goal statement and the context after it: up to five utterances before and five utterances after. Then we have a Mechanical Turk task, where we put the examples out on Mechanical Turk and collect gold-standard labels for the fulfilment status. I actually also asked the Turkers to mark the spans of text that are evidence for fulfilment or non-fulfilment, but in this paper we don't do anything with the evidence.
Okay, so I referred to this before: there are many different linguistic ways to express desires, and one of the things my colleague Pranav Anand was struck by in the prior work was that it was limited in terms of the desire expressions it looked at; they just looked at "hoped to", "wished to" and "wanted to". I think that's probably motivated by the fact that the MCTest corpus is very simple, written for seven-year-olds and crowd-sourced, so maybe there weren't very many expressions of desire in there. But our data is open-domain and very rich: we have complex sentences, complex temporal expressions, all kinds of really great stuff in there, and I really encourage you to have a look at the data. So what Pranav did was go through FrameNet and pick every frame he thought could possibly contain a verb, or a state, that would express a desire. We went through and made a big list of all of those, then looked at their frequency in the Gigaword corpus to see which ones are most frequent in the English language generally, not just in our data.
We picked thirty-seven verbs, constructed their past-tense forms as patterns with regular expressions, and then ran those patterns against our database of nine hundred thousand first-person stories. We found six hundred thousand stories that contain verbal patterns of desire.
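To make that concrete, here is a minimal sketch of what matching past-tense, first-person desire patterns against stories could look like; the verb list and regular expression below are illustrative stand-ins, not the exact thirty-seven patterns from the paper.

```python
import re

# Illustrative subset of past-tense desire verbs (not the paper's full list of 37).
DESIRE_VERBS = ["wanted", "hoped", "wished", "decided", "needed",
                "aimed", "arranged", "couldn't wait"]

# First-person, past-tense verbal pattern followed by an infinitive,
# e.g. "I wanted to ...", "I had hoped to ...".
DESIRE_PATTERN = re.compile(
    r"\bI\s+(?:had\s+)?(" + "|".join(re.escape(v) for v in DESIRE_VERBS) + r")\s+to\s+(\w+)",
    re.IGNORECASE,
)

def find_desire_expressions(story: str):
    """Return (desire verb, embedded verb) pairs found in a first-person story."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in DESIRE_PATTERN.finditer(story)]

print(find_desire_expressions("I had hoped to ask him to join me for a drink."))
# -> [('hoped', 'ask')]
```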
So this is roughly what it looks like: we take five sentences before and five sentences after. The reason we take five sentences before is that there is a claim from work on oral narrative, about the structure of narrative, that you often foreshadow something that is going to happen. So, unlike the previous work, we include the prior context. We have the prior context, the desire expression, and the post context, and our goal is to use the context around the desire expression to try to predict whether the expressed desire is fulfilled.
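As a rough sketch of this setup, assuming each story is already split into sentences and we know the index of the sentence containing the desire expression (the function and variable names here are made up for illustration):

```python
def extract_context(sentences, desire_idx, window=5):
    """Split a story into prior context, the desire-expression sentence, and
    post context, keeping up to `window` sentences on each side."""
    prior = sentences[max(0, desire_idx - window):desire_idx]
    desire_sentence = sentences[desire_idx]
    post = sentences[desire_idx + 1:desire_idx + 1 + window]
    return prior, desire_sentence, post

# Example: an eleven-sentence window around a desire expression in sentence 7.
story = [f"sentence {i}" for i in range(20)]
prior, desire_sentence, post = extract_context(story, desire_idx=7)
print(len(prior), desire_sentence, len(post))  # -> 5 sentence 7 5
```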
We sampled from the corpus according to a skewed distribution that matched the original corpus as a whole, and we put three thousand six hundred eighty samples out for annotation, exhibiting sixteen verbal patterns. We show the Mechanical Turkers which desire expression they are supposed to judge, because sometimes a story might have more than one in it, so we show them the one they are supposed to predict the fulfilment status for. We had three qualified workers per utterance. This is really annoying, sorry. We asked them to label whether the desire expression was fulfilled and to mark the textual evidence.
So, we got good agreement. Even though it's Mechanical Turk, we used three qualified workers: we make sure they can read English and that they are paying attention to the task. That's typically what we do when we have a task like this: we put it out to a lot of people, we see who does a good job, and then we go back to the people who have done a good job and say, we'll give you this task exclusively, we pay them well, and then they go off and do it for however long it takes, like a week or two.
On the data we put out there, we got agreement of roughly seventy-five percent on the fulfilled instances, sixty-seven percent on the unfulfilled ones, and forty-one percent on the ones that are unknown from the context.
If I'm not in presentation mode, the next slide shows... okay, so.
One thing to notice is that the verbal pattern itself heralds the outcome: how you express the desire is often conditioned on whether the desire is actually fulfilled. If you look at "decided to", "decided" kind of implicates that the desire is going to be fulfilled; if you use a verb like "hoped to", it implicates that the desire is not going to be fulfilled; and something like "wanted to" or "needed to" is closer to fifty-fifty. So there is a prior distribution associated with the selection of the verb form. And we have this dataset, which, as I said, you can download. We think it's a really lovely testbed for modelling desire in personal narrative and its fulfilment: it's very open-domain, we have the prior and the post context, and we have pretty reliable annotation. So that's one of our contributions: the corpus itself.
So next I'll talk about the experiments we did. We defined feature sets motivated by narrative structure; some of these features were motivated by the previous work of Goyal and Riloff and by Chaturvedi's experiments. Then we ran several different kinds of classification experiments to test whether we can actually predict fulfilled desire, and we also applied our models to Chaturvedi et al.'s data, which is also publicly available, so we can compare directly how our models work on their data and on our data. All of these datasets are publicly available.
Some of our features come directly from the desire expression. Take this example: "Eventually I just decided to speak. I can't even remember what I said. People were very happy and proud of me for saying what I wanted to say." The first feature that's important is the desire verb itself, obviously: whether it's "decided to" or "hoped to" or "wanted to". Then there is what we call the focal word, which is the verb embedded underneath the desire expression; we pick that verb and stem it, so in this case it's "speak". We then look for other words in the context that are related to the focal word: we look for synonyms and antonyms of the focal word and count whether those occur.
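A small sketch of how counting the focal word and its related words in the context might look, here using WordNet through NLTK as a stand-in for whatever lexical resources were actually used:

```python
from nltk.corpus import wordnet as wn   # needs: nltk.download('wordnet')
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def related_words(focal_word):
    """Collect WordNet synonyms and antonyms of the focal (embedded) verb."""
    synonyms, antonyms = set(), set()
    for synset in wn.synsets(focal_word, pos=wn.VERB):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
            for ant in lemma.antonyms():
                antonyms.add(ant.name().replace("_", " "))
    return synonyms, antonyms

def focal_word_features(focal_word, context_sentences):
    """Count mentions of the stemmed focal word, its synonyms, and its antonyms
    in the surrounding context sentences."""
    synonyms, antonyms = related_words(focal_word)
    tokens = [stemmer.stem(tok.strip(".,!?")) for sent in context_sentences
              for tok in sent.lower().split()]
    return {
        "focal_mentions": tokens.count(stemmer.stem(focal_word)),
        "synonym_mentions": sum(tokens.count(stemmer.stem(w)) for w in synonyms),
        "antonym_mentions": sum(tokens.count(stemmer.stem(w)) for w in antonyms),
    }

print(focal_word_features("speak",
      ["We spoke a little bit but it was loud so we couldn't really talk."]))
```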
We also look for the desire subject and its mentions: all the different places where the desire subject, which in our case is always the first person, gets mentioned. Then we have discourse features that capture whether discourse relations are explicitly stated, classified according to their occurrence in the Penn Discourse Treebank. There is an inverse index of the Penn Discourse Treebank annotation manual that gives you all the surface forms classified under a particular class of discourse relation, so we just take those from there. We use two classes, violated expectation and meet expectation, and we keep track of those.
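As an illustration of the discourse features, here is a sketch with tiny hand-picked connective lists standing in for the full sets of surface forms drawn from the Penn Discourse Treebank manual:

```python
# Illustrative connective lists; the real feature uses the surface forms
# listed in the Penn Discourse Treebank annotation manual.
VIOLATED_EXPECTATION = {"but", "however", "although", "unfortunately", "yet"}
MEET_EXPECTATION = {"so", "indeed", "in fact", "as a result", "accordingly"}

def discourse_features(context_sentences):
    """Count explicit connectives signalling violated vs. met expectation."""
    text = " " + " ".join(context_sentences).lower() + " "
    return {
        "violated_expectation": sum(text.count(" " + c + " ") for c in VIOLATED_EXPECTATION),
        "meet_expectation": sum(text.count(" " + c + " ") for c in MEET_EXPECTATION),
    }

print(discourse_features(["but he left before the end and I didn't see him after that"]))
# -> {'violated_expectation': 1, 'meet_expectation': 0}
```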
We also have sentiment flow features, based on a connotation lexicon. Did I miss anything? We look at whether, over the passages in the story, the sentiment changes: whether it starts positive and goes to negative, or starts negative and goes to positive. That is another feature we keep track of.
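A sketch of the sentiment flow idea, assuming each sentence has already been scored with some sentiment or connotation lexicon (the scoring itself is not shown):

```python
def sentiment_flow(prior_scores, post_scores):
    """Direction of sentiment change from the prior context to the post context,
    given per-sentence polarity scores in [-1, 1]."""
    before, after = sum(prior_scores), sum(post_scores)
    if before >= 0 > after:
        return "positive_to_negative"
    if before < 0 <= after:
        return "negative_to_positive"
    return "no_polarity_change"

print(sentiment_flow([0.4, 0.1], [-0.6, -0.2]))  # -> positive_to_negative
```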
So we have these four types of features motivated by narrative characteristics, and the paper goes into detail about the ablation experiments we ran to test which kinds of features matter. We use a deep neural network architecture, a sequential architecture, to do this (I'm running out of time), and we also compare to plain logistic regression on the data. We have two different approaches for generating the sentence embeddings: we use pre-trained skip-thought models, where we concatenate the features with the sentence embeddings and use that as the input representation, and we also have a convolutional neural net and a recursive neural net.
So we have this three-layer architecture, and what we do is go sequentially through the prior context and the post context. We also did experiments where we distinguish them, telling the learner whether it is the prior context or the post context; surprisingly to me, that doesn't matter. So, just to remember, we have eleven sentences: five before and five after the desire expression. At each stage we keep the desire expression in the input: each time we read in the next context sentence, we keep the desire expression along with it, and then we recursively call the routine to get the next one. That's how we keep track of the context.
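Here is a minimal PyTorch sketch of that sequential idea, not the authors' exact architecture or dimensions: each step concatenates the next context sentence's embedding with the desire-expression embedding and the hand-built features, and a recurrent layer reads the eleven-step sequence before a final fulfilment prediction.

```python
import torch
import torch.nn as nn

class FulfillmentClassifier(nn.Module):
    """Sketch: a GRU over the eleven-sentence window (5 prior + desire + 5 post),
    with the desire-expression embedding re-appended at every step."""

    def __init__(self, sent_dim=4800, feat_dim=32, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(input_size=2 * sent_dim + feat_dim,
                          hidden_size=hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)   # fulfilled vs. not fulfilled

    def forward(self, context_embs, desire_emb, feats):
        # context_embs: (batch, 11, sent_dim)  sentence embeddings, e.g. skip-thought
        # desire_emb:   (batch, sent_dim)      embedding of the desire expression
        # feats:        (batch, 11, feat_dim)  per-sentence linguistic features
        steps = context_embs.size(1)
        desire_rep = desire_emb.unsqueeze(1).expand(-1, steps, -1)
        x = torch.cat([context_embs, desire_rep, feats], dim=-1)
        _, h = self.rnn(x)                     # h: (1, batch, hidden_dim)
        return torch.sigmoid(self.out(h[-1]))  # fulfilment probability

model = FulfillmentClassifier()
probs = model(torch.randn(2, 11, 4800), torch.randn(2, 4800), torch.randn(2, 11, 32))
print(probs.shape)  # -> torch.Size([2, 1])
```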
We also did some experiments on a subset of DesireDB that is meant to match more closely the data Chaturvedi worked on, containing only the expressions that she looked at. I only have two minutes left.
Okay. So we wanted to do these ablation experiments with the different architectures. The first comparison, against bag-of-words with skip-thought, shows that having these linguistic features actually matters for performance on this task, not just the embedding. We get an overall F1 of 0.7 for predicting fulfilment versus non-fulfilment. We also have results supporting our theoretically motivated claim that the prior context should matter, not just the subsequent context: this slide shows that having the prior context does indeed improve over having the desire expression alone, and of course if you have the whole context you can do even better.
Then we compare individual feature sets: bag-of-words versus all features versus just the discourse features. Our best result uses just the discourse features, which in my view is actually kind of disappointing: the discourse features by themselves do better than all the features combined. So if you tell the model to just pay attention to the discourse features, it does well. An interesting next thing to do would be to explicitly sample the corpus so that you select instances that don't have the discourse features, so you could see what other features come into play when explicit discourse features aren't there.
Interestingly, the same features and methods achieve better results on the fulfilled class than on the unfulfilled class, and we think that's because it is just harder to predict the unfulfilled case; it's more ambiguous, and the human annotators had that problem too. I'm supposed to stop now. One result that was actually really surprising to us is that we got much better results on Chaturvedi's dataset than they did themselves.
Okay, so I'll stop there and take questions. I'll leave that slide up. Questions, please? Anybody?
Right, so you said you are looking at just verbal patterns for these desire expressions. Do you ever come across non-verbal patterns that might express desire?

There are non-verbal patterns, and you can easily see them: if somebody says "I expected to", you could also have "my expectation was that", and so on. We did a search against the corpus where we pulled a bunch of those out, like "my expectation", "my goal", "my plan", and also some of those states, like hungry, thirsty, tired, whatever, that might indicate some kind of goal. We just decided to leave those aside for the present, but they are definitely in there.
So if you are actually interested in those, you could probably find them quite easily.

There are also semantic things like purpose clauses: a lot of these contexts don't actually contain the words, you know, you don't say that you want to go somewhere, you just see "in order to go somewhere", so you don't actually get verbal patterns. I'm just wondering how many other kinds there might be.

I just think there are lots of other patterns. What's really interesting is how frequent "want" is: in our data, "wanted" is the most common of the verbal patterns, the most common expression, and you could do quite a lot if you just looked for "wanted to". But we have all these different ones, and we are also able to show that they have different biases as to whether the goal will be fulfilled or not.
I have not just a question but a comment at the end: usually, when you're talking about non-fulfilment, that's an indication of expectation, and I wouldn't have thought that the word "decided" generated that expectation. That holds for other words, it's probably true of "hoped", but not "decided". But you said "decided"... it's fulfilled eighty-seven percent of the time, is that what this shows?

Right. We had strong intuitions before we put the data out for annotation. "Decided" is fulfilled eighty-seven percent of the time, which is what I would expect just from looking at it, and it's unfulfilled about nine percent of the time. It's interesting that you can see a difference there. I had a very strong intuition that a lot of these would be interesting because the outcome would be implicated, so I'm actually quite interested in the fact that around ten percent of the time, when you say "I decided to", it is actually not fulfilled.
And there are these different cases of non-fulfilment, which we are looking at in our subsequent work. Sometimes the goal is not fulfilled because something else comes along and you decide to do something else instead, so it's not really unfulfilled, you just changed your mind. Like: we wanted to buy a PlayStation, we went to Best Buy, the Wii was on sale, so we came home with a Wii. So maybe the higher-level goal is actually fulfilled (they wanted some kind of entertainment system), but the expressed desire was in fact not fulfilled. Those are maybe, I don't know, eight percent out of the ten percent of cases that are not fulfilled.
Okay, since we're running over time, we should move on to the next speaker.