Thank you. I work with Marilyn Walker at the Natural Language and Dialogue Systems Lab at UC Santa Cruz, and I'm going to talk about learning fine-grained knowledge about contingent relations between everyday events.
Our goal in this work is to capture commonsense knowledge about the fine-grained events of everyday experience, the events that occur in people's everyday lives, like preparing food and getting to work, or an alarm going off triggering waking up and getting out of bed.
We believe that the relation between these events is a contingency relation, based on the definition of contingency from the Penn Discourse Treebank, which has two types: cause and condition.
Another motivation for our work is that much of the user-generated content on social media is provided by ordinary people telling stories about their daily lives, and stories are rich in commonsense knowledge and in contingency relations between events. I have two examples here from our dataset. Our dataset is a subset of the Spinn3r corpus, which has millions of blog posts, and it contains personal stories written by people about their daily lives on their blogs.
In the examples you can see sequences of coherent events in the stories. The first one is about going on a camping trip, and you have a sequence of events: they pack everything, they get up in the morning to get to the campground, they find a place and set up the tent. The second story is about witnessing a storm: the hurricane makes landfall, a tree fell, and then people start cleaning up and getting rid of the trees and everything that was broken.
This commonsense knowledge and these contingency relations between events are present in stories implicitly, and we want to learn them. I will show in this talk that this fine-grained knowledge is not found in the previous work on the extraction of narrative events and event collections.
Much of the previous work does not focus specifically on the relation between the events: they characterize what they learn as a collection of events that tend to co-occur, and they are somewhat vague about what the relation between the sequence of events is. That work is also mostly focused on the newswire genre, so the type of knowledge they can learn is limited to the newsworthy events in news articles, like bombings and explosions. As for evaluation, they mostly use the narrative cloze test, which we believe is not the right way to evaluate this type of knowledge.
So, in our work we focus on the contingency relation between events. We use personal blog stories as the dataset, so we can learn new types of knowledge about events other than the newsworthy events. And we use two evaluation methods: one of them is inspired and motivated by previous work, and the other one is completely new.
The stories in this dataset tend to be told in chronological order, so there is a temporal order between the events that are told in a story, and this is great because temporal order between events is a strong cue to contingency. This makes the dataset well suited to our task.
But this dataset comes with its own challenges. It has a more informal structure compared to news articles, which are well structured, and the structure of these stories is more similar to oral narrative. In one of our previous studies we applied the oral narrative framework of Labov and Waletzky to label the clauses in these personal stories, and we showed that only about a third of the sentences in the personal narratives describe actions and events; the other two thirds talk about the background and try to describe the emotional state of the narrator. I have an example here. I am not going to describe what the labels are, but you can see that there is some background, then there are some actions and events about the person getting arrested by the traffic police, and then something like the narrator feeling they should go free. So it is not all events; there is a lot of other things going on, which makes it more challenging.
So we need new methods to infer useful relations between events from this dataset, and I am going to show in the experiments that if we apply the methods that work on news articles for extracting event collections, we will not get good results on this dataset.
For events, we define an event as the verb with three arguments: the subject, the direct object, and the particle, and here are some examples. This definition is motivated by previous work by Pichotta and Mooney (2014), who showed that a multi-argument event representation is richer and more capable of capturing the interactions between events. They used the verb, subject, and object; we also add the particle because we think it is necessary for conveying the right meaning of the event. For example, the first event is putting up a tent: you have 'put' with a direct object and the particle 'up', and you can see how all these arguments contribute to the meaning of the event. 'Put' by itself has a different meaning than putting up a tent, and the particle matters too, since 'put' and 'put up' tell you different things. In our corpus especially this is important, because it is more informal and has a lot of phrasal verbs, so it is important to have all the arguments in the event representation.
For extracting events we use the Stanford dependency parser and use the dependency parse trees to extract the verbs and their arguments. We also use the Stanford named entity recognizer to do some generalization of the arguments; for example, terms or phrases that refer to a location are mapped to their type, LOCATION, and the same for PERSON and DATE.
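A minimal sketch of this extraction step might look like the following, using spaCy as a stand-in for the Stanford dependency parser and named entity recognizer mentioned here (the argument generalization is also simplified):

```python
# Sketch of event extraction: verb plus subject, direct object, and particle,
# with named-entity arguments generalized to their entity type.
import spacy

nlp = spacy.load("en_core_web_sm")

def generalize(token):
    """Map named-entity arguments to their type (PERSON, LOCATION, DATE, ...)."""
    if token.ent_type_ in ("PERSON", "GPE", "LOC", "DATE"):
        return token.ent_type_
    return token.lemma_

def extract_events(text):
    """Return one (verb, subject, direct_object, particle) tuple per verb."""
    events = []
    for token in nlp(text):
        if token.pos_ != "VERB":
            continue
        subj = dobj = prt = None
        for child in token.children:
            if child.dep_ in ("nsubj", "nsubjpass"):
                subj = generalize(child)
            elif child.dep_ == "dobj":
                dobj = generalize(child)
            elif child.dep_ == "prt":   # verb particle, e.g. "put *up*"
                prt = child.lemma_
        events.append((token.lemma_, subj, dobj, prt))
    return events

print(extract_events("We put up the tent near the lake."))
```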
The contributions of our work are the following. We have a data collection step where we generate topic-sorted sets of personal stories using a bootstrapping algorithm. Then we directly compare our method for extracting these contingency relations between events on a general-domain set of stories and on the topic-specific data that we have generated, and we will show that we can learn more fine-grained, richer, and more interesting knowledge from the topic-specific corpus, and that our model works significantly better on the topic-specific corpus. This is the first time this comparison has been done directly on these two types of datasets for event collection, and we will show that this improvement is possible even with a smaller amount of data on the topic-specific corpus. In our experiments we directly compare our work to the most relevant previous work, and we use two evaluation methods for these experiments.
For the data collection, we have a semi-supervised algorithm for generating a topic-specific dataset using a bootstrapping method. The corpus here is the general, unannotated blogs corpus that has all the personal blog stories. We first manually label a small seed set for the bootstrapping, about two hundred to three hundred stories for each topic, and feed that into AutoSlog, a weakly supervised pattern learner. It generates event patterns specific to that topic; for example, if you are looking at the camping trip stories, it can generate a pattern like an NP followed by a preposition followed by an optional NP. So it generates event patterns that are strongly correlated with the topic, and then we use these patterns to bootstrap and automatically label more stories on that topic from the corpus: we feed in the unlabeled data, and based on how many of the patterns of a topic we can find in an unlabeled story, we label it with that topic. We hand-label about two to three hundred stories per topic, and with one round of bootstrapping we generate about a thousand newly labeled stories.
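The pattern learner itself is AutoSlog-style, but the labeling decision in each bootstrapping round can be pictured roughly like this, where the patterns are hypothetical lemma phrases and the match threshold is an assumed parameter, not a value from the talk:

```python
# Rough sketch of one bootstrapping round: a story from the unlabeled corpus
# is auto-labeled with the topic when it matches enough topic patterns.
def bootstrap_round(topic_patterns, unlabeled_stories, min_matches=3):
    newly_labeled = []
    for story in unlabeled_stories:
        text = story.lower()
        matches = sum(1 for pattern in topic_patterns if pattern in text)
        if matches >= min_matches:
            newly_labeled.append(story)
    return newly_labeled

# Hypothetical patterns for the camping topic:
camping_patterns = ["go camping", "pitch the tent", "build a fire"]
```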
Here I am presenting the results on two topics from our corpus: the camping stories and the stories about witnessing a major storm. Starting from about three hundred stories each, we expanded the corpus to about a thousand.
For learning the contingency relation between events, we use the causal potential method introduced by Beamer and Girju (2009). It is an unsupervised distributional measure: it measures the tendency of an event pair to encode a causal relation, and event pairs that have a higher causal potential score have a higher probability of occurring in a causal context. The first component here is the pointwise mutual information, and the second one takes into account the temporal order between the events, so if they tend to occur more in this particular order, we get a higher causal potential score. This is great for our corpus, because the events tend to be told in the right temporal order. We calculate the causal potential for every pair of adjacent events in the corpus using a skip-2 bigram model, because, as I showed in the example, not all the sentences are events, and events can be interrupted by non-events. That is why we use this skip-2 bigram model, which defines two events to be adjacent if they are within two or fewer events of each other.
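As a minimal sketch, the causal potential of a pair can be computed from unigram and skip-2 bigram counts roughly as follows (the smoothing of unseen reverse-order counts is an assumption, not a detail from the talk):

```python
import math
from collections import Counter

def causal_potential_scores(event_sequences, max_skip=2):
    """Causal potential (Beamer and Girju, 2009) over skip-2 bigrams:
    CP(a, b) = PMI(a, b) + log( P(a before b) / P(b before a) ),
    counting a pair whenever the two events are at most `max_skip`
    events apart in a story."""
    unigrams, ordered = Counter(), Counter()
    total_events = total_pairs = 0
    for seq in event_sequences:
        unigrams.update(seq)
        total_events += len(seq)
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 2 + max_skip]:
                ordered[(a, b)] += 1
                total_pairs += 1
    scores = {}
    for (a, b), n_ab in ordered.items():
        n_ba = ordered.get((b, a), 0) or 0.5   # smooth the direction term
        p_ab = n_ab / total_pairs
        p_a, p_b = unigrams[a] / total_events, unigrams[b] / total_events
        pmi = math.log(p_ab / (p_a * p_b))
        scores[(a, b)] = pmi + math.log(n_ab / n_ba)
    return scores
```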
Most of the previous work uses the narrative cloze test for evaluating the sequences of events they have learned: you take a sequence of narrative events in a document from which one event has been removed, and the task is to predict the removed event. We believe this is not suitable for our task of evaluating the coherence between events, and also, in the previous work by Pichotta and Mooney, they showed that unigram model results are nearly as good as the more sophisticated models on this task, so it is not good for capturing all the capabilities of a model.
We are proposing a new evaluation method, which is motivated by COPA, the Choice of Plausible Alternatives, an evaluation method for commonsense causal reasoning that uses two-choice questions. We automatically generate these two-choice questions from a separate held-out test set that we have for each dataset. Each two-choice question consists of one question event, which is extracted from the test set, so it occurs in the test set, and two choices. One of the choices, the correct answer, is the event that follows the question event in the test set. The second one, which is not the correct answer, is randomly generated from the list of all events that we have. So if in the test set the question event is followed by some event, that following event is the correct choice, and the other choice is randomly generated. The model is supposed to predict which one of these two choices is more likely to have a contingency relation with the event in the question, and then we calculate the accuracy based on the answers the model generates.
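A small sketch of how these two-choice questions could be generated and scored (taking adjacent events in the held-out stories as question/answer pairs is an assumption about the setup):

```python
import random

def make_two_choice_questions(test_sequences, all_events, seed=0):
    """For each adjacent pair (q, correct) in a held-out story, the correct
    choice is the event that actually follows q; the distractor is drawn at
    random from the full list of events."""
    rng = random.Random(seed)
    questions = []
    for seq in test_sequences:
        for q, correct in zip(seq, seq[1:]):
            distractor = rng.choice(all_events)
            while distractor == correct:
                distractor = rng.choice(all_events)
            questions.append((q, correct, distractor))
    return questions

def accuracy(questions, score):
    """`score(a, b)` is any pairwise model, e.g. causal potential."""
    right = sum(1 for q, c, d in questions if score(q, c) > score(q, d))
    return right / len(questions)
```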
For previous work, we compare indirectly to the work by Balasubramanian et al. (2013). They generate what they call Rel-gram tuples, which are basically pairs of relational tuples of events, so they generate pairs of events that tend to occur together. They use news articles as their data and use co-occurrence statistics based on symmetric conditional probability, SCP, which basically just combines the bigram model in the two directions. The corpus of Rel-grams that they have learned is publicly available; you can access it through an online search interface. And they showed in their work that they outperform the previous work on this topic of learning narrative events.
We designed two experiments to compare with this previous work: we compare the content of what we learn, to show that what we learn does not exist in the previous collections, and we also apply their model to our dataset, to show that a model that works on more structured data like news articles cannot get good results on our data. As baselines we use the unigram model, which is basically the prior probability distribution of the events; the bigram model, which is the bigram probability of the event pair, again using the skip-2 bigram model; and the event SCP, the symmetric conditional probability from the Rel-grams work.
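The exact formulations of these baselines are not spelled out in the talk, but roughly, reusing the unigram and skip-2 bigram counts from the causal potential sketch above, they could look like this:

```python
def unigram_score(b, unigrams, total_events):
    """Prior probability of the candidate event, ignoring the question event."""
    return unigrams[b] / total_events

def bigram_score(a, b, ordered, unigrams):
    """Skip-2 bigram score, proportional to P(b | a)."""
    return ordered.get((a, b), 0) / max(unigrams[a], 1)

def scp_score(a, b, ordered, unigrams):
    """Symmetric conditional probability, roughly P(a | b) * P(b | a)."""
    pair = ordered.get((a, b), 0) + ordered.get((b, a), 0)
    return (pair / max(unigrams[a], 1)) * (pair / max(unigrams[b], 1))
```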
And the main method here is the causal potential. We have two datasets. The general-domain stories dataset contains stories randomly selected from the corpus; they don't have a specific theme or topic. We have four thousand stories in the training set and two hundred stories in the held-out test set. We also have the topic-specific datasets; here I will be presenting the results on two topics, the camping stories and the stories about witnessing a storm. Here is how we split the data for each topic: we split the hand-labeled set into test and training, so we have the hand-labeled test set and the hand-labeled training set, and then we create for each topic a larger training set that has the hand-labeled training data plus the bootstrapped data, to see whether the bootstrapping is helpful at all or not.
Here are the results. This is the accuracy on all the two-choice questions for each topic. I am reporting the results of the baselines on the larger training set, the hand-labeled plus bootstrapped data, because the hand-labeled-only results are just a little worse, so I am only reporting the best results for the baselines. For causal potential I have the results for both the hand-labeled training set, which is small, about one to two hundred stories, and the larger training set of about a thousand, which is the hand-labeled data plus the bootstrapped data. You can see that the causal potential results are significantly stronger than all of the baselines, and also that the results on the topic-specific datasets are significantly stronger than the results on the general domain. Even for causal potential, the accuracy on the general domain is lower, but on the topic-specific data, even with a smaller dataset, you can get about sixty-eight percent accuracy for one topic and about eighty-eight percent accuracy for the other. Also, if you compare the results on the smaller hand-labeled training set to the training set with the bootstrapped data, which is larger, they show that more training data collected by bootstrapping can improve the results, so the bootstrapping was actually effective.
Note that the event SCP and the bigram models that were used in the previous work for generating these event collections did not work very well on our dataset.
The next thing is that we want to compare the content of what we have learned and see whether it actually exists in the previous collections or not. Here I want to show the results of comparing the event pairs that we extracted from the camping trip stories against the Rel-gram tuples. The Rel-grams are not sorted by topic, so to get the ones that are related to camping, we took our top event patterns that were generated in the bootstrapping process and used them to search their interface. For example, 'go camping' is one of the event patterns that we have; we search for it in the interface and get all the pairs where at least one of the events is 'go camping'. Then we apply the filtering and ranking that was used in their paper: they filter based on frequency and rank based on their symmetric conditional probability metric. And then we evaluate the top-ranked pairs on our next evaluation task, which I will describe next.
Here are some examples of the pairs extracted for camping from the Rel-grams. If you look at the second events in these pairs, you see 'go camping' paired with events about organizations, so it seems that this is not about the camping trip, like a fun camping trip; it is mostly about aid groups or refugees.
So we propose a new evaluation method on Mechanical Turk for evaluating the topic-specific contingent event pairs. We evaluate the pairs based on their topic relevance and their contingency relation. We ask the annotators to rate the pairs on a scale of zero to three: zero means the events are not contingent; one, the events are contingent but not relevant to the topic; two, contingent and somewhat relevant to the topic; and three is the strongest, the events are contingent and strongly related to the topic. To make the event representation more readable for the annotators, we map the representation to subject, verb, particle, and direct object order, which is more readable for the users.
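As a tiny sketch, this readability mapping could be as simple as the following (the tuple order matches the event representation described earlier; the function name is just for illustration):

```python
def readable(event):
    """Render a (verb, subject, direct_object, particle) event as
    'subject verb particle direct_object' for the annotators."""
    verb, subj, dobj, prt = event
    return " ".join(part for part in (subj, verb, prt, dobj) if part)

print(readable(("put", "person", "tent", "up")))   # -> "person put up tent"
```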
And this is the result of the Rel-grams evaluation: only seven percent are judged to be contingent and topic-relevant, and we think this is because the camping trip topic actually does not exist in that collection. Overall, only forty-two percent are judged to be contingent.
We evaluate our topic-specific contingent event pairs in the same way, with the same filtering method: selecting the pairs that occur more than five times, filtering by the same event patterns, ranking by the causal potential model, and evaluating the top hundred for each topic. And here is the result: for the camping topic forty-four percent and for the storm topic thirty-three percent are contingent and topic-relevant, and overall about eighty percent of all the pairs that we learned are contingent. The average inter-annotator reliability on these Mechanical Turk tasks was point seven three, which shows substantial agreement.
Finally, I want to show some examples of the event pairs. We showed that the results on the topic-specific data are stronger, and even by looking at the examples you can see that the knowledge we learn is more interesting, like climbing a rock, or a transformer blowing and a tree falling and crushing a LOCATION, whereas the ones from the general-domain dataset are more general, like a person walking down a trail.
In conclusion, we learn a new type of knowledge, contingency knowledge about everyday events, that is not available in the previous work on the newswire genre. We have a data collection step that uses a semi-supervised bootstrapping method to generate topic-specific data, and this is the first work that directly compares the results on topic-specific data versus general-domain stories. We have two evaluation methods, one of them completely new, on Mechanical Turk, and the other one inspired by the COPA task. And I have already talked about the results. Thank you.
I think that's true: if you have a dataset that is specific to the events, it's easier to learn, and the methods will be more effective.
That is definitely an interesting idea. I have tried word2vec models on the corpus, but the results didn't look good; the ones that are considered similar, when I look at them, are actually not similar for our task.
The labeling is only for the stories, not the event types.
Right, so the event patterns are generated automatically by AutoSlog. You just need to find some topics, come up with the topics, like you think, okay, what do people write about on their blogs, for instance a storm, and then you go look at the corpus and try to find a small set of stories that are on that topic.
What I did initially was run topic modeling, but the topics that are generated are not coherent; still, they help you get some idea of what topics exist there, so you know that you can go look for the stories about, say, going on a camping trip. But once you come up with the topics, I think you can expand this, and with more and more rounds of bootstrapping you can collect more data.