The next talk will be presented by Charlotte Roze, entitled "Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification".
So hi everyone, I'm Charlotte Roze, and I'm going to present joint work with Chloé Braud and Philippe Muller.
In this work, we are interested in the following question: which aspects of discourse relations are hard to learn? By "aspects" we mean that the information encoded by discourse relations can be decomposed into a small set of characteristics that we call primitives. In this work, we implement a primitive decomposition of discourse relations and ask whether it can help discourse relation classification.
The global task we are interested in is discourse parsing, which aims at identifying discourse structure. This structure is composed of semantic and pragmatic links between discourse units, and these units can cover text spans of various sizes. The links are called discourse relations, and these relations can be either explicit or implicit.
For example, in (1) we have a Contra-expectation relation (we use the relation labels from the Penn Discourse Treebank, which I am going to present later), and this relation is explicitly marked by the connective "but". In the second example, on the other hand, we have a Reason relation, and here we don't have any connective to mark the relation: it's an implicit relation.
There are several theoretical frameworks that aim at representing discourse structure; among the most well-known we have RST, SDRT and the Penn Discourse Treebank framework. We have corpora annotated following these various frameworks, but there is no consensus on the label set of discourse relations. Each framework has more or less specific relations, encoding relations at different levels of granularity; for instance, the Contrast relation from SDRT corresponds to three relations in RST. So even if the label sets are different, we argue that they include a common range of semantic and pragmatic information, and we wonder whether it is possible to find a way to represent this shared information.
Discourse relation identification is generally seen as a classification task, and it is usually separated into explicit and implicit relation identification. The second task, implicit relation identification, is considered the hardest. In fact, the results remain quite low on this task despite the variety of approaches that have been tried. So we can ask ourselves whether the problem is only about the way we represent the data, or also about the way the task is modeled.
In this work, we want to act on the way we model the task by splitting it into several simpler tasks. The idea is to decompose the problem and to investigate the reasons for the difficulty of discourse relation identification. To obtain several simpler tasks, we decompose the information encoded in the relation labels into values for a small set of characteristics that we call primitives.
To do this, we rely on the Cognitive approach to Coherence Relations (CCR), which provides an inventory of dimensions that we call primitives of relations. This inventory comes with mappings from the relations of the PDTB, RST and SDRT into primitive values. There are core primitives, which are the ones in the original CCR, and additional ones that were introduced to make explicit the specificities of the various frameworks in the mappings. These mappings can be seen as an interface between the existing frameworks. In our work, we provide an operational mapping from annotated relations to sets of primitive values, and we test the approach on the Penn Discourse Treebank, but the goal is to extend the approach to other frameworks later.
So we try to answer the question "which primitives are harder to predict?" by defining a classification task for each primitive. Then we do a reverse mapping from the sets of predicted primitive values to a set of compatible relation labels, and we end up with a relation identification system that we can evaluate.
Here is the Penn Discourse Treebank hierarchy. It has three levels, representing different granularities, so we have more or less specific relations. At the top level, level one, we have relations called classes; then we have types at level two and subtypes at level three. We have end labels, which are the most specific relations, at level three or level two, and intermediate labels, which are underspecified relations: the relations under them are finer.
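To make the structure concrete, here is a minimal sketch of the hierarchy as a nested Python structure; the subset of labels shown is for illustration only, not the full PDTB 2.0 inventory.

```python
# A minimal sketch of the three-level PDTB hierarchy (a small subset of
# the labels, for illustration; see the PDTB 2.0 manual for the full set).
PDTB_HIERARCHY = {
    "Comparison": {                           # level 1: class
        "Concession": {                       # level 2: type
            "Contra-expectation": {},         # level 3: subtype (an end label)
            "Expectation": {},
        },
        "Contrast": {"Juxtaposition": {}, "Opposition": {}},
    },
    "Contingency": {
        "Cause": {"Reason": {}, "Result": {}},
    },
}

def end_labels(tree, path=()):
    """Yield the most specific (leaf) labels with their full path."""
    for label, children in tree.items():
        if children:
            yield from end_labels(children, path + (label,))
        else:
            yield path + (label,)
```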
We take each PDTB relation and map it into a set of primitive values. We have five core primitives, which I am going to illustrate; each has two or three values, plus we added an "n.s." value for "not specified". It is used to treat some cases of ambiguity, when the CCR mapping allowed several possible values for one primitive, or to treat the case of intermediate labels that were absent from the CCR mapping. And we have three additional primitives that are binary: conditional, alternative and specificity.
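As a sketch, the inventory just described can be written down as follows; the value names follow the talk, but the exact spellings are my own.

```python
# Core primitives and their possible values, plus the "n.s." value added
# for unspecified cases; the three additional primitives are binary.
CORE_PRIMITIVES = {
    "basic_operation":     {"causal", "additive", "n.s."},
    "polarity":            {"positive", "negative", "n.s."},
    "source_of_coherence": {"objective", "subjective", "n.s."},
    # "n.a." marks implication order as not applicable (additive relations)
    "implication_order":   {"basic", "non-basic", "n.a.", "n.s."},
    "temporal_order":      {"chronological", "anti-chronological",
                            "synchronous", "n.s."},
}
ADDITIONAL_PRIMITIVES = ("conditional", "alternative", "specificity")
```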
To illustrate the mapping to core primitives, we can look at the example of the Contra-expectation relation. Here, from the content of the first unit, we have an expected implication, which is that it should cost more because it is more expensive to produce, and in the second unit this expectation is denied: in fact, it does not. So here is the mapping of Contra-expectation into primitive values. Because it involves an implication, that is, a causal relation, it is associated with the value causal for the basic operation (otherwise it would be additive). Because it involves a negation, the polarity is set to negative (otherwise it would be positive). And we have the value basic for implication order, since here the premise of the implication is in the first argument; the other values are non-basic and N/A, which is used for additive relations, for which implication order is not applicable.
Another primitive is called source of coherence, which refers to a common distinction in the literature: we have objective relations, which operate at the level of propositional content, and subjective ones, which operate at the epistemic or speech act level. Here we have an example of a subjective relation, Justification: the speaker states in the first unit that someone is lying, and the second unit gives the speaker's grounds for making that claim. So here is the mapping of Justification into primitive values: it is causal, positive and non-basic, and we have the value subjective for source of coherence; it remains not specified for temporal order. Temporal order has three values: chronological, anti-chronological and synchronous.
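Putting the two examples together, the mapping entries would look roughly like this; the values follow the descriptions in the talk, and values not mentioned there are left as "n.s.".

```python
# The two example relations mapped into primitive values (a sketch).
MAPPING = {
    "Comparison.Concession.Contra-expectation": {
        "basic_operation": "causal",       # a denied implication is still causal
        "polarity": "negative",            # the expected implication is negated
        "implication_order": "basic",      # the premise is in the first argument
        "source_of_coherence": "n.s.",     # not mentioned for this relation
        "temporal_order": "n.s.",
    },
    "Contingency.Pragmatic cause.Justification": {
        "basic_operation": "causal",
        "polarity": "positive",
        "implication_order": "non-basic",
        "source_of_coherence": "subjective",  # operates at the speech-act level
        "temporal_order": "n.s.",
    },
}
```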
With respect to the Penn Discourse Treebank hierarchy, these primitives are not all equal in importance. Some of them are able to make distinctions between the top-level classes; it's the case for basic operation and polarity. For instance, basic operation has the value causal for all relations under the Contingency class and additive for relations under the Comparison class. It's the same for polarity: all the Comparison relations are negative for polarity. And we have other primitives that make label distinctions at the lower levels, levels two and three.
We have applied the mapping to each relation in the Penn Discourse Treebank, and here is the distribution of values for each primitive in the corpus. On the left we have the list of primitives, and on the right the list of their values.
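Once the mapping has been applied to every relation token, this distribution is just a count per primitive; a minimal sketch, assuming `annotated` is a list of per-instance dictionaries of primitive values:

```python
from collections import Counter

def value_distribution(annotated, primitive):
    """Count the values taken by one primitive over the mapped corpus."""
    return Counter(instance[primitive] for instance in annotated)

# e.g. value_distribution(annotated, "polarity")
# -> Counter({"positive": ..., "negative": ..., "n.s.": ...})
```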
So, for each primitive, we define a classification task. We have around twenty-eight thousand pairs of arguments in our training set.
We use a quite straightforward architecture for the classification. Each argument of the relation is represented with the InferSent sentence encoder, which is very common for semantic tasks: each argument is mapped into pre-trained word embeddings and then encoded with a biLSTM with max pooling. After that, we combine the two argument representations with concatenation, difference and product, as in the sketch below.
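Here is a minimal PyTorch sketch of that architecture; the dimensions are placeholders rather than the settings from the paper, and in practice the embedding layer would be initialized from pre-trained vectors.

```python
import torch
import torch.nn as nn

class PrimitiveClassifier(nn.Module):
    """InferSent-style encoder (biLSTM + max pooling over time) for each
    argument, then combination by concatenation, difference and element-wise
    product, followed by a linear classifier over primitive values."""

    def __init__(self, vocab_size, emb_dim=300, hidden=512, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(4 * 2 * hidden, n_classes)  # [u; v; u-v; u*v]

    def encode(self, tokens):                 # tokens: (batch, seq_len)
        states, _ = self.lstm(self.emb(tokens))
        return states.max(dim=1).values       # max pooling over time

    def forward(self, arg1, arg2):
        u, v = self.encode(arg1), self.encode(arg2)
        return self.out(torch.cat([u, v, u - v, u * v], dim=-1))
```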
We tested various settings: different layer sizes, an additional layer on top of the argument combination, and different regularization values, and we take the best setting as the best model for each task. As a baseline, we take a majority classifier.
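The majority baseline itself is a one-liner with scikit-learn, for instance:

```python
from sklearn.dummy import DummyClassifier

# Predicts the most frequent training class for every input.
baseline = DummyClassifier(strategy="most_frequent")
# baseline.fit(X_train, y_train); baseline.score(X_test, y_test)
```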
Here are the results, in accuracy and macro F1, with the baseline in blue and the best model in orange, for each core primitive: polarity, basic operation, source of coherence, implication order and temporal order, computed over all argument pairs in the test set.
The proportion of pairs for which all primitives are correctly predicted is not very good, but on average, each individual primitive is correctly predicted for around eighty percent of the pairs.
Let's discuss polarity and basic operation. As we said before, they are the most important primitives with respect to the Penn Discourse Treebank hierarchy, and they have a similar distribution of values, so they are comparable. Basic operation has the lowest improvement over the baseline of all the core primitives, and we correctly identify only seventeen percent of the causal relations. We have better results for polarity: it shows a greater improvement over the baseline, and we correctly label fifty percent of the negative relations.
Source of coherence is the primitive with the greatest improvement over the baseline, but we have to temper this result, because we have less than one percent of subjective relations in our dataset: we nearly only have objective relations and relations with the not-specified value, so it is not very informative. For temporal order, we have a small improvement over the baseline, and this is due to the fact that relations are mainly labeled as not specified.
After that, we wanted to evaluate the performance of our system on predicting discourse relations. So we operate the reverse mapping from the set of values predicted for each primitive to a set of compatible relation labels. We start with the set containing all the possible relations at all levels; then we remove the relations that are incompatible with the primitive values that we predicted. For instance, if the polarity is predicted positive, we remove all relations associated with a negative polarity, and we do the same for each primitive. Then we remove redundant information: if the set contains all the subtypes and their ancestor, or all the types under a class, we only keep the upper-level underspecified relation.
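A sketch of this reverse mapping, assuming a `mapping` from each label to its primitive values (as above) and a `children` table from each intermediate label to the set of its direct sublabels:

```python
def compatible_relations(predicted, mapping, children):
    """Return the set of relation labels compatible with predicted values."""
    labels = set(mapping)
    for primitive, value in predicted.items():
        if value == "n.s.":
            continue  # an unspecified prediction filters nothing out
        labels = {l for l in labels
                  if mapping[l][primitive] in (value, "n.s.")}
    # Remove redundancy: if a label and all of its sublabels survived,
    # keep only the upper-level underspecified label.
    for parent, kids in children.items():
        if parent in labels and kids and kids <= labels:
            labels -= kids
    return labels
```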
For the evaluation, we have a number of issues. We need a measure for hierarchical classification, since there is underspecification in the evaluation: the predicted label can be more or less specific than the gold label from the PDTB. And we need a measure for multi-label classification, since our system can predict more than a single relation. So we use a hierarchical approach: precision and recall on the sets of labels.
For instance, in the example on the left, we have one relation in the gold, and we predict two relations that are finer: we are okay on two labels and we have two elements that are wrong, so our precision is 0.5. Whereas in the example on the right, we have two relations in the gold and we only predict one relation, which is less specific: we are okay on two labels but we missed some of them, so we have a recall of 0.5.
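A minimal sketch of these hierarchical scores, assuming an `ancestors(label)` helper that returns the set of labels above a given label in the hierarchy:

```python
def hierarchical_pr(predicted, gold, ancestors):
    """Set precision/recall after expanding each label with its ancestors."""
    def expand(labels):
        return {a for l in labels for a in ancestors(l) | {l}}
    p, g = expand(predicted), expand(gold)
    overlap = len(p & g)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(g) if g else 0.0
    return precision, recall
```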
So we compare the system with the reverse mapping from predicted primitives into a set of relations against a system doing direct discourse relation classification, with no decomposition into primitives. As measures, we give the accuracy, the hierarchical precision and recall that I just presented, and again the hierarchical scores but computed only on the best match between what we predicted and the PDTB relations.
Here we can see that the system that predicts relations through the reverse mapping from the predicted primitives has lower results on all the measures, except for the max hierarchical precision. By observing the results, we see that we are really missing a lot of Contingency class relations, which is consistent with what we saw on the primitive prediction, because we are missing the value causal in most of the cases for the primitive basic operation. And we wrongly predict Temporal class relations very often; this is due to the fact that this relation is associated with quite underspecified values for the primitives. More generally, we can say that predicting primitives still leaves too much underspecification, which has an impact on the recall, and we predict too many labels, which has an impact on our precision.
To conclude, we can see that one of the most important primitives, basic operation, seems to be the hardest to predict. And we saw that the primitives are obviously not independent from each other, so when we learn them in isolation, we are less accurate than when we learn a fully specified relation. So one of the things we want to do is to test a multi-task learning setting. We also want to extend the approach by applying this decomposition to other discourse frameworks, in order to have cross-framework training and prediction.
Thank you.
Thank you very much. Do we have any questions?
Alright then, let's thank the speaker again.