The next talk will be presented by Charlie Rose,

entitled "Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification".

So hi everyone, I'm going to present a joint work with my coauthors.

In this work we are interested in the following question: which aspects of discourse relations are hard to learn?

By "aspects" we mean the pieces of information encoded by discourse relations, which can be decomposed into a small set of characteristics that we call primitives.

In this work we implement a primitive decomposition of discourse relations, hoping that it will help discourse relation classification.

The global task we are interested in is discourse parsing, which aims at identifying discourse structure.

This structure is composed of semantic and pragmatic links between discourse units; these units can cover text spans of various sizes.

The links are called discourse relations, and these relations can be either explicit or implicit.

For example, in (1) we have a Contra-expectation relation; here we use the relations from the Penn Discourse Treebank, which we are going to present later, and this relation is explicitly marked by the connective "but".

Whereas in the second example we have a Reason relation, and here we don't have any connective to mark the relation: it's an implicit relation.

There are several theoretical frameworks that aim at representing discourse structure; among the most well-known we have RST, SDRT, and the Penn Discourse Treebank framework.

We have corpora annotated following these various frameworks, but there is no consensus on the label set of discourse relations.

In each framework we have more or less specific relations, encoding relations at different levels of granularity; for instance, the Contrast relation from SDRT corresponds to three relations in RST.

So even if the different label sets differ, we argue that they include a common range of semantic and pragmatic information, and we wonder if it's possible to find a way to represent this shared information.

Discourse relation identification is generally seen as a classification task, and it is usually separated between explicit and implicit relation identification.

The second task, implicit relation identification, is considered the hardest: in fact, the results remain quite low on this task, despite the variety of approaches that have been tried.

So we can ask ourselves whether the problem is only about the way we represent the data, or also about the way the task is modeled.

In this work we want to act on the way we model the task, by splitting it into several simpler tasks.

The idea is to decompose the problem, and to investigate the reasons for the difficulty of discourse relation identification.

To obtain several simpler tasks, we decompose the information encoded by the relation labels into values for a small set of characteristics that we call primitives.

To do this, we rely on the Cognitive approach to Coherence Relations (CCR), which provides an inventory of dimensions that we call primitives of relations.

Along with this inventory, we have mappings from the relations of the PDTB, RST and SDRT into primitive values.

There are core primitives, which are the ones in the original CCR, and additional ones that were introduced to make explicit the specificities of the various frameworks in the mappings.

So these mappings can be seen as an interface between the existing frameworks.

In our work we provide an operational mapping from annotated relations to sets of primitive values, and we test the approach on the Penn Discourse Treebank, but the goal is to extend the approach to other frameworks later.

We try to answer the question "which primitives are harder to predict?" by defining a separate classification task for each primitive.

Then we do a reverse mapping from the sets of predicted primitive values to a set of compatible relation labels, and we end up with a relation identification system that we want to evaluate.

So here is the Penn Discourse Treebank hierarchy.

It has three levels, representing different granularities, so we have more or less specific relations.

At the top level, level one, we have relations called classes; then we have types at level two, and subtypes at level three.

So we have end labels, which are the most specific relations, at level three or two, and intermediate ones, which are underspecified relations: they can have finer relations under them.
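As a rough sketch, this three-level hierarchy can be represented as nested dictionaries; the branches below use real PDTB 2.0 sense labels, but they are only a small subset of the full hierarchy:

```python
# A fragment of the three-level PDTB 2.0 sense hierarchy:
# level 1 = classes, level 2 = types, level 3 = subtypes.
# Only a few branches are shown; the full hierarchy has four classes.
PDTB_HIERARCHY = {
    "Comparison": {
        "Concession": ["Contra-expectation", "Expectation"],
        "Contrast": ["Juxtaposition", "Opposition"],
    },
    "Contingency": {
        "Cause": ["Reason", "Result"],
    },
}

def end_labels(hierarchy):
    """The most specific relations: here, the subtypes at level three."""
    return [sub for types in hierarchy.values()
                for subs in types.values()
                for sub in subs]
```

An intermediate label such as "Comparison.Concession" is underspecified with respect to its two subtypes.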

We take each PDTB relation and map it into a set of primitive values.

We have five core primitives, which we are going to illustrate; each has two or three values, plus we added an N.S. value for "not specified".

This value was used to treat some cases of ambiguity, when in the CCR mapping there were several possible values for one primitive, or to treat the case of intermediate labels that were absent from the CCR mapping.

And we have three additional primitives that are binary: conditional, alternative and specificity.

To illustrate the mapping to core primitives, we can consider the example of the Contra-expectation relation.

Here, from the content of the first unit, we have an expected implication, which is that the biofuel should cost more, because it's more expensive to produce; and in the second unit this expectation is denied.

So here is the mapping of Contra-expectation into primitive values.

Because it involves an implication, this relation is associated with the basic operation value causal; otherwise it would be additive.

Because it involves a negation, the polarity is set to negative; otherwise it would be positive.

And we have the value basic for implication order, which refers to the order of the arguments: basic means that the argument containing the premise of the implication comes first.

The other values are non-basic, and N.A., which means not applicable, for additive relations.

Another primitive is called source of coherence, which refers to a common distinction in the literature: we have objective relations, which operate at the level of propositional content, and subjective ones, which operate at the epistemic or speech act level.

Sorry.

Here we have an example of a subjective relation, which is Justification: here I state that Mrs. Yeargin is lying, because they found students who said she gave them similar help.

So we have the mapping of Justification into primitive values: it's causal, positive and non-basic, and we have the value subjective.

And it remains not specified for temporal order; temporal order has three values: chronological, anti-chronological and synchronous.
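As a sketch, the two mappings just illustrated can be written as a small lookup table. The values shown are the ones mentioned in the talk; any value not discussed is marked "ns" (not specified) here purely for illustration:

```python
# Core-primitive values for two PDTB relations, following the talk's examples.
# "ns" = not specified (used here also for values the talk did not discuss).
PRIMITIVE_MAP = {
    "Contra-expectation": {
        "basic_operation": "causal",         # involves an implication
        "polarity": "negative",              # an expectation is denied
        "implication_order": "basic",        # the premise comes first
        "source_of_coherence": "ns",         # not discussed in the talk
        "temporal_order": "ns",
    },
    "Justification": {
        "basic_operation": "causal",
        "polarity": "positive",
        "implication_order": "non-basic",
        "source_of_coherence": "subjective",  # operates at the speech-act level
        "temporal_order": "ns",               # remains not specified
    },
}
```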

With respect to the Penn Discourse Treebank hierarchy, all these primitives are not equal in importance.

Some of them are able to make distinctions between the top-level classes; it's the case for basic operation and polarity, for instance.

Basic operation has the value causal for all relations under the Contingency class, and additive for relations under the Comparison class.

It's the same for polarity: all the Comparison relations are negative for polarity.

And we have other primitives that make label distinctions at lower levels, levels two and three.

So we have applied the mapping to each relation in the Penn Discourse Treebank, and here is the distribution of values for each primitive in the corpus.

On the left we have the list of primitives, and on the right the list of their values, all mixed together.

So for each primitive we define a classification task.

We have around 28,000 pairs of arguments in our training set.

We use a quite straightforward architecture for the classification: each argument of the relation is represented with the InferSent sentence encoder, which is very common for semantic tasks.

So each argument is mapped into pre-trained word embeddings, and then encoded with a biLSTM with max pooling.

After that, we combine the two argument representations with concatenation, difference and product.
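As a minimal sketch of this pair-combination step (with placeholder vectors standing in for the actual biLSTM-with-max-pooling outputs), the InferSent-style feature vector concatenates the two encodings, their absolute difference, and their element-wise product:

```python
import numpy as np

def combine_pair(u, v):
    """InferSent-style combination of two encoded arguments:
    [u; v; |u - v|; u * v]."""
    return np.concatenate([u, v, np.abs(u - v), u * v])

# Placeholder encodings standing in for biLSTM + max-pooling outputs.
rng = np.random.default_rng(0)
arg1 = rng.normal(size=300)
arg2 = rng.normal(size=300)

features = combine_pair(arg1, arg2)  # one such vector feeds each classifier
print(features.shape)  # (1200,)
```

The same feature vector can then be fed to one classifier per primitive.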

We tested various settings: various layer sizes, an additional layer on top of the argument combination, and different regularization values.

We take the best setting as the best model for each task.

And as a baseline, we take a majority classifier.

So here are the results, in accuracy and macro F1, for the baseline in blue and the best model in orange, for each core primitive: polarity, basic operation, source of coherence, implication order and temporal order.

Only for a small fraction of the argument pairs in the test set are all primitives correctly predicted, which is not very good; but on average we have eighty percent of primitives that are correctly predicted.

We're going to discuss polarity and basic operation: we said before that they are the most important primitives with respect to the Penn Discourse Treebank hierarchy, and they have a similar distribution of values, so they are comparable.

Basic operation has the lowest improvement with respect to the baseline over all the core primitives: we identify correctly only seventeen percent of the causal relations.

And we have better results for polarity: it has a greater improvement with respect to the baseline, and fifty percent of the negative relations are correctly labeled.

Source of coherence is the primitive that has the greatest improvement with respect to the baseline, but we have to temper this result, because we have less than one percent of subjective relations in our dataset: so we mainly have objective relations, and relations for which we have the not-specified value, so that's not very informative.

For temporal order, we have a little improvement with respect to the baseline, and this is due to the fact that relations are mainly labeled as unspecified.

After that, we wanted to evaluate the performance of our system on predicting discourse relations.

So we operate the reverse mapping, from the set of values predicted for each primitive to a set of compatible relation labels.

We start with a set containing all the possible relations at all levels; then we remove the relations that are incompatible with the primitive values that we predicted.

For instance, if the polarity is predicted positive, we remove all relations associated with a negative polarity, and we do the same for each primitive.

Then we remove redundant information: if the set contains all the subtypes under a type, or all the types under a class, we only keep the upper-level underspecified relation.
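These pruning steps can be sketched as follows; the relation inventory and primitive table below are hypothetical miniatures, not the real PDTB mapping:

```python
# Reverse-mapping sketch: from predicted primitive values to compatible labels.
# Each relation lists its value per primitive; "ns" is compatible with anything.
RELATIONS = {
    "Contingency":       {"polarity": "positive", "basic_operation": "causal"},
    "Contingency.Cause": {"polarity": "positive", "basic_operation": "causal"},
    "Comparison":        {"polarity": "negative", "basic_operation": "additive"},
}
PARENT = {"Contingency.Cause": "Contingency"}

def compatible(predicted):
    """Keep relations whose primitive values match every predicted value."""
    keep = {rel for rel, vals in RELATIONS.items()
            if all(vals.get(p, "ns") in (v, "ns") for p, v in predicted.items())}
    # If all children of a relation survived, keep only the underspecified parent.
    children = {}
    for rel, par in PARENT.items():
        children.setdefault(par, []).append(rel)
    for par, kids in children.items():
        if par in keep and all(k in keep for k in kids):
            keep -= set(kids)
    return keep

print(compatible({"polarity": "positive", "basic_operation": "causal"}))
# -> {'Contingency'}: Comparison is ruled out by its polarity, and Cause
#    collapses into its underspecified parent.
```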

To evaluate, we have a number of issues: we need a measure for hierarchical classification, because we have underspecification in the evaluation; the predicted label can be more or less specific than the gold label from the PDTB.

And we need a measure for multi-label classification: in fact, our system can predict more than a single relation.

So we use hierarchical precision and recall on the set of all labels.

So for instance, on the left, if we have one relation in the gold and we predict two relations that are finer, we are okay on two labels and we have two elements that are wrong, so our precision is 0.5.

Whereas in the example on the right, if we have two relations in the gold and we only predict one relation, which is less specific, we have only two good labels and we missed some of them, so we have a recall of 0.5.
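The two worked examples can be reproduced with a small sketch of hierarchical precision and recall: each label set is extended with all its ancestors, then standard set precision and recall are computed (the label names below are hypothetical stand-ins):

```python
# Hierarchical precision/recall: close gold and predicted label sets under
# the ancestor relation, then compute set precision and recall.
PARENT = {
    "Class.Type": "Class",
    "Class.Type.Sub1": "Class.Type",
    "Class.Type.Sub2": "Class.Type",
}

def with_ancestors(labels):
    closed = set()
    for label in labels:
        while label is not None:
            closed.add(label)
            label = PARENT.get(label)
    return closed

def hier_prf(gold, pred):
    g, p = with_ancestors(gold), with_ancestors(pred)
    correct = len(g & p)
    return correct / len(p), correct / len(g)  # (precision, recall)

# Left example: one gold relation, two finer predictions.
print(hier_prf({"Class.Type"}, {"Class.Type.Sub1", "Class.Type.Sub2"}))
# -> (0.5, 1.0)

# Right example: two gold relations, one less specific prediction.
print(hier_prf({"Class.Type.Sub1", "Class.Type.Sub2"}, {"Class.Type"}))
# -> (1.0, 0.5)
```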

So we compare the system with the reverse mapping from predicted primitives into a set of relations, against a system doing direct discourse relation classification, with no decomposition into primitives.

As measures, we give the accuracy, the hierarchical precision and recall that we just presented, and again the hierarchical scores but computed only on the best match between what we predicted and the PDTB relations.

Here we can see that the system that predicts relations with the reverse mapping from the predicted primitives has lower results on all the measures, except for the max hierarchical precision.

By observing the results, we see that we are missing a lot of Contingency-class relations, which is consistent with what we saw on the primitive prediction, because in most of the cases we are missing the value causal for the primitive basic operation.

And we wrongly predict Temporal-class relations very often; this is due to the fact that this relation is associated with quite underspecified values for the primitives.

Generally, we can say that predicting primitives still leaves too much underspecification, and this has an impact on the recall; and we predict too many labels, so this has an impact on our precision.

So to conclude, we can see that one of the most important primitives, that is basic operation, seems to be the hardest to predict.

And we saw that the primitives obviously are not independent from each other: when we learn them in isolation, we are less accurate than when we learn a fully specified relation.

So one of the things that we want to do is to test a multi-task learning setting.

And we also want to extend the approach by applying this decomposition to the other discourse frameworks, in order to have cross-framework training and prediction.

Thank you.

Thank you very much. Are there any questions?

Alright then, let's thank the speaker again.