0:00:17 | the next talk will be presented by Charlotte Roze |
---|
0:00:22 | entitled which aspects of discourse relations are hard to learn primitive decomposition for discourse relation |
---|
0:00:28 | classification |
---|
0:00:31 | so hi everyone, i'm Charlotte Roze and i'm going to present joint work with |
---|
0:00:37 | Chloé Braud and Philippe Muller |
---|
0:00:39 | in this work we are interested in the following question: which aspects |
---|
0:00:43 | of discourse relations are hard to learn |
---|
0:00:45 | by aspects we mean that the information encoded by discourse relations |
---|
0:00:50 | can be decomposed into a small set of characteristics |
---|
0:00:55 | that we call primitives |
---|
0:00:57 | and in this work we implement a primitive decomposition of discourse relations |
---|
0:01:02 | to know whether it will help discourse relation classification |
---|
0:01:06 | so the global task we are interested in is discourse parsing |
---|
0:01:10 | which aims at identifying discourse structure |
---|
0:01:13 | this structure is composed of semantic and pragmatic links between discourse units |
---|
0:01:19 | these units can cover text spans of various sizes |
---|
0:01:23 | the links are called discourse relations and these relations can be either explicit or implicit |
---|
0:01:29 | for example in (1) |
---|
0:01:31 | we have a Contra-expectation relation if we use |
---|
0:01:36 | the relations from the Penn Discourse Treebank that we are going to present later |
---|
0:01:42 | and this relation is explicitly marked by the connective "but" |
---|
0:01:47 | whereas in the second example |
---|
0:01:49 | we have a Reason relation and here we don't have any connective to mark the |
---|
0:01:53 | relation |
---|
0:01:54 | it's an implicit relation |
---|
0:01:58 | so there are several theoretical frameworks that aim at representing discourse structure; among the |
---|
0:02:03 | most well-known we have RST, SDRT and the Penn Discourse Treebank framework |
---|
0:02:10 | so we have corpora annotated following these various frameworks |
---|
0:02:14 | but we have no consensus on the label sets of discourse relations |
---|
0:02:20 | in each framework we have more or less specific relations, encoding relations at |
---|
0:02:25 | different levels of granularity; for instance the contrast relation from |
---|
0:02:32 | SDRT |
---|
0:02:33 | corresponds to three relations in RST |
---|
0:02:38 | so even if the label sets |
---|
0:02:43 | differ, we argue that they include a common range of semantic and pragmatic information |
---|
0:02:48 | and we wonder if it's possible to find a way to represent this common information |
---|
0:02:54 | so discourse relation identification is generally seen as a classification task |
---|
0:03:00 | and it is usually separated between explicit and implicit relation identification |
---|
0:03:07 | the second task, implicit relation identification, is considered the hardest |
---|
0:03:13 | in fact the results remain quite low on this task, as attested by the variety |
---|
0:03:19 | of approaches that have been tried |
---|
0:03:22 | so we can ask ourselves if the problem is only about the way |
---|
0:03:26 | we represent the data |
---|
0:03:28 | or about the way the task is modeled |
---|
0:03:31 | so in this work we want to act on the way we model the task by |
---|
0:03:34 | splitting it |
---|
0:03:36 | into several simpler tasks |
---|
0:03:38 | the idea is to decompose the problem and to investigate the reasons for the difficulty |
---|
0:03:42 | of discourse relation identification |
---|
0:03:45 | so to have several simpler tasks we decompose the information encoded by the |
---|
0:03:52 | relation labels into values for a small set of characteristics that we call primitives |
---|
0:03:58 | to do this we rely on the cognitive approach to coherence relations (CCR) |
---|
0:04:03 | which provides an inventory of |
---|
0:04:07 | dimensions that we call primitives of relations |
---|
0:04:11 | this inventory is provided with mappings from the relations of the PDTB, RST and |
---|
0:04:17 | SDRT into primitive values |
---|
0:04:21 | there are core primitives, which are the ones in the original |
---|
0:04:26 | CCR |
---|
0:04:29 | and additional ones that were introduced to make explicit the specificities of the various frameworks in |
---|
0:04:35 | the mappings |
---|
0:04:37 | so these mappings can be seen as an interface between the existing frameworks |
---|
0:04:44 | so in our work we provide an operational mapping from annotated relations to sets of |
---|
0:04:51 | primitive values, and we test the approach on the Penn Discourse Treebank, but the |
---|
0:04:56 | goal is to extend the approach to other frameworks later |
---|
0:05:01 | so we try to answer the question "which primitives are harder to predict" by |
---|
0:05:05 | defining a classification task for each primitive |
---|
0:05:10 | then we do a reverse mapping from the sets of |
---|
0:05:14 | predicted primitive values to a set of compatible relation labels |
---|
0:05:20 | and we end up with a relation identification system that we want to evaluate |
---|
0:05:26 | so here is the Penn Discourse Treebank hierarchy |
---|
0:05:29 | it has three levels representing different granularities, so we have more or less specific relations |
---|
0:05:36 | at the top level, level one, we have relations called classes |
---|
0:05:41 | and then we have types at level two and subtypes at level three |
---|
0:05:45 | so we have end labels, which are the most specific relations, at level three or |
---|
0:05:49 | two |
---|
0:05:50 | and intermediate ones, which are underspecified relations: they have |
---|
0:05:56 | relations under them that are |
---|
0:05:58 | finer |
---|
0:06:00 | so we take each PDTB relation and map it into a set of primitive |
---|
0:06:04 | values |
---|
0:06:06 | we have five core primitives that we're going to illustrate; they each have two or |
---|
0:06:11 | three values |
---|
0:06:12 | plus we added an n.s. value for non-specified |
---|
0:06:18 | it is used to treat some cases of ambiguity, when in the CCR mapping |
---|
0:06:22 | there were several possible values for one primitive |
---|
0:06:26 | or to treat the case of intermediate labels that were absent from the CCR |
---|
0:06:30 | mapping |
---|
0:06:32 | and we have three additional primitives that are binary: conditional, alternative and specificity |
---|
0:06:39 | so to illustrate the mapping to the core primitives, we can consider the example of the |
---|
0:06:44 | Contra-expectation relation |
---|
0:06:48 | here, |
---|
0:06:50 | from the content of the first unit |
---|
0:06:53 | we have an expected implication, which is that the biofuel costs more |
---|
0:06:57 | because it's more expensive to produce |
---|
0:07:00 | and in the second unit this expectation is denied |
---|
0:07:05 | in fact the biofuel doesn't cost more |
---|
0:07:09 | so here is the mapping of Contra-expectation into primitive values |
---|
0:07:14 | so because it involves an implication, this relation is associated with a |
---|
0:07:19 | basic operation that is causal |
---|
0:07:21 | otherwise it would be additive |
---|
0:07:24 | because it involves a negation, the polarity is set to negative; otherwise it would |
---|
0:07:30 | be positive |
---|
0:07:32 | and we have the value basic for implication order, as we have here an implication |
---|
0:07:38 | and the implication order refers to the order of |
---|
0:07:43 | the arguments, that is, in which argument the premise of the implication occurs |
---|
0:07:49 | the other values are non-basic, and n.a. |
---|
0:07:52 | which is not applicable, for additive relations |
---|
0:07:57 | another primitive is called source of coherence, which refers to a common |
---|
0:08:04 | distinction in the literature |
---|
0:08:07 | we have objective relations, which operate at the level of propositional content, and |
---|
0:08:13 | subjective ones, which operate at the epistemic or |
---|
0:08:16 | speech act level |
---|
0:08:18 | sorry |
---|
0:08:20 | here we have an example of a subjective relation, which is Justification |
---|
0:08:25 | here i state that Mrs. Yeargin is lying because they found students who said |
---|
0:08:30 | she gave them |
---|
0:08:31 | similar help |
---|
0:08:33 | so we have the mapping of Justification into primitive values: it's causal, positive |
---|
0:08:38 | and non-basic, and we have the value subjective |
---|
0:08:42 | and it remains non-specified for temporal order |
---|
0:08:46 | and temporal order has three values: chronological, anti-chronological and synchronous |
---|
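the decomposition illustrated by the two examples above can be pictured as a simple lookup table; this is only a schematic fragment following the values stated in the talk, not the full mapping used in the work, and the key names are ours:

```python
# Schematic decomposition of two PDTB relations into primitive values,
# following the examples discussed in the talk. Keys absent for a relation
# (e.g. source of coherence for Contra-expectation) are simply omitted here.
PRIMITIVE_MAP = {
    "Contra-expectation": {
        "basic_operation": "causal",      # involves an implication
        "polarity": "negative",           # involves a negation
        "implication_order": "basic",     # premise in the first argument
    },
    "Justification": {
        "basic_operation": "causal",
        "polarity": "positive",
        "implication_order": "non-basic",  # premise in the second argument
        "source_of_coherence": "subjective",
        "temporal_order": "n.s.",          # non-specified
    },
}
```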
0:08:54 | so with respect to the Penn Discourse Treebank hierarchy, all these primitives are not |
---|
0:08:59 | equal in importance |
---|
0:09:01 | some of them are able to make distinctions between the top-level classes: it's the |
---|
0:09:06 | case for basic operation and polarity; for instance basic operation has the value causal |
---|
0:09:11 | for all relations |
---|
0:09:13 | under the contingency class |
---|
0:09:15 | and additive for relations under the other top-level classes |
---|
0:09:18 | it's the same for polarity: all the comparison relations are |
---|
0:09:23 | negative for polarity |
---|
0:09:27 | and we have other primitives that make label distinctions at lower levels, |
---|
0:09:32 | levels two or three |
---|
0:09:36 | so we have applied the mapping to each relation in the penn |
---|
0:09:40 | discourse treebank |
---|
0:09:42 | and here's the distribution of values for each primitive in the corpus |
---|
0:09:48 | on the left we have the list of primitives and on the right the |
---|
0:09:52 | list of values, all mixed together |
---|
0:09:59 | so |
---|
0:09:59 | for each primitive we define a classification task |
---|
0:10:03 | we have twenty-eight thousand pairs of arguments for our training set |
---|
0:10:09 | we use a quite straightforward architecture for the classification |
---|
0:10:15 | each argument of the relation is represented with the InferSent sentence encoder, which |
---|
0:10:22 | is very common for semantic tasks: each argument is mapped into pre-trained word |
---|
0:10:27 | embeddings and then encoded with a BiLSTM with max pooling |
---|
0:10:31 | and after that we combine the two argument representations with concatenation, difference and product |
---|
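the combination step just described can be sketched in a few lines; this is a minimal pure-Python illustration (the real system operates on BiLSTM-encoded vectors, and the function name is ours):

```python
def combine_arguments(u, v):
    # Combine two encoded argument vectors of equal length into one feature
    # vector via concatenation, element-wise absolute difference and
    # element-wise product, the combination scheme described in the talk.
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + diff + prod

features = combine_arguments([1.0, 2.0], [3.0, 4.0])
# yields 4 * len(u) features: the two vectors, their |difference|, their product
```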
0:10:39 | we tested various settings: |
---|
0:10:42 | various layer sizes, an additional layer on top |
---|
0:10:49 | of the argument combination, and different regularization values |
---|
0:10:56 | and we take the best setting as the best model |
---|
0:10:59 | for each task |
---|
0:11:02 | and as a baseline we take a majority classifier |
---|
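a majority classifier like the baseline just mentioned always predicts the most frequent training label; a minimal sketch (the helper name and toy labels are ours):

```python
from collections import Counter

def majority_baseline(train_labels):
    # Build a classifier that ignores its input and always predicts the
    # most frequent label seen in training, as done for each primitive task.
    majority, _count = Counter(train_labels).most_common(1)[0]
    return lambda _argument_pair: majority

predict = majority_baseline(["positive", "positive", "positive", "negative"])
predict(("arg1 text", "arg2 text"))  # -> "positive"
```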
0:11:07 | so here are the results in accuracy and macro F1 |
---|
0:11:11 | for the baseline in blue and the best model in orange |
---|
0:11:16 | for each core primitive: polarity, basic operation, source of coherence, implication order and temporal order |
---|
0:11:25 | only for a fraction of all argument pairs in the corpus |
---|
0:11:31 | in the test set |
---|
0:11:34 | are all primitives correctly predicted, which is not very good, but on |
---|
0:11:39 | average we have eighty percent of primitives that are correctly predicted |
---|
0:11:44 | we're going to discuss polarity and basic operation; we said before that |
---|
0:11:49 | they are the most important primitives with respect to the Penn Discourse Treebank hierarchy and |
---|
0:11:54 | they have a similar distribution of values, so they are comparable |
---|
0:11:59 | basic operation has the lowest improvement with respect to the baseline over all the |
---|
0:12:04 | core primitives |
---|
0:12:06 | and we identify correctly only seventeen percent of causal relations |
---|
0:12:14 | and we have better results for polarity |
---|
0:12:17 | it has a greater improvement with respect to the baseline and we have fifty percent of |
---|
0:12:22 | the negative relations that are correctly |
---|
0:12:25 | labeled |
---|
0:12:27 | source of coherence is the primitive that has the greatest improvement with respect to the |
---|
0:12:31 | baseline, but we have to temper this result because we have less than one percent |
---|
0:12:36 | of subjective relations in our dataset, so we mainly have objective relations |
---|
0:12:42 | and ones for which we have the non-specified value, so it's not very |
---|
0:12:46 | informative |
---|
0:12:48 | for temporal order we have a little improvement with respect to the |
---|
0:12:52 | baseline, and this is due to the fact that relations are |
---|
0:12:57 | mainly labeled as non-specified |
---|
0:13:02 | after that we wanted to evaluate the performance of our system on predicting discourse relations |
---|
0:13:08 | so we operate the reverse mapping from the set of predicted values for |
---|
0:13:13 | each primitive to a set of compatible relation labels |
---|
0:13:18 | so we start with a set containing all the possible relations at all levels |
---|
0:13:24 | then we remove the relations that are incompatible with the primitive values that we predicted |
---|
0:13:29 | for instance if the polarity is predicted positive we remove all relations associated with a |
---|
0:13:35 | negative polarity and we do the same for each primitive |
---|
0:13:40 | and then we remove redundant information: if the set contains all |
---|
0:13:45 | the subtypes under a type, or all the types under a class, we only keep the |
---|
0:13:50 | upper-level underspecified relation |
---|
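the first step of this reverse mapping, filtering the candidate labels against the predicted primitive values, can be sketched as follows; the two-relation inventory below is illustrative (values follow the talk's examples), not the full PDTB mapping, and the pruning of redundant underspecified labels would be a second pass:

```python
def compatible_relations(predicted, inventory):
    # Keep every relation whose primitive values agree with all predicted
    # values; a missing or "n.s." value in the mapping matches anything.
    keep = []
    for relation, values in inventory.items():
        if all(values.get(p, "n.s.") in (v, "n.s.")
               for p, v in predicted.items()):
            keep.append(relation)
    return keep

# Illustrative inventory; "Conjunction" values are our own assumption.
inventory = {
    "Contra-expectation": {"basic_operation": "causal", "polarity": "negative"},
    "Conjunction": {"basic_operation": "additive", "polarity": "positive"},
}
compatible_relations({"polarity": "negative"}, inventory)
# -> ["Contra-expectation"]
```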
0:13:53 | so to evaluate, we need a measure for |
---|
0:13:59 | hierarchical classification, since we have underspecification |
---|
0:14:03 | in the evaluation |
---|
0:14:06 | the |
---|
0:14:06 | predicted label can be more or less specific than the gold label from the |
---|
0:14:11 | PDTB |
---|
0:14:12 | and we need a measure for multi-label classification |
---|
0:14:16 | in fact our system can predict more than a single relation |
---|
0:14:21 | so we use hierarchical precision and recall on the sets of |
---|
0:14:27 | labels |
---|
0:14:30 | so for instance |
---|
0:14:33 | on the left, if we have in the gold one relation |
---|
0:14:37 | which is Expansion.Restatement |
---|
0:14:40 | and we predict two relations that are finer |
---|
0:14:44 | we are okay on two labels and we have two elements that are wrong, so |
---|
0:14:50 | our precision is 0.5 |
---|
0:14:53 | whereas in the example on the right, if we have two relations in the gold |
---|
0:15:00 | and we only predict one relation, which is less specific |
---|
0:15:04 | we only have good labels but we missed some of them, so we |
---|
0:15:09 | have a recall of 0.5 |
---|
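the two toy examples can be reproduced with a small score over label sets, where each gold or predicted relation is first expanded into the labels on its path in the hierarchy; a sketch under that assumption (the function name and the particular Restatement subtypes are ours):

```python
def hierarchical_scores(predicted, gold):
    # Hierarchical precision and recall over label sets: each relation is
    # represented by itself plus all its ancestors in the hierarchy.
    correct = len(predicted & gold)
    return correct / len(predicted), correct / len(gold)

# Left example: gold is Expansion.Restatement, we predict two finer subtypes.
gold = {"Expansion", "Restatement"}
pred = {"Expansion", "Restatement", "Specification", "Equivalence"}
hierarchical_scores(pred, gold)  # -> precision 0.5, recall 1.0

# Right example: swapping the roles, one coarser prediction gives full
# precision but recall 0.5, since the finer gold labels are missed.
hierarchical_scores(gold, pred)  # -> precision 1.0, recall 0.5
```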
0:15:11 | so we compare the system |
---|
0:15:16 | with the reverse mapping from predicted primitives into a set of relations to a system with |
---|
0:15:24 | direct discourse relation classification, with no decomposition |
---|
0:15:29 | into primitives |
---|
0:15:30 | and as measures we give the accuracy, the hierarchical precision and |
---|
0:15:34 | recall that we just presented, and |
---|
0:15:36 | again the hierarchical scores but only on the best match between what we predicted and |
---|
0:15:43 | the pdtb relations |
---|
0:15:46 | so here we can see that |
---|
0:15:49 | the system |
---|
0:15:52 | that predicts relations with the reverse mapping from the predicted |
---|
0:15:59 | primitives |
---|
0:16:00 | has lower results on all the measures, except for |
---|
0:16:04 | the max hierarchical precision |
---|
0:16:09 | and by observing the results we see that we are really missing a lot of |
---|
0:16:13 | contingency class relations, which is consistent with what we saw on |
---|
0:16:19 | the primitive prediction, because we are missing the value causal in most of the |
---|
0:16:24 | cases for the primitive basic operation |
---|
0:16:29 | and we wrongly predict the temporal class relations very often |
---|
0:16:36 | this is due to the fact that this relation is associated with quite |
---|
0:16:41 | underspecified values for the primitives |
---|
0:16:45 | and generally we can say that predicting primitives still leaves too much underspecification, and |
---|
0:16:51 | it has an impact on the recall |
---|
0:16:54 | and we predict too many labels, so it has an impact on our precision |
---|
0:17:00 | so to conclude |
---|
0:17:03 | we can see that one of the most important primitives, that is basic operation, seems |
---|
0:17:08 | to be the hardest to predict |
---|
0:17:11 | and we saw that the primitives obviously are not independent from each other |
---|
0:17:14 | so when we learn them in isolation we are less accurate than when we learn |
---|
0:17:20 | a fully specified relation |
---|
0:17:23 | so one of the things that we want to do is to use them in a |
---|
0:17:28 | multi-task learning setting |
---|
0:17:31 | and we also want to extend the approach by applying this decomposition to other |
---|
0:17:37 | discourse frameworks |
---|
0:17:39 | in order to have cross-corpora training and prediction |
---|
0:17:43 | thank you |
---|
0:17:52 | thank you very much, are there any questions |
---|
0:18:05 | alright, let's thank the speaker again |
---|