0:00:15 | Thank you. |
---|
0:00:17 | So I will present some recent work |
---|
0:00:23 | carried out in collaboration with a company |
---|
0:00:31 | that is interested in e-commerce, and the |
---|
0:00:39 | scenario we have been working on: entity recognition for |
---|
0:00:44 | conversational agents in this scenario. |
---|
0:00:50 | So the general idea is that the long-term goal of this |
---|
0:00:56 | work is a kind of conversational agent, a kind of shop assistant, that |
---|
0:01:02 | assists users in buying products in an e-commerce |
---|
0:01:09 | setting. |
---|
0:01:10 | So, for instance, if the user says something like "Can I find a certain kind |
---|
0:01:16 | of product?", the expected behavior of |
---|
0:01:23 | the shop assistant is to present the user with the relevant products. |
---|
0:01:30 | This is a kind of task-oriented scenario, and basically it can be |
---|
0:01:36 | approached with the traditional slot-filling approach. |
---|
0:01:41 | So we have several intents that the system is supposed to recognize, and then |
---|
0:01:47 | slots, |
---|
0:01:48 | and then there is a classifier |
---|
0:01:52 | that classifies the user input into categories such as the brand, the colour, |
---|
0:01:57 | and other properties. |
---|
0:02:00 | So the approach should work on several domains, on many domains, for |
---|
0:02:07 | instance cameras, furniture, or groceries, and that is |
---|
0:02:14 | where a relevant problem of this scenario arises: basically, there |
---|
0:02:21 | are no annotated utterances, that is, no sentences |
---|
0:02:26 | or |
---|
0:02:27 | requests from the user which are annotated with |
---|
0:02:33 | the |
---|
0:02:34 | intents and the properties |
---|
0:02:39 | of the specific domain. |
---|
0:02:41 | And another relevant factor is that |
---|
0:02:46 | it might be easier |
---|
0:02:48 | to find gazetteers where |
---|
0:02:52 | information about products is present. |
---|
0:02:57 | So, given this long-term scenario, we focus on specific issues in |
---|
0:03:03 | this work. |
---|
0:03:05 | We focus on entity recognition, |
---|
0:03:08 | so for instance the capacity to recognize the kind of product that |
---|
0:03:13 | is mentioned within |
---|
0:03:16 | a user utterance. |
---|
0:03:19 | We based our work on gazetteers, |
---|
0:03:24 | which in this |
---|
0:03:25 | scenario are catalogues, basically comparable to open datasets |
---|
0:03:30 | that we can get from vendors on the web. |
---|
0:03:35 | The main research question for us is how far we can go without any |
---|
0:03:40 | annotated data, and this is why we call this a zero-shot learning |
---|
0:03:46 | approach. |
---|
0:03:48 | So, a few words about the specific issues of entity names, product names, |
---|
0:03:55 | in this e-commerce setting. |
---|
0:03:59 | So basically these are different from traditional named entities: we have what in |
---|
0:04:07 | the tradition of information extraction has been called nominal entities. |
---|
0:04:13 | So for instance an entity may contain connectives, like "black and white t-shirt"; |
---|
0:04:22 | if we have "a black t-shirt", |
---|
0:04:25 | then "black" is a property of the entity. |
---|
0:04:30 | Entity names may contain adjectives, like "white denim t-shirt", |
---|
0:04:36 | or even proper names, brands; if you think about |
---|
0:04:39 | it, |
---|
0:04:41 | you know how many |
---|
0:04:43 | brands are used to name products. |
---|
0:04:50 | Another very important property for our approach is compositionality. |
---|
0:04:56 | Being nominal entities, we may assume that they respect some compositionality |
---|
0:05:03 | principle of the language. |
---|
0:05:05 | So if we have, for instance in the food domain, "pasta with broccoli", |
---|
0:05:12 | which is a noun plus a prepositional modifier, |
---|
0:05:18 | we can add an adjectival modifier, like "fresh pasta with broccoli". |
---|
0:05:24 | But then, knowing the names we have, we may still |
---|
0:05:28 | infer that, |
---|
0:05:32 | having both "pasta with broccoli" and, say, "spaghetti with basil", |
---|
0:05:38 | maybe "spaghetti with broccoli" is a good name too, even if it |
---|
0:05:44 | has never been |
---|
0:05:46 | seen before. |
---|
0:05:47 | So this means that |
---|
0:05:49 | our approach should be able to take advantage of compositionality. |
---|
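To make the compositional intuition concrete, here is a minimal sketch, with toy names standing in for the real gazetteer, of how heads and modifiers observed in different names can be recombined into plausible unseen names. The "head with modifier" pattern is an assumption for illustration, not the paper's actual model:

```python
from itertools import product

# Toy gazetteer; the real one would be scraped from vendor catalogues.
seen = ["pasta with broccoli", "spaghetti with basil"]

heads, modifiers = set(), set()
for name in seen:
    head, _, modifier = name.partition(" with ")
    heads.add(head)
    modifiers.add(modifier)

# Recombine every head with every modifier; novel combinations such as
# "spaghetti with broccoli" emerge even though they were never observed.
candidates = {f"{h} with {m}" for h, m in product(heads, modifiers)}
print(sorted(candidates - set(seen)))
```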
0:05:57 | Then there might be the case of having multiple entities |
---|
0:06:03 | of the same semantic category in the same utterance, |
---|
0:06:06 | which is not the common case |
---|
0:06:10 | in task-oriented scenarios: in booking flights, usually one |
---|
0:06:16 | destination is |
---|
0:06:17 | mentioned, one value per slot, but here maybe |
---|
0:06:21 | I would like to order "a salami pizza and a mozzarella pizza", |
---|
0:06:25 | so |
---|
0:06:26 | two entities of the same semantic category in the same |
---|
0:06:31 | utterance. |
---|
0:06:34 | And then there is a strong need for multilinguality, of course, since |
---|
0:06:38 | vendors |
---|
0:06:40 | want to expand, so |
---|
0:06:42 | they need to translate catalogues into multiple languages. |
---|
0:06:49 | Okay, so what are our working hypotheses? We would like to train |
---|
0:06:54 | a model for entity recognition based only on a catalogue of |
---|
0:07:00 | products, that is, a gazetteer, |
---|
0:07:02 | and then we would like to apply this model |
---|
0:07:05 | in order to label |
---|
0:07:08 | unseen entities in a user utterance. |
---|
0:07:11 | So the only data that we have is the gazetteer, nothing else, and we |
---|
0:07:15 | want to understand |
---|
0:07:17 | how far we can go: |
---|
0:07:19 | that is the working hypothesis. |
---|
0:07:23 | So the main ideas of our approach are the following: |
---|
0:07:27 | take advantage of the compositional nature of product names, so we want to extract as |
---|
0:07:33 | much knowledge as possible from the gazetteer; |
---|
0:07:37 | use as much as possible synthetically generated data, |
---|
0:07:40 | since, having no real data coming from users, |
---|
0:07:44 | we need to work on synthetically generated data; |
---|
0:07:49 | and then we would like to be as much as possible language independent. |
---|
0:07:56 | So this is the approach, basically four steps. |
---|
0:08:00 | At the beginning we collect a gazetteer |
---|
0:08:03 | for a certain domain. |
---|
0:08:06 | Then, starting from this gazetteer, we generate both positive and negative examples |
---|
0:08:13 | of all the entity names contained in the gazetteer. |
---|
0:08:18 | Then, on the basis of positive and negative examples, |
---|
0:08:23 | we build a classifier, in our case a neural classifier, |
---|
0:08:28 | able to recognize the entities of that specific domain. |
---|
0:08:34 | Having this classifier, this model, |
---|
0:08:38 | which is able to discriminate whether |
---|
0:08:41 | a given sequence of tokens is an entity name of a |
---|
0:08:46 | certain domain or not, we want to apply this model |
---|
0:08:50 | to recognize |
---|
0:08:53 | entity names in utterances. |
---|
0:08:56 | So we apply the classifier to all the sub-sequences |
---|
0:09:00 | of a user utterance, |
---|
0:09:02 | in order to select the best sequences, those which are |
---|
0:09:07 | non-overlapping. |
---|
0:09:09 | I will now go through the four steps with |
---|
0:09:12 | some examples. |
---|
0:09:14 | So, collecting a gazetteer for a certain domain: |
---|
0:09:18 | we did this just by scraping the websites of vendors for a number of |
---|
0:09:27 | domains, like furniture and food. |
---|
0:09:34 | The underlying assumption here is that scraping a website collects |
---|
0:09:42 | a set of reliable entity names, that is, |
---|
0:09:47 | that vendors write good data, |
---|
0:09:50 | particularly because we don't have |
---|
0:09:52 | any way to verify it ourselves. |
---|
0:09:55 | So this is the first step: |
---|
0:09:56 | just collecting. |
---|
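As a rough illustration of this collection step, here is a minimal scraping sketch. The URL and the CSS selector are hypothetical placeholders, not the vendors actually used; every real catalogue page needs its own selector and its own crawling etiquette:

```python
import requests
from bs4 import BeautifulSoup

CATALOG_URL = "https://vendor.example.com/products?page=1"  # hypothetical

def scrape_product_names(url: str) -> list[str]:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Assume each product name sits in an element like <h2 class="product-name">.
    return [el.get_text(strip=True) for el in soup.select("h2.product-name")]

# gazetteer = scrape_product_names(CATALOG_URL)
```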
0:10:00 | The second step is to generate the |
---|
0:10:03 | positive and negative examples. For the positive examples, |
---|
0:10:07 | at least in our initial approach, which is quite simple, |
---|
0:10:12 | we take all the entity names in the gazetteer: |
---|
0:10:16 | we downloaded them from a website, so we trust the website. |
---|
0:10:22 | As for the negative data, |
---|
0:10:24 | for each |
---|
0:10:26 | positive example we generate a negative example, or a number |
---|
0:10:30 | of negative examples, |
---|
0:10:32 | following simple rules. |
---|
0:10:34 | Okay, |
---|
0:10:35 | so for instance each sub-sequence of a positive example is a negative one. |
---|
0:10:41 | Okay, |
---|
0:10:42 | that is simple. |
---|
0:10:45 | With the second rule, |
---|
0:10:49 | instead, |
---|
0:10:50 | we take a positive example plus one token randomly selected from the gazetteer, placed in |
---|
0:10:58 | the first position or the last. |
---|
0:11:02 | Okay, so this is how we |
---|
0:11:04 | compose the negative examples. |
---|
0:11:08 | For instance, if we start with "black and white t-shirt", |
---|
0:11:12 | this is the positive, and negatives are all the sub-sequences: "black", "white", |
---|
0:11:18 | "black and white", and so on; but also "black |
---|
0:11:24 | and white t-shirt" preceded by a token randomly selected from the |
---|
0:11:30 | data. |
---|
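A minimal sketch of this example-generation step, under the two rules just described: every proper sub-sequence of a positive name becomes a negative, and so does the name with a random gazetteer token prepended or appended. The two-name `gazetteer` is a toy stand-in for the scraped catalogue:

```python
import random

gazetteer = ["black and white t-shirt", "golden yellow shorts"]
vocabulary = [tok for name in gazetteer for tok in name.split()]

def negatives_for(name: str, rng: random.Random) -> set[str]:
    tokens = name.split()
    negs = set()
    # Rule 1: all contiguous sub-sequences shorter than the full name.
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            if j - i < len(tokens):
                negs.add(" ".join(tokens[i:j]))
    # Rule 2: the full name with a random token in first or last position.
    extra = rng.choice(vocabulary)
    negs.add(f"{extra} {name}")
    negs.add(f"{name} {extra}")
    return negs

rng = random.Random(0)
examples = [(name, 1) for name in gazetteer]
for name in gazetteer:
    examples += [(neg, 0) for neg in negatives_for(name, rng)]
```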
0:11:31 | Note that in these gazetteers, which are downloaded from the web, there |
---|
0:11:35 | is a lot of noise. |
---|
0:11:37 | We don't have any control over how the vendors write the |
---|
0:11:42 | names of products; |
---|
0:11:43 | well, |
---|
0:11:44 | some names might be completely noisy. |
---|
0:11:49 | So in the second step we generate positives and negatives; now, on the basis of positives |
---|
0:11:54 | and negatives, we build a model, |
---|
0:11:56 | a neural model: |
---|
0:11:58 | a classifier which is able to say, given a sequence of tokens, |
---|
0:12:03 | "yes, this is |
---|
0:12:04 | a food name" or "no, this is not food"; |
---|
0:12:07 | "this is |
---|
0:12:10 | furniture", "no, this is not furniture". |
---|
0:12:13 | So we built a neural |
---|
0:12:19 | classifier; this is based on a neural model proposed by |
---|
0:12:25 | Lample and others a couple of years ago, |
---|
0:12:29 | and it uses a kind of classical LSTM architecture, you |
---|
0:12:36 | know, one that uses both word embeddings and character embeddings. |
---|
0:12:43 | We added a few handcrafted features |
---|
0:12:48 | which are available to this classifier, features |
---|
0:12:55 | relative to a certain token: |
---|
0:12:57 | the position of the token, the frequency, the length of the token, |
---|
0:13:02 | the unigram probability of the token, and also, and this is the only |
---|
0:13:06 | linguistic information that we use, the part-of-speech of the token, |
---|
0:13:13 | without any disambiguation. |
---|
0:13:18 | So at the end this classifier says "yes, this |
---|
0:13:23 | sequence of tokens |
---|
0:13:24 | is an entity name", |
---|
0:13:27 | for instance a furniture name, |
---|
0:13:31 | and it gives a confidence score. That is the third step; quite simple. |
---|
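A minimal sketch of the handcrafted features mentioned here: position, frequency, length, and unigram probability of a token, plus its part-of-speech. In the real model these would be concatenated to the word and character embeddings of the Lample-style network; the POS tags are an assumed input from any off-the-shelf tagger, and the coarse tag inventory is an illustrative choice:

```python
import math
from collections import Counter

POS_INDEX = {"NOUN": 0, "ADJ": 1, "ADP": 2, "CONJ": 3, "OTHER": 4}

unigram_counts: Counter = Counter()  # to be filled from the gazetteer corpus

def token_features(tokens, i, pos_tags):
    tok = tokens[i]
    total = sum(unigram_counts.values()) or 1
    return [
        i / len(tokens),                              # relative position
        unigram_counts[tok],                          # corpus frequency
        len(tok),                                     # length in characters
        math.log((unigram_counts[tok] + 1) / total),  # smoothed unigram log-prob
        POS_INDEX.get(pos_tags[i], POS_INDEX["OTHER"]),  # coarse POS id
    ]
```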
0:13:37 | So now we have this classifier, |
---|
0:13:40 | this model; okay, but our goal is to |
---|
0:13:44 | recognize entities in sentences, in user requests. |
---|
0:13:54 | So, |
---|
0:13:56 | think about this example; this is a possible request from a |
---|
0:14:02 | user: "I'm looking for golden yellow shorts and a |
---|
0:14:07 | dark blue t-shirt". |
---|
0:14:10 | We run the classifier on all the sub-sequences of this request, |
---|
0:14:19 | so we ask the classifier to say whether each sub-sequence is positive or |
---|
0:14:24 | negative. |
---|
0:14:25 | So in this case |
---|
0:14:26 | positives would be "shorts", "yellow shorts", "golden yellow shorts", |
---|
0:14:33 | and negatives would be "I'm looking for a golden", "shorts and dark", |
---|
0:14:38 | and so on. |
---|
0:14:40 | Then we rank |
---|
0:14:43 | all the positively |
---|
0:14:48 | classified |
---|
0:14:50 | sub-sequences on the basis of the confidence score of the neural model, |
---|
0:14:56 | okay, and we select the ones which have no overlap among them. |
---|
0:15:01 | So in this example we might rank "golden yellow shorts", then "yellow shorts", then "shorts"; |
---|
0:15:09 | the second one is discarded because of the overlap with the first |
---|
0:15:14 | one, and so on. So from "I'm looking for ..." we extract |
---|
0:15:21 | "golden yellow shorts" and "dark blue t-shirt". |
---|
0:15:24 | Okay. |
---|
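A minimal sketch of this tagging step: enumerate every contiguous sub-sequence of the utterance, score each with the entity classifier, and greedily keep the highest-confidence spans that do not overlap. `score(span)` stands in for the neural classifier's confidence; here it is a hypothetical callable returning a probability of being an entity name:

```python
def extract_entities(tokens, score, threshold=0.5):
    # All spans as (confidence, start, end) with end-exclusive slices.
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            conf = score(tokens[i:j])
            if conf >= threshold:
                spans.append((conf, i, j))
    # Greedy selection: best-scoring spans first, skipping any that overlap
    # a span that was already chosen.
    chosen = []
    for conf, i, j in sorted(spans, reverse=True):
        if all(j <= s or i >= e for _, s, e in chosen):
            chosen.append((conf, i, j))
    return [" ".join(tokens[i:j]) for _, i, j in sorted(chosen, key=lambda c: c[1])]
```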
0:15:25 | So this is the methodology we want to apply; |
---|
0:15:29 | we would like to know how well we can do with this simple approach. |
---|
0:15:35 | So we did some experiments. First of all, we collected the gazetteer |
---|
0:15:41 | datasets, as I mentioned. |
---|
0:15:42 | Now we have gazetteers for three domains (food, furniture, and clothing) across |
---|
0:15:48 | two languages, English and Italian, with different characteristics. |
---|
0:15:52 | So for each gazetteer we report the number of entities, the number |
---|
0:15:56 | of tokens, |
---|
0:15:58 | the average length and the standard deviation of the length. The standard deviation is |
---|
0:16:05 | a |
---|
0:16:06 | kind of index that captures how complex the names are: |
---|
0:16:12 | the higher this deviation, the higher, likely, the complexity of |
---|
0:16:16 | the names in the gazetteer. |
---|
0:16:19 | We have the type/token ratio: a high ratio indicates |
---|
0:16:24 | high lexical variability, and so more complexity again. |
---|
0:16:29 | We also added the proportion of times that the head token |
---|
0:16:35 | appears in the first position of the name, |
---|
0:16:38 | and this metric gives some indication |
---|
0:16:43 | about |
---|
0:16:46 | how stable the |
---|
0:16:49 | semantic |
---|
0:16:50 | head of the names is. |
---|
0:16:54 | And this differs |
---|
0:16:58 | across languages: you see that, for instance, for |
---|
0:17:05 | Italian |
---|
0:17:07 | this has a different value than for English; |
---|
0:17:11 | it means that in Italian the first token of an entity name is usually the head, |
---|
0:17:16 | while this is not the case for English. |
---|
0:17:21 | And the last feature that we want to point to is |
---|
0:17:26 | nesting: |
---|
0:17:29 | the proportion of entities that are nested within other entities gives |
---|
0:17:35 | some idea of the |
---|
0:17:37 | compositionality of a certain gazetteer. |
---|
0:17:42 | Okay, so the higher it is, the more we can find an entity name inside |
---|
0:17:47 | another entity name, and the more the dataset is |
---|
0:17:52 | compositional. |
---|
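A minimal sketch of these gazetteer statistics: token counts, name-length mean and standard deviation, type/token ratio, and the proportion of names nested inside other names as a rough compositionality signal. The head-position metric is omitted here since it needs linguistic analysis; the substring check is a crude approximation of token-level nesting:

```python
from statistics import mean, stdev

def gazetteer_stats(names: list[str]) -> dict:
    token_lists = [n.split() for n in names]
    all_tokens = [t for toks in token_lists for t in toks]
    lengths = [len(toks) for toks in token_lists]
    name_set = set(names)
    # A name counts as nested if it occurs inside some other, longer name.
    nested = sum(
        any(n != other and n in other for other in name_set) for n in name_set
    )
    return {
        "entities": len(names),
        "tokens": len(all_tokens),
        "mean_len": mean(lengths),
        "stdev_len": stdev(lengths) if len(lengths) > 1 else 0.0,
        "type_token_ratio": len(set(all_tokens)) / len(all_tokens),
        "nested_ratio": nested / len(name_set),
    }
```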
0:17:54 | This is the experimental setup. |
---|
0:17:58 | We have six datasets: the three domains and the two languages I |
---|
0:18:03 | just mentioned. |
---|
0:18:05 | We split each dataset into training and test; |
---|
0:18:10 | one important point is that no |
---|
0:18:14 | entity present in the training is present in the test. |
---|
0:18:20 | For training we also generate negative entity names, as illustrated before: |
---|
0:18:27 | for every positive we generate a fixed number of negatives. |
---|
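A minimal sketch of an entity-disjoint split: entities are partitioned before anything else, so no name seen in training can appear in the test set. The 80/20 ratio is an illustrative choice, not necessarily the paper's setting, and the gazetteer is assumed to be deduplicated:

```python
import random

def disjoint_split(names: list[str], seed: int = 0, train_frac: float = 0.8):
    rng = random.Random(seed)
    shuffled = names[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    assert not set(train) & set(test)  # the disjointness guarantee
    return train, test
```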
0:18:36 | Then the test set. This is important: we don't have real test data, |
---|
0:18:41 | so the test data is synthetically generated. |
---|
0:18:45 | Okay, we start from a number of templates, a little bit more than two |
---|
0:18:50 | hundred templates, both for English and Italian. |
---|
0:18:56 | Typical templates correspond to intents of the e-commerce domain: templates for selecting a |
---|
0:19:05 | product, templates for asking for a description, templates for adding a product to the |
---|
0:19:11 | cart. |
---|
0:19:12 | Finally, each template is filled with the name of an entity |
---|
0:19:17 | randomly drawn from |
---|
0:19:20 | the test part of the data. |
---|
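A minimal sketch of this synthetic test-set generation: templates with an entity slot are filled with names drawn from the test split. The three template strings are invented stand-ins for the two-hundred-odd real ones:

```python
import random

TEMPLATES = [
    "i am looking for a {entity}",       # selecting a product
    "can you describe the {entity} ?",   # asking for a description
    "add a {entity} to my cart",         # adding a product to the cart
]

def make_test_utterances(test_entities, n, seed=0):
    rng = random.Random(seed)
    utterances = []
    for _ in range(n):
        entity = rng.choice(test_entities)
        template = rng.choice(TEMPLATES)
        # Keep the gold span so tagging accuracy can be scored later.
        utterances.append((template.format(entity=entity), entity))
    return utterances
```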
0:19:27 | We have two baselines. The first is a simple rule-based baseline |
---|
0:19:32 | which uses string match, |
---|
0:19:35 | where |
---|
0:19:38 | a token in a certain utterance is recognized as belonging to the |
---|
0:19:42 | gazetteer if |
---|
0:19:44 | the chunk it appears in exactly matches an entry of the gazetteer, as such |
---|
0:19:48 | baselines typically do. |
---|
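A minimal sketch of this string-match baseline: a span of the utterance is tagged as an entity only if it literally matches a gazetteer entry. Since the split keeps test entities out of the gazetteer seen at training time, full test names will rarely match, which is what makes this a floor to beat:

```python
def string_match_baseline(tokens, gazetteer: set[str]):
    hits = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            span = " ".join(tokens[i:j])
            if span in gazetteer:
                hits.append((i, j, span))
    return hits
```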
0:19:49 | And then we wanted to test also |
---|
0:19:52 | a neural model trained on |
---|
0:19:56 | synthetically generated training data: |
---|
0:19:59 | okay, so we apply the same methodology used for generating the test data also for |
---|
0:20:05 | generating the synthetic training data. |
---|
0:20:13 | These are the results of our experiments: the two baselines plus |
---|
0:20:18 | our system in |
---|
0:20:21 | the last row. So we see that for all our datasets |
---|
0:20:27 | the system based on the gazetteer significantly outperforms the two |
---|
0:20:33 | baselines, |
---|
0:20:34 | which is already, I think, an interesting result. |
---|
0:20:39 | Something more about the domains. Food is the most complex one: |
---|
0:20:44 | as you can imagine, and as |
---|
0:20:49 | it emerges from the gazetteers, food has high lexical |
---|
0:20:54 | variability and high compositionality both in Italian and in English, so the results |
---|
0:21:00 | are the lowest among the three domains. |
---|
0:21:05 | Furniture is the least compositional, so basically it is easier: |
---|
0:21:10 | it is the one closest to a more traditional named entity, from a certain |
---|
0:21:16 | point of view; but actually this is the smallest dataset that we have, |
---|
0:21:20 | just a few hundred entities, compared with |
---|
0:21:25 | much larger gazetteers for the other domains. |
---|
0:21:31 | And clothing is very regular and has high compositionality, |
---|
0:21:37 | so here we have good results. |
---|
0:21:42 | Okay, so just to conclude: |
---|
0:21:46 | I have reported some experiments on a zero-shot approach |
---|
0:21:54 | for entity recognition, in which we consider the gazetteer as the |
---|
0:21:59 | only source of information. |
---|
0:22:03 | So it does not assume any annotated sentences for training, and also for testing |
---|
0:22:10 | we have generated the data synthetically. |
---|
0:22:15 | We focus on nominal entities, because these are the kind of entities used in this sector |
---|
0:22:21 | for naming products. |
---|
0:22:24 | And |
---|
0:22:27 | the approach tries to take advantage, as much as possible, of, and to extract |
---|
0:22:32 | knowledge from, the gazetteer, in particular due to the compositionality of the names of products. |
---|
0:22:40 | In many respects this is a very initial work, and we |
---|
0:22:46 | see quite a lot of room for improvement. |
---|
0:22:50 | Three activities are ongoing for us. |
---|
0:22:56 | The first one is just considering the fact that the state of the art in |
---|
0:23:00 | sequence labeling is improving actually almost daily: we have new approaches, new |
---|
0:23:07 | models; for instance, we lately tried a |
---|
0:23:13 | more recent model, and it is maybe better |
---|
0:23:19 | than the previous one; there is a lot of room for experimenting and improving. Even the methodologies |
---|
0:23:27 | for generating synthetic data: |
---|
0:23:30 | we have experimented with some parameters, say one positive to n negatives, but it |
---|
0:23:37 | may well turn out that there are better, maybe optimal, settings for these parameters. |
---|
0:23:45 | And then of course it might be very interesting to integrate additional data sources, |
---|
0:23:51 | in case we have some data, maybe a little annotated data, a few |
---|
0:23:55 | sentences, and integrate them; |
---|
0:24:00 | and also to integrate the gazetteer-based model with what we call an NER |
---|
0:24:04 | model, |
---|
0:24:06 | a model trained on synthetically annotated data. |
---|
0:24:14 | So there is, I think, a lot of work ahead, |
---|
0:24:18 | where the focus is to make these approaches as much as possible domain independent, to |
---|
0:24:26 | be able to move from one domain to another with the same technology, and also |
---|
0:24:30 | language independent. |
---|
0:24:32 | Thank you. |
---|
0:24:54 | Yes, sure. |
---|
0:24:56 | So the templates are disjoint: |
---|
0:24:58 | both entities |
---|
0:25:00 | and templates are disjoint, so we try to separate as much as possible |
---|
0:25:06 | training from test. |
---|
0:25:32 | [inaudible question] |
---|
0:25:36 | Well, |
---|
0:25:38 | maybe it's a good question, but I don't think I have an answer for |
---|
0:25:42 | the moment. So the focus was |
---|
0:25:47 | on doing recognition in basically isolated sentences, right, so I don't have |
---|
0:25:53 | any answer; |
---|
0:25:55 | these aspects would probably be considered when integrating this into a |
---|
0:26:02 | broader model of a dialogue system. |
---|
0:26:06 | Actually, this work is closer to traditional information extraction than to |
---|
0:26:12 | dialogue, so we have nothing there yet. |
---|
0:26:21 | Sorry, could you repeat the question? |
---|
0:26:27 | No, even the word embeddings, the word vectors, are generated |
---|
0:26:34 | from the gazetteer. |
---|
0:26:35 | That is a good point: we don't take vectors from, for example, |
---|
0:26:41 | Wikipedia or other sources; everything is generated from the gazetteer. |
---|
0:26:48 | So this is the only source of information. |
---|