0:00:17 | so the first presenter is Emanuele |
---|
0:00:19 | please start your presentation |
---|
0:00:22 | good afternoon everyone |
---|
0:00:24 | so my name is Emanuele Bastianelli, I'm a researcher at the Interaction Lab |
---|
0:00:29 | of Heriot-Watt University, and I'm going to present work I have done with |
---|
0:00:34 | Andrea Vanzo and Oliver Lemon |
---|
0:00:36 | about a cross-domain multi-task natural language understanding system for conversational AI that |
---|
0:00:42 | we call HERMIT NLU |
---|
0:00:45 | so natural language understanding is quite a wide concept |
---|
0:00:50 | most of the time, when it is about conversational AI and dialogue systems, it refers |
---|
0:00:54 | to the process of extracting the meaning from natural language and providing it to |
---|
0:00:58 | the dialogue system in a structured way, so that the dialogue system can perform |
---|
0:01:03 | better |
---|
0:01:04 | and we didn't end up |
---|
0:01:07 | studying this problem just for the sake of it; actually |
---|
0:01:10 | we did it in the context of the MuMMER project, which was an |
---|
0:01:14 | EU project that was about |
---|
0:01:16 | the deployment of a robot with |
---|
0:01:18 | multimodal interaction capabilities; it was supposed to be deployed in a shopping mall in Finland |
---|
0:01:23 | and it was supposed to interact with the users, giving them directions, entertaining |
---|
0:01:27 | them with a little bit of chit-chat |
---|
0:01:29 | and I'm going to show a video of it that maybe explains better |
---|
0:01:33 | what the robot was supposed to do |
---|
0:01:35 | hopefully you can hear the audio, although there are also subtitles |
---|
0:01:45 | I don't know about the volume of the recording |
---|
0:02:00 | so the robot gives indications with both gestures and |
---|
0:02:00 | voice |
---|
0:02:01 | in this first phase |
---|
0:02:03 | and with or without the tablet being used, depending on the preference of the user |
---|
0:02:09 | right |
---|
0:02:16 | this one we are not demonstrating here, actually, I mean, the tracking |
---|
0:02:32 | but we'll see some of it next |
---|
0:02:35 | so we saw a lot of generation, but everything started with a request from the |
---|
0:02:38 | user |
---|
0:02:39 | and that's the bit we are focusing on today; it is basically designing an |
---|
0:02:45 | NLU component which is robust enough to work in this very complex multimodal |
---|
0:02:49 | dialogue system |
---|
0:02:52 | again, most often in conversational AI |
---|
0:02:56 | natural language understanding is a synonym of shallow semantic parsing, so this actually |
---|
0:03:00 | connects with the |
---|
0:03:02 | morning keynote; it is the process of extracting some frame-and-argument structure |
---|
0:03:08 | that captures the meaning of a sentence, and it doesn't really matter what we call them |
---|
0:03:12 | whether it is intents and slots |
---|
0:03:13 | and most of the time these types are defined according to |
---|
0:03:17 | the application domain |
---|
0:03:18 | or they have a theory behind them, like frame semantics, with a higher level of |
---|
0:03:22 | abstraction, and this is the one we are using in our context |
---|
0:03:26 | but actually there are some problems, especially in our case, where we wanted to build an interface |
---|
0:03:30 | that was able to work across several different domains, while most of the time |
---|
0:03:35 | in dialogue systems, when you have a natural language understanding component, it always deals with a |
---|
0:03:39 | single domain, or |
---|
0:03:41 | a few known domains at the same time |
---|
0:03:44 | and this is also |
---|
0:03:44 | partly because |
---|
0:03:45 | the resources that are available are always about booking restaurants or booking flights |
---|
0:03:51 | while we wanted our interface to be usable in several different locations: it can |
---|
0:03:55 | be in a domestic environment, or like the shopping mall, or in a scenario, for example |
---|
0:04:00 | where you have to command a robot |
---|
0:04:02 | moving in an unseen house or fetching drinks |
---|
0:04:04 | and so |
---|
0:04:05 | one of the first problems: we wanted the system to be a system that was |
---|
0:04:08 | cross-domain |
---|
0:04:09 | and even if there may be no easy recipe for that, we were trying |
---|
0:04:13 | to address this problem anyway |
---|
0:04:16 | and the big problem is that |
---|
0:04:17 | most of the time the datasets that are designed for dialogue systems |
---|
0:04:22 | only contain a single intent or frame per sentence |
---|
0:04:25 | while in our case there are many sentences, given to the robot |
---|
0:04:29 | which contain two different frames or intents, and it can be very important to |
---|
0:04:35 | detect both of them, because if we ignore the temporal relation between these two |
---|
0:04:41 | different frames, we may fail to, you know, satisfy the user both for the correct |
---|
0:04:46 | command execution and also the needs of the user at the same time |
---|
0:04:50 | so there's another problem: when you rely on these |
---|
0:04:54 | higher-level intent structures |
---|
0:04:57 | most of the time |
---|
0:04:58 | two different kinds of interaction might end up being the exact same intent or frame |
---|
0:05:03 | like in this case, while they actually belong, in the dialogue |
---|
0:05:06 | to two different kinds of interaction; so what we actually wanted to do is not only |
---|
0:05:10 | tagging the frames |
---|
0:05:13 | and the slots |
---|
0:05:14 | but also adding a layer of dialogue acts that will tell the dialogue system |
---|
0:05:18 | the context in which these words have been said; so for example in the first |
---|
0:05:21 | case we are informing the robot where Starbucks is, imagine that we |
---|
0:05:24 | want to teach the robot how the shopping mall is organised, and in the second one |
---|
0:05:28 | there is a customer that is asking for information about the location |
---|
0:05:32 | of Starbucks |
---|
0:05:33 | so, to |
---|
0:05:35 | quickly recap: we wanted to deal with different domains at the same time if |
---|
0:05:39 | possible |
---|
0:05:40 | we wanted to tag more than one single intent and its arguments |
---|
0:05:44 | per sentence, and since we are also tagging the dialogue acts, we have a |
---|
0:05:48 | multi-task architecture |
---|
0:05:49 | we have to deal also with multiple dialogue acts |
---|
0:05:52 | you might argue why |
---|
0:05:54 | it is actually very important to understand both dialogue acts in this case |
---|
0:05:58 | if not, the final intent is only to give information about the location of Starbucks |
---|
0:06:03 | but actually we might want also to understand why |
---|
0:06:06 | the user is asking for Starbucks: if they need a coffee, and maybe there was a |
---|
0:06:09 | coffee shop nearby that is not Starbucks, we could have pointed them somewhere else |
---|
0:06:13 | so all of this stuff is really important |
---|
0:06:16 | and of course |
---|
0:06:17 | we wanted to try to benchmark our NLU system against |
---|
0:06:24 | NLU services and off-the-shelf tools, and this was made possible by the paper that |
---|
0:06:28 | was actually |
---|
0:06:29 | providing us with these utterances and evaluations, as we will see later |
---|
0:06:34 | now, very quickly, I mean it is nothing complicated: we tackled this |
---|
0:06:39 | this problem by |
---|
0:06:40 | addressing the three different tasks |
---|
0:06:42 | at the same time, so the tasks are those of tagging dialogue acts, frames |
---|
0:06:48 | and arguments |
---|
0:06:50 | each task was solved with a sequence labeling approach, in which we were giving |
---|
0:06:55 | a label to each token of the sentence; this is |
---|
0:06:57 | something very common in NLP |
---|
0:07:00 | and each label was actually composed of the class |
---|
0:07:03 | of the structure we were able to tag for a given task |
---|
0:07:08 | enriched with a label that can be B, I or O |
---|
0:07:12 | depending on whether |
---|
0:07:13 | the token was at the beginning of the span of a structure, inside it |
---|
0:07:18 | or outside one of these, and here we have a very easy example |
---|
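the BIO scheme just described can be sketched as follows; the example label (a `Locating` frame over the whole utterance) is purely illustrative, not the actual HERMIT tag set:

```python
def spans_to_bio(tokens, spans):
    """Convert labeled spans into per-token BIO tags.

    tokens: list of words in the utterance.
    spans:  list of (start, end, label), end exclusive.
    """
    tags = ["O"] * len(tokens)          # default: outside any structure
    for start, end, label in spans:
        tags[start] = "B-" + label      # beginning of the span
        for i in range(start + 1, end):
            tags[i] = "I-" + label      # inside the span
    return tags

# "where is starbucks" tagged with a hypothetical Locating frame
tokens = ["where", "is", "starbucks"]
print(spans_to_bio(tokens, [(0, 3, "Locating")]))
# -> ['B-Locating', 'I-Locating', 'I-Locating']
```

the same encoding is applied independently for each of the three tasks, so every token ends up with one BIO tag per semantic layer.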
0:07:21 | now the problem is that |
---|
0:07:23 | this is a linear solution for a problem which is |
---|
0:07:26 | not exactly linear, because language is recursive, and we might end up |
---|
0:07:29 | having some structures which are actually nested inside other structures; this |
---|
0:07:35 | basically never happens for dialogue acts |
---|
0:07:39 | but for frames and arguments this happens quite often, especially in the data |
---|
0:07:44 | we collected |
---|
0:07:45 | so the solution we adopted was to |
---|
0:07:48 | basically collapse |
---|
0:07:49 | the nested structures into a single linear sequence, and try to tag whether one of |
---|
0:07:53 | these structures |
---|
0:07:54 | was actually inside |
---|
0:07:56 | a previously tagged one |
---|
0:07:58 | by using some heuristics on the syntactic relations among the words; for example, if |
---|
0:08:02 | "find" was actually a |
---|
0:08:04 | syntactic child of "to" |
---|
0:08:06 | we could, by using these syntax-based rules, actually say that the Locating |
---|
0:08:11 | frame was actually embedded inside the Requirement argument of the Needing frame |
---|
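the embedding heuristic is only sketched in the talk (the exact rules are in the paper), so this is a guessed, minimal reconstruction; the hand-written dependency-head array stands in for a real parser, and the frame names are the ones mentioned above:

```python
def ancestors(i, heads):
    """All syntactic ancestors of token i; heads[i] is the index of
    token i's dependency head, or -1 for the root."""
    out = []
    while heads[i] != -1:
        i = heads[i]
        out.append(i)
    return out

def is_embedded(inner_head, outer_span, heads):
    """Heuristic: treat the inner structure as embedded in the outer one
    when the inner structure's head token is a syntactic descendant of a
    token inside the outer span (start, end), end exclusive."""
    start, end = outer_span
    return any(start <= a < end for a in ancestors(inner_head, heads))

# toy parse of "i need to find starbucks": "need" (1) is the root,
# "find" (3) depends on "need", "to" (2) and "starbucks" (4) on "find"
heads = [1, -1, 3, 1, 3]
# a Locating frame headed by "find" sits inside the span of the
# Needing frame covering tokens 1..4
print(is_embedded(3, (1, 5), heads))  # -> True
```

this lets the nested annotation be flattened into one linear tag sequence at training time and re-expanded afterwards.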
0:08:18 | now, this has been solved in a multi-task fashion, so we basically created |
---|
0:08:23 | a single network that was dealing with the three tasks at the same |
---|
0:08:26 | time; it is basically a sequence of stacked encoders with CRF taggers, as |
---|
0:08:31 | I'm going to show in the |
---|
0:08:32 | next slide; it is nothing really complicated, but there are two main reasons why |
---|
0:08:37 | we adopted this |
---|
0:08:39 | architecture: first of all, we wanted more or less to replicate |
---|
0:08:42 | a hierarchy of |
---|
0:08:44 | task difficulty, in the sense that we were assuming, actually we were |
---|
0:08:48 | not testing it, that tagging dialogue acts is easier than tagging frames, and |
---|
0:08:52 | tagging frames is easier than tagging arguments |
---|
0:08:56 | and there is also |
---|
0:08:57 | a kind of structural relationship between these three, because many times |
---|
0:09:00 | some frames tend to appear more often in the context of some dialogue acts, and |
---|
0:09:05 | arguments are almost always dependent on frames |
---|
0:09:09 | especially when there is a strong theory behind, like frame semantics |
---|
0:09:12 | and |
---|
0:09:13 | so these are the reasons why the network is designed like this |
---|
0:09:17 | and I'm going to illustrate the network quite quickly, because this is a little bit |
---|
0:09:21 | more |
---|
0:09:22 | technical stuff, so |
---|
0:09:24 | the input of the network was only pre-trained word embeddings, that we |
---|
0:09:27 | were not re-training, and then the first layer was encoding, with a step |
---|
0:09:32 | of encoding with some self-attention that was supposed to capture |
---|
0:09:36 | some relationships that the bidirectional LSTM encoder was not capturing, because, you know, sometimes |
---|
0:09:42 | attention is more able to capture relationships among words which are quite distant in the |
---|
0:09:47 | sentence |
---|
0:09:48 | and then we were feeding a CRF layer |
---|
0:09:51 | that was actually tagging the sequence of, for example, BIO tags for the dialogue acts |
---|
0:09:56 | on top of the BiLSTM-plus-self-attention layer |
---|
0:10:00 | so for the frames it was basically the same thing |
---|
0:10:04 | but we were |
---|
0:10:06 | using shortcut connections, because we wanted to provide the encoder with the fresh information |
---|
0:10:11 | from the first layer, so actually the lexical information, but also |
---|
0:10:16 | with some information that was encoded while |
---|
0:10:18 | being |
---|
0:10:19 | kind of indirectly conditioned on what |
---|
0:10:23 | the dialogue act tagging was producing; so we were putting the information together and serving |
---|
0:10:28 | it to the next layer |
---|
0:10:30 | and then we had a CRF for tagging, as before |
---|
0:10:32 | and finally for the arguments it was again the same thing |
---|
0:10:36 | another step of encoding and a CRF layer with self-attention; and this came out |
---|
0:10:40 | from the experiments we have done, with some ablation studies that are in the paper |
---|
0:10:44 | but we're not going to hear about this here; this is the final network we managed to |
---|
0:10:49 | tune at the very end |
---|
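a minimal sketch of the data flow just described, with stand-in functions in place of the real BiLSTM-plus-self-attention encoders and CRF taggers; the dimensions and label sets are toy values I made up for illustration, not the actual HERMIT hyperparameters:

```python
import random

EMB, HID = 8, 6  # toy dimensions, not the real hyperparameters

def encoder(seq, out_dim):
    """Stand-in for a BiLSTM + self-attention block: a random vector per
    token, here only to show the data flow and the dimensions."""
    return [[random.random() for _ in range(out_dim)] for _ in seq]

def crf_tag(seq, labels):
    """Stand-in for a CRF tagging layer: one label per token."""
    return [random.choice(labels) for _ in seq]

def hermit_forward(embeddings):
    # layer 1: encode, then tag dialogue acts
    h1 = encoder(embeddings, HID)
    da = crf_tag(h1, ["B-Inform", "I-Inform", "O"])
    # shortcut connection: concatenate the fresh lexical embeddings
    # with the dialogue-act-conditioned encoding before re-encoding
    h2 = encoder([e + h for e, h in zip(embeddings, h1)], HID)
    fr = crf_tag(h2, ["B-Locating", "I-Locating", "O"])
    # same pattern once more for the argument layer
    h3 = encoder([e + h for e, h in zip(embeddings, h2)], HID)
    args = crf_tag(h3, ["B-Entity", "I-Entity", "O"])
    return da, fr, args

sent = [[0.0] * EMB for _ in "where is starbucks".split()]
da, fr, args = hermit_forward(sent)
print(len(da), len(fr), len(args))  # one tag per token for each task
# -> 3 3 3
```

the point of the shortcut concatenation is that each deeper tagger sees both the raw lexical information and the representation shaped by the easier task above it.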
0:10:51 | so, as I was saying at the beginning, we wanted to |
---|
0:10:57 | benchmark this |
---|
0:10:59 | this NLU |
---|
0:11:01 | component; now, benchmarking an NLU for dialogue systems is quite a big issue, in |
---|
0:11:05 | the sense that, as I was saying before, most of the datasets |
---|
0:11:10 | are quite |
---|
0:11:12 | single-domain |
---|
0:11:13 | and there is very little material |
---|
0:11:15 | I mean, right about now there are some datasets |
---|
0:11:18 | that started popping up, but at the beginning of this year we were still short |
---|
0:11:22 | on that side |
---|
0:11:24 | but luckily there was this resource, which is called the NLU Benchmark |
---|
0:11:29 | which is basically a cross-domain corpus of human interactions with a house assistant or |
---|
0:11:33 | robot |
---|
0:11:34 | it is mostly IoT-oriented; it is not a collection of dialogues, it is only |
---|
0:11:38 | single-utterance interactions with the system |
---|
0:11:42 | and it covers a lot of domains, as we will see later |
---|
0:11:45 | but it is mostly IoT-oriented; there are some |
---|
0:11:50 | commands that can be used for a robot, but it is mostly, again, IoT |
---|
0:11:53 | oriented |
---|
0:11:53 | while the second resource, which we started collecting a while ago, and it is taking |
---|
0:11:58 | a lot of time |
---|
0:11:59 | is the ROMULUS corpus; it is called like that because it |
---|
0:12:03 | stands for Robotics-Oriented MUltitask Language Understanding corpuS |
---|
0:12:07 | and it is, again, a collection of single interactions with a robot that covers |
---|
0:12:12 | different domains; but more than the domains, it is the kinds of interaction: there is |
---|
0:12:16 | chit-chat, there are |
---|
0:12:17 | state commands for the robot, there is also a lot of information you can |
---|
0:12:21 | give to the robot about the composition of the environment, the names of objects |
---|
0:12:25 | and this kind of stuff |
---|
0:12:26 | there is quite a huge overlap between the two in terms of kinds of interaction |
---|
0:12:30 | but they span |
---|
0:12:32 | different domains |
---|
0:12:33 | so |
---|
0:12:35 | the first corpus, the NLU Benchmark, provides three different semantic layers |
---|
0:12:41 | and they are called scenario, action and entity; I know this sounds completely different |
---|
0:12:44 | from what we said before, but we had to find some mapping with the structures |
---|
0:12:48 | we wanted to tag over the sentences |
---|
0:12:52 | the corpus is quite big: the full set is twenty-five, almost twenty- |
---|
0:12:57 | six thousand sentences |
---|
0:13:00 | and there are eighteen different scenario types, and each scenario is basically a domain |
---|
0:13:05 | and then there are fifty-four different action types and fifty-six different entity types |
---|
0:13:11 | there is also the notion of intent, which is basically the combination of scenario |
---|
0:13:15 | plus action, and this is important for the evaluation, as we will see later |
---|
0:13:20 | as you can see, there is a problem with this dataset: it is |
---|
0:13:24 | true that it is cross-domain |
---|
0:13:26 | it is true that it is multi-task, because we have three different semantic layers |
---|
0:13:29 | but |
---|
0:13:30 | we always have one single scenario and action, so one single intent per sentence |
---|
0:13:35 | so what we could benchmark on this |
---|
0:13:38 | corpus was mostly these two initial |
---|
0:13:42 | these two initial factors |
---|
0:13:45 | we did the evaluation according to the paper that was presenting |
---|
0:13:49 | the benchmark |
---|
0:13:50 | and this was done with a ten-fold cross-validation, with like half of the |
---|
0:13:53 | sentences, around eleven thousand of the sentences; this was to balance |
---|
0:13:56 | the number of classes and entities inside the folds |
---|
0:14:02 | so, as I was saying, we had to do a mapping |
---|
0:14:05 | between |
---|
0:14:06 | their tagging scheme and whatever we wanted to tag, which is a very general approach for |
---|
0:14:11 | extracting the semantics from sentences in the context of a dialogue system |
---|
0:14:16 | but we also saw that |
---|
0:14:18 | the kinds of relationship that were holding between |
---|
0:14:20 | their semantic layers were more or less the same that were holding for |
---|
0:14:24 | our approach |
---|
0:14:26 | and so these are some results |
---|
0:14:28 | these are the results reported in the paper, but they are quite old, in the |
---|
0:14:31 | sense that they are from the beginning of the year; they had been evaluated in two |
---|
0:14:34 | thousand eighteen |
---|
0:14:35 | they had been run on all the open-source |
---|
0:14:39 | or off-the-shelf NLU components for dialogue systems available |
---|
0:14:46 | there is a problem with Watson, because, you know, Watson requires specific training for entities |
---|
0:14:50 | and this was not possible, because there is a constraint on the number |
---|
0:14:56 | of entity types and entity examples you can pass; we did try to |
---|
0:15:00 | talk with the Watson people, but we didn't manage to get the licence, at least |
---|
0:15:03 | to run one training with the full set of entities, so you have |
---|
0:15:08 | to take that into account, unfortunately |
---|
0:15:11 | the intent, as I was saying, is the combination of the scenario |
---|
0:15:14 | and the action |
---|
0:15:16 | and these |
---|
0:15:19 | performances have been |
---|
0:15:21 | obtained with ten-fold cross-validation; I didn't report the standard deviations because |
---|
0:15:25 | they were almost all stable, but if you want to look at them |
---|
0:15:28 | they are in the paper |
---|
0:15:29 | and the other important thing is that we did not take into account whether the span |
---|
0:15:35 | of a tagged structure was matching exactly; actually |
---|
0:15:39 | the authors of the paper were not taking that into account |
---|
0:15:41 | but they counted a true positive whenever there was an overlap |
---|
0:15:45 | an overlap of the spans |
---|
0:15:48 | so these are kind of loose metrics |
---|
0:15:50 | that we are evaluating on |
---|
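the loose, overlap-based criterion can be made concrete with a small sketch; the `place_name` label is just an illustrative entity type, not necessarily one from the benchmark:

```python
def overlap_tp(gold, pred):
    """Count a predicted structure as a true positive if its label matches
    a gold structure of the same label and their token spans overlap at
    all (the 'loose' criterion), rather than matching the span exactly.
    Structures are (start, end, label) tuples, end exclusive."""
    tp = 0
    used = set()  # each gold structure can be matched at most once
    for ps, pe, pl in pred:
        for i, (gs, ge, gl) in enumerate(gold):
            if i not in used and pl == gl and ps < ge and gs < pe:
                tp += 1
                used.add(i)
                break
    return tp

gold = [(0, 3, "place_name")]
print(overlap_tp(gold, [(1, 2, "place_name")]))  # partial overlap -> 1
print(overlap_tp(gold, [(4, 5, "place_name")]))  # no overlap -> 0
```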
0:15:52 | we can see that for the entity and the combined settings |
---|
0:15:57 | our system was performing on average better than the others, while for the intent |
---|
0:16:01 | we were actually not performing as well as Watson, but better than the other |
---|
0:16:06 | two systems |
---|
0:16:07 | the other important bit is that the combined |
---|
0:16:11 | measure is actually the sum of the two confusion matrices of intents and entities |
---|
0:16:15 | so it doesn't |
---|
0:16:16 | actually tell us anything about how the pipeline |
---|
0:16:18 | how the full pipeline is working |
---|
0:16:20 | but that is something that we have done |
---|
0:16:22 | on our corpus, which is much smaller |
---|
0:16:25 | and is not yet available, because we are still gathering data |
---|
0:16:29 | probably at the end of this year we're going to release it |
---|
0:16:32 | I know HRI corpora are very rare at the moment, but for people doing HRI |
---|
0:16:37 | or dialogue in the context of robotics, this can be |
---|
0:16:39 | quite interesting |
---|
0:16:42 | so here we have eleven dialogue act types and fifty-eight frame types |
---|
0:16:46 | which, compared to the number of examples, is quite high |
---|
0:16:49 | and eighty-four frame element types, which are the arguments |
---|
0:16:52 | and as you can see |
---|
0:16:54 | not always, but there are many cases in which we have more than one |
---|
0:16:58 | frame per sentence, and also more than one dialogue act per sentence |
---|
0:17:01 | and obviously the frame elements are quite a lot |
---|
0:17:07 | here we have, like |
---|
0:17:09 | the three semantic layers, but in this table there are formally only two |
---|
0:17:13 | because |
---|
0:17:13 | we have the dialogue acts, exactly like we saw during the rest of |
---|
0:17:16 | the presentation |
---|
0:17:17 | and we also provide the semantics in terms of frame semantics |
---|
0:17:22 | where we have frames and frame elements; these are actually the same |
---|
0:17:25 | semantic layer theoretically, but they are two different layers operationally |
---|
0:17:30 | and as you can see, we have a lot of |
---|
0:17:32 | embedded structures, a frame inside another frame and this kind of stuff |
---|
0:17:36 | this is the mapping we had to do; again |
---|
0:17:39 | with the different semantic layers it is basically the same: dialogue acts to dialogue acts, frames to frames |
---|
0:17:43 | and frame elements to arguments |
---|
0:17:46 | and of course |
---|
0:17:47 | these are the two aspects that we could tackle while using this corpus; so |
---|
0:17:51 | it is not as cross-domain, because it is not a corpus of the same size as the |
---|
0:17:54 | other one |
---|
0:17:54 | it is enough to say that we have |
---|
0:17:56 | different kinds of interaction, and we have also sentences coming |
---|
0:17:59 | from two different scenarios, which can be |
---|
0:18:03 | the house scenario and the shopping mall scenario; there is also chit-chat coming from the interactions |
---|
0:18:09 | with the MuMMER robot |
---|
0:18:12 | but we don't want to say it is completely cross-domain, mostly because the other |
---|
0:18:17 | corpus covers many more domains than this one |
---|
0:18:19 | but it is really multi-task, and there are really multiple dialogue acts and frames in each |
---|
0:18:23 | sentence |
---|
0:18:24 | and the data is such that |
---|
0:18:27 | the results might look quite weird |
---|
0:18:29 | and I'm going to explain why they are like this |
---|
0:18:31 | so the first one I report here is the same exact measure that I was reporting |
---|
0:18:36 | for the NLU Benchmark, so |
---|
0:18:38 | we take into account only whether the spans |
---|
0:18:40 | of two structures overlap, okay |
---|
0:18:43 | and |
---|
0:18:43 | the results are quite high |
---|
0:18:45 | and the main reason is that the corpus has not been delexicalised |
---|
0:18:49 | so there are sentences that are quite similar |
---|
0:18:52 | and then the system performs very well |
---|
0:18:53 | but you don't have to get caught out by that, because |
---|
0:18:56 | if we look at the last one, or the second one, which is basically |
---|
0:18:59 | using the |
---|
0:19:00 | the CoNLL 2000 shared task evaluation, which is a standard, and we report |
---|
0:19:05 | it here for general comparison with other systems |
---|
0:19:07 | but the most important one is the last one, which is the exact |
---|
0:19:11 | match |
---|
0:19:11 | and the idea of the exact match is telling us |
---|
0:19:14 | how well the system, or the pipeline, was working completely; so we were taking into |
---|
0:19:18 | account the exact span |
---|
0:19:21 | of |
---|
0:19:23 | all of the tagged structures |
---|
0:19:24 | and also |
---|
0:19:30 | we were actually |
---|
0:19:31 | checking that, I mean, a frame was counted as correctly tagged only if also the dialogue act |
---|
0:19:36 | was correctly tagged; so it is actually the end-to-end system |
---|
0:19:39 | in a pipeline, and that is |
---|
0:19:40 | the measure we have to chase |
---|
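a small reconstruction of the strict, end-to-end criterion as described here: exact spans, and frames only counting when the dialogue act layer is also correct; the `Inform`/`Locating` labels and the dict layout are illustrative assumptions, not the paper's actual scoring code:

```python
def exact_match_tp(gold, pred):
    """Strict criterion: a structure counts as a true positive only if
    both its label and its exact token span match a gold structure."""
    return len(set(gold) & set(pred))

def pipeline_frames_correct(gold, pred):
    """Sketch of the conditioned, end-to-end scoring: frames in a
    sentence count only when the dialogue act layer for that sentence
    is also exactly correct. gold/pred are dicts with 'da' and 'frames'
    entries, each a list of (start, end, label) tuples."""
    if set(gold["da"]) != set(pred["da"]):
        return 0  # wrong dialogue act: no frame gets credit
    return exact_match_tp(gold["frames"], pred["frames"])

gold   = {"da": [(0, 3, "Inform")],   "frames": [(0, 3, "Locating")]}
good   = {"da": [(0, 3, "Inform")],   "frames": [(0, 3, "Locating")]}
bad_da = {"da": [(0, 3, "Question")], "frames": [(0, 3, "Locating")]}
print(pipeline_frames_correct(gold, good), pipeline_frames_correct(gold, bad_da))
# -> 1 0
```

contrast this with the overlap-based criterion shown earlier: the exact-match number is the one that actually reflects whether the whole pipeline produced a usable analysis.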
0:19:43 | now, to |
---|
0:19:45 | conclude, and some future work: so the system that I presented, which is this |
---|
0:19:49 | cross-domain multi-task |
---|
0:19:52 | NLU system for natural language understanding |
---|
0:19:55 | for conversational AI that we designed, is actually running in the shopping mall |
---|
0:20:01 | in Finland |
---|
0:20:03 | the video I showed you was from the deployment we have done |
---|
0:20:07 | and it is going to be deployed for three months in a row |
---|
0:20:09 | with some pauses during the weekends to do some maintenance and rebooting of the system |
---|
0:20:13 | but we |
---|
0:20:14 | managed to collect a lot of data; maybe we will integrate it in the corpus |
---|
0:20:17 | and release it at the end of this year |
---|
0:20:19 | if we manage to annotate it properly and integrate it; otherwise at the latest at the beginning of |
---|
0:20:23 | next year |
---|
0:20:25 | then we want to deal with the nesting in a different way than this |
---|
0:20:28 | it means not relying on these heuristics on the syntactic structure, but actually |
---|
0:20:33 | directly |
---|
0:20:35 | tagging sequences of embedded sequences, the ones that can be one inside the other |
---|
0:20:38 | if any; and actually we already have this system, we |
---|
0:20:42 | finalized it a few months ago, so we didn't have time to include it |
---|
0:20:45 | here, but it exists, and there is a branch on the repository that I will |
---|
0:20:50 | show you, which is about this new system |
---|
0:20:55 | another part of our work is |
---|
0:20:56 | this one of creating a general framework for frame-like structures, so it doesn't |
---|
0:21:01 | matter the theory or the application that is the reason behind it |
---|
0:21:04 | we are trying to create a network that can deal with all the possible frame- |
---|
0:21:08 | like structure parsing; this is our long-term goal, something very big, but we are |
---|
0:21:13 | actually pushing for that |
---|
0:21:14 | and the last bit is mostly dealing with this special tagging of segmented |
---|
0:21:19 | utterances; we realized that in our corpus there were many |
---|
0:21:23 | small bits of sentences that the user was saying, because they were stopping, you know |
---|
0:21:27 | they were hesitating, so we were missing the first part of the sentence, like "i would |
---|
0:21:30 | like to" |
---|
0:21:31 | and the ASR was actually segmenting this, and it was sending the pieces to the |
---|
0:21:36 | parser, and the parser would work correctly by tagging them, but with some |
---|
0:21:39 | bits missing |
---|
0:21:40 | now, when the user was saying |
---|
0:21:42 | "to find the starbucks", for example, we were receiving this "find the starbucks", and we would contextualize |
---|
0:21:47 | it as a Locating frame |
---|
0:21:50 | but we didn't know it was also a frame element of the previous |
---|
0:21:53 | structure; so we are studying a way to |
---|
0:21:55 | make the system aware of what has been parsed before |
---|
0:21:58 | so that it can actually give more information in the context of the |
---|
0:22:02 | same utterance, even if it is broken; that is the idea |
---|
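one possible way to sketch this idea, purely hypothetical since the talk says it is still being studied, is to prepend the previously received fragment before tagging and then keep only the tags for the new tokens:

```python
def tag_with_context(previous_fragment, fragment, tagger):
    """Hypothetical sketch: let the tagger see the earlier fragment
    ("i would like to") together with the new one ("find the starbucks")
    as a single utterance, then return only the new tokens' tags."""
    ctx = previous_fragment.split()
    toks = fragment.split()
    tags = tagger(ctx + toks)   # tag the joined sequence
    return tags[len(ctx):]      # keep the tags for the new fragment

# toy tagger that marks every token O, just to show the plumbing
print(tag_with_context("i would like to", "find the starbucks",
                       lambda ts: ["O"] * len(ts)))
# -> ['O', 'O', 'O']
```

with a real model in place of the toy tagger, the new fragment could then be recognized both as a Locating frame and as an argument of the earlier, interrupted structure.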
0:22:05 | and |
---|
0:22:06 | this is everything |
---|
0:22:07 | okay thanks very much |
---|
0:22:13 | okay, so it's time for questions |
---|
0:22:30 | hi, and thanks for the great talk, and always good to see Rasa being |
---|
0:22:34 | benchmarked; I'm just curious, did you use just default out-of-the-box parameters |
---|
0:22:38 | or did you do any tuning |
---|
0:22:40 | so we just took the results from the paper of the benchmark, and they |
---|
0:22:45 | were only saying that they did |
---|
0:22:48 | something like a little bit of domain-specific training, I would guess for the entities |
---|
0:22:51 | or something like that |
---|
0:22:54 | and for Rasa they used the version |
---|
0:22:57 | that was using the CRF, and not the neural one, the TensorFlow |
---|
0:23:01 | one; okay, so that's actually like a very basic version, I suppose |
---|
0:23:08 | questions |
---|
0:23:09 | okay |
---|
0:23:12 | so you showed the architecture there, with some intermediate layers; is there |
---|
0:23:18 | also intermediate supervision there |
---|
0:23:21 | sorry? so these labels, the dialogue acts and so on, are they also |
---|
0:23:25 | supervised labels used...? yes, those are all the supervised parts of the |
---|
0:23:29 | multi-task, in the sense that we are solving the three tasks at the same time |
---|
0:23:32 | so you need a |
---|
0:23:34 | slightly more complicated dataset for that, to have all of that supervised |
---|
0:23:38 | well, we have more labels than just intents |
---|
0:23:41 | we need the dialogue acts, in this case, or the scenarios; we need |
---|
0:23:44 | the actions, or the frames, and then the arguments, basically; so that's why |
---|
0:23:49 | the dataset is multi-task, because we have these three layers; okay |
---|
0:23:53 | but for us it was really important to differentiate between actions and dialogue |
---|
0:23:57 | acts, because, as I showed you |
---|
0:23:58 | there were many cases in which it was important for the robot to have a |
---|
0:24:02 | better idea of what was going on in the single sentence; okay |
---|
0:24:06 | okay |
---|
0:24:10 | thanks for the talk; a question: in the last slide you mentioned "frame-like" |
---|
0:24:15 | so what's the difference between frame-like and, for example, FrameNet |
---|
0:24:19 | a frame-like structure, or anything that can be |
---|
0:24:25 | mmm, something that is an abstraction which represents a predication in a sentence and has |
---|
0:24:30 | some arguments |
---|
0:24:32 | this is like the general frame-like idea, you know, like the very |
---|
0:24:35 | broad one |
---|
0:24:35 | is it the same as FrameNet? so the basic idea is the |
---|
0:24:39 | same; the big difference is that FrameNet has a very specific theory behind it |
---|
0:24:43 | and there are some extra features, things like the relationships |
---|
0:24:47 | between frames, and the presence of special frame elements, like the lexical unit itself |
---|
0:24:51 | which makes it easier to look for the frame in the sentence |
---|
0:24:54 | but |
---|
0:24:55 | what we would like to do is: it doesn't matter whether it is FrameNet, or just |
---|
0:24:58 | intents and slots, like from the ATIS corpus or any other corpus |
---|
0:25:02 | we would like to... we are trying to build a shallow semantic parser |
---|
0:25:07 | that can deal with all this stuff at the same time |
---|
0:25:09 | as well as possible; it is a kind of mapping task, where we are trying |
---|
0:25:13 | to incorporate these different |
---|
0:25:14 | aspects of the theories, and we are trying to deal with them |
---|
0:25:17 | more or less in different ways, but without compromising |
---|
0:25:21 | the system's ability to deal with the other kinds of formalisms |
---|
0:25:24 | one other question: what is the tool that you used for data annotation |
---|
0:25:29 | so actually, for our corpus, we had to develop our own interface |
---|
0:25:34 | it is basically a web interface where we have all the tokens of a sentence |
---|
0:25:39 | and we can tag everything on that; and the corpus has been entirely, I |
---|
0:25:45 | mean, it is something we have been collecting over a long stretch, and it takes a long |
---|
0:25:48 | time, because |
---|
0:25:51 | it is a hard task to collect these sentences, and also we had to filter |
---|
0:25:54 | out many of them, because the contexts were very different; sometimes we went |
---|
0:25:59 | with the robot to do this collection, and there was a lot of noise and |
---|
0:26:02 | things we were also evaluating there |
---|
0:26:05 | some of these we then discarded; but in the end we were always employing some |
---|
0:26:09 | people from our lab |
---|
0:26:11 | to annotate them, like two or three of them, and then, you know, doing some inter-annotator |
---|
0:26:14 | agreement on the annotation, trying to check whether they actually understood the task and it was working; it is |
---|
0:26:18 | a very long process; okay; and |
---|
0:26:21 | we were the computational linguists, but that is the point; so |
---|
0:26:24 | it is very hard, but that's |
---|
0:26:29 | that's the situation with the corpus |
---|
0:26:32 | okay, so we have run out of time, so let's thank the speaker again |
---|
0:26:36 | okay |
---|