0:00:15 | that's right tree full column and weakness migrated ones are introduced |
---|
0:00:22 | we use word from a distance from time spectrum modeling one recognition |
---|
0:00:31 | and she's also of interest can |
---|
0:00:33 | that's what they should trust |
---|
0:00:36 | a huge |
---|
0:00:40 | you see over you know trying to you is rover bachelor's and master's |
---|
0:00:47 | operations research and industrial engineering |
---|
0:00:51 | no you can do not one which passes spoken by what sort of quite a |
---|
0:00:58 | long time and the |
---|
0:01:00 | i'm happy to be able to introduce are also your colleague of solution is to |
---|
0:01:07 | open laboratories with really and to mention risk |
---|
0:01:11 | so much closer to speak about interpreting spoken referring expressions empirical studies |
---|
0:01:18 | right |
---|
0:01:19 | and have thank you |
---|
0:01:22 | good morning |
---|
0:01:25 | and thanks for having me here |
---|
0:01:29 | i will be don't know how down there |
---|
0:01:32 | challenges that arise in interpreting spoken referring expressions in physical settings |
---|
0:01:39 | i will be grabbing the presentation in my own icsi the system but they don't |
---|
0:01:47 | and yesterday to where some challenges mentioned already so why are we all of the |
---|
0:01:54 | end of some of my |
---|
0:01:59 | so |
---|
0:02:00 | this is the three |
---|
0:02:03 | well above the dream in nineteen sixty two |
---|
0:02:07 | and the for those of you more for the jetsons |
---|
0:02:12 | and the dream was okay may example there |
---|
0:02:18 | there we have to be these days |
---|
0:02:20 | he's actually better than the green |
---|
0:02:25 | actually because the woman in presence of |
---|
0:02:30 | and i don't know if adding more actually achieve the conversational capabilities that we want |
---|
0:02:36 | to but i |
---|
0:02:38 | if move |
---|
0:02:39 | like every are it will be achieved |
---|
0:02:44 | so |
---|
0:02:45 | one of the channel is |
---|
0:02:47 | so and that's a little |
---|
0:02:50 | and i do anything but for their share that computers the robot or something think |
---|
0:02:56 | be reasoned say that on the code rate of the but they have like resampling |
---|
0:03:04 | and the message result may still day |
---|
0:03:08 | it because if you are in there is a reasonable in there is anything k |
---|
0:03:13 | engine their appropriate for us |
---|
0:03:16 | and what exactly trust probably just |
---|
0:03:19 | you know when to what we need and you know not |
---|
0:03:26 | in each okay |
---|
0:03:28 | so you have different one interaction is in it that |
---|
0:03:34 | so how this is a fixed |
---|
0:03:36 | that i got challenges of first of all evaluation |
---|
0:03:42 | we might be able to provide policies and sorted they actually |
---|
0:03:46 | we thank you challenge |
---|
0:03:49 | i read a novel |
---|
0:03:51 | we don't trust |
---|
0:03:55 | in addition from a game theoretic point of view |
---|
0:04:00 | these are i |
---|
0:04:02 | five favourite challenge is |
---|
0:04:05 | in addition of questions yesterday |
---|
0:04:08 | so we need to be able to deal with perceptual complexity |
---|
0:04:13 | and i will illustrate these challenges shortly |
---|
0:04:19 | we need to be able to deal with linguistic phenomena such as signal addressee and |
---|
0:04:24 | you would be |
---|
0:04:26 | but it's not gonna see it is not just asr error |
---|
0:04:30 | but also position error or |
---|
0:04:35 | several papers yesterday discuss the thai patient |
---|
0:04:40 | and finally we need to integrate directly probably the i-th knowing about something may help |
---|
0:04:47 | you figure |
---|
0:04:48 | i |
---|
0:04:50 | so noticeable for perceptual complexity |
---|
0:04:54 | so well i |
---|
0:04:58 | so |
---|
0:04:59 | i see |
---|
0:05:02 | by the way this is that one and one female prime minister |
---|
0:05:06 | we have ueller |
---|
0:05:08 | from the by |
---|
0:05:10 | handy |
---|
0:05:11 | that is the difference in the training right flowers and the right |
---|
0:05:17 | but the lexical when you talk about three vowels is actually more security |
---|
0:05:23 | so that has to be that we that's where |
---|
0:05:28 | what are talking about i |
---|
0:05:32 | we can talk about a large a the small a |
---|
0:05:37 | there are a factor in smaller than this more bass |
---|
0:05:41 | so sizes because either in context |
---|
0:05:48 | no |
---|
0:05:49 | in addition we have topological relations which are spatial relations |
---|
0:05:56 | well carolina |
---|
0:05:59 | so in this example the oranges |
---|
0:06:03 | and the ball |
---|
0:06:05 | and or infeasible |
---|
0:06:08 | no even day |
---|
0:06:11 | okay on the left the position |
---|
0:06:14 | the one |
---|
0:06:16 | or just one |
---|
0:06:20 | no okay |
---|
0:06:21 | in the |
---|
0:06:22 | in the okay |
---|
0:06:24 | the orange the scale in the bowl |
---|
0:06:28 | but in the okay |
---|
0:06:31 | on this i |
---|
0:06:32 | the orders is null |
---|
0:06:35 | thank |
---|
0:06:38 | a |
---|
0:06:40 | you want to say the origins in the old even though it's not that well |
---|
0:06:45 | and the explanation the psychological explanation is that is related to one for |
---|
0:06:52 | if you move the ball we wouldn't the order |
---|
0:06:56 | but you know it humidity calculation the audience is not in the water |
---|
0:07:05 | on |
---|
0:07:06 | so in this wow |
---|
0:07:09 | i is very clear global or and here the plan on the wall but |
---|
0:07:15 | horizontally on the war ok |
---|
0:07:17 | a picture |
---|
0:07:22 | now we also have projective relations |
---|
0:07:25 | which indicate a particular direction from a landmark |
---|
0:07:30 | so we have a dc you're still far from being too |
---|
0:07:34 | and the last but back to the right of the day |
---|
0:07:39 | we try to see you also directly |
---|
0:07:44 | so it's another |
---|
0:07:46 | i tend to congregate |
---|
0:07:49 | okay that can referring expressions |
---|
0:07:53 | now from the point of view of linguistic phenomena |
---|
0:07:56 | we have enough data c |
---|
0:07:59 | i mean i |
---|
0:08:01 | well they want a thread and the reward is more |
---|
0:08:05 | it was to do it sort of teen |
---|
0:08:08 | we have on you know anybody will be in a |
---|
0:08:13 | additional with |
---|
0:08:15 | in the to the problem that prepositional phrases |
---|
0:08:21 | so we have |
---|
0:08:24 | the |
---|
0:08:24 | a few e |
---|
0:08:27 | because we don't know if the back to the lack of the side of you |
---|
0:08:32 | know the plan the lamb |
---|
0:08:33 | but not as shown in our case we have |
---|
0:08:38 | which more |
---|
0:08:41 | i do you get it will |
---|
0:08:43 | even if you identify all the possible and you need at the end of the |
---|
0:08:47 | day it doesn't matter because there is only one flower however |
---|
0:08:52 | this is not the case in this example |
---|
0:08:56 | well |
---|
0:08:57 | in the case |
---|
0:09:02 | it's the table that's near the lack what is your the flat or near that |
---|
0:09:06 | this is to be |
---|
0:09:09 | and yes people do that |
---|
0:09:14 | asr error or out-of-vocabulary words |
---|
0:09:18 | so all of these are |
---|
0:09:20 | someone manufactured example |
---|
0:09:24 | it is not entirely vol all the flower on the table |
---|
0:09:30 | it is that |
---|
0:09:30 | that would be maxent |
---|
0:09:34 | you can |
---|
0:09:35 | something that we on the table and this happens when people who are usually and |
---|
0:09:41 | the main |
---|
0:09:42 | one worked out of it can even make one or are often and all before |
---|
0:09:48 | the user can be added there is a get because a status but no will |
---|
0:09:53 | not come up before right |
---|
0:09:57 | but this is just to illustrate the sort of affection from |
---|
0:10:01 | at this time ever saw can result in our vocabulary word |
---|
0:10:07 | and of course again vision errors |
---|
0:10:10 | the |
---|
0:10:11 | make the situation even |
---|
0:10:14 | so what we want to do |
---|
0:10:18 | we have no framework for spoken language understanding in this phenomena |
---|
0:10:26 | hey |
---|
0:10:27 | this is the store in we aim to handle the picture will or |
---|
0:10:33 | g is the average since upon this is due to the left of the table |
---|
0:10:38 | then we have that are also we have side scott are an example of what |
---|
0:10:43 | little |
---|
0:10:44 | and then it very precise description prepositional phrase |
---|
0:10:52 | so what we want to talk about |
---|
0:10:55 | and a few slides and one of about this interpretation process each of you know |
---|
0:11:02 | and then i believe that our approach |
---|
0:11:07 | then we describe |
---|
0:11:09 | the results were right now response generation can have a chart |
---|
0:11:17 | so this is the set of problems small |
---|
0:11:20 | you to anybody of the speech recognizer |
---|
0:11:24 | then some syntactic analyses in |
---|
0:11:27 | then you may going to show my or my |
---|
0:11:32 | so |
---|
0:11:34 | the speech way speech recognizers such as we will now |
---|
0:11:38 | in my o of such errors |
---|
0:11:42 | these ones |
---|
0:11:44 | you can always speech recognizers are really bad mode |
---|
0:11:49 | it |
---|
0:11:51 | after the syntactic and i is the |
---|
0:11:54 | but also lengthening and live apart |
---|
0:11:59 | to produce |
---|
0:12:00 | but |
---|
0:12:02 | and then you one semantics and but i |
---|
0:12:06 | so if you do we in two stages of semantic interpretation for the robot |
---|
0:12:16 | what i e |
---|
0:12:19 | again every that about on the table again |
---|
0:12:23 | doors the mappings are here the relation my and or |
---|
0:12:29 | and that's prepended is wider rc |
---|
0:12:32 | and we have label a cop not they're not in the table shows for this |
---|
0:12:38 | particular scene |
---|
0:12:40 | there are not be |
---|
0:12:43 | i didn't you all table one |
---|
0:12:47 | so this is an interpretation that is grounded e how we have |
---|
0:12:53 | so what if we |
---|
0:12:58 | so |
---|
0:12:59 | the first we consider this model that i just described |
---|
0:13:08 | well |
---|
0:13:11 | okay |
---|
0:13:12 | like the standard role in |
---|
0:13:15 | we found was insufficient |
---|
0:13:18 | so we will consider alternate interpretation |
---|
0:13:23 | why everyone provide a system for five in a just one used to be the |
---|
0:13:28 | base |
---|
0:13:30 | so the little amount stage process where stages of my has not the patient |
---|
0:13:37 | the addressee we don't want to start local maxima might not be what appears to |
---|
0:13:43 | be a based interface |
---|
0:13:45 | so we have a stochastic optimization process where we provide security different stages |
---|
0:13:53 | okay we want to right |
---|
0:13:56 | the different interpretations so we need somebody ways to make their problem |
---|
0:14:01 | at me about being used only the recognition is speakers the |
---|
0:14:08 | so this is illustrated our approach |
---|
0:14:13 | the first thing we do you and you like this waterfall roles what we call |
---|
0:14:18 | we so we have some of the presentation |
---|
0:14:23 | and then we |
---|
0:14:26 | products i we i |
---|
0:14:29 | we don't they should also try |
---|
0:14:32 | we different stages probabilistically in we can continue and you see |
---|
0:14:42 | it's not null and of my there |
---|
0:14:45 | that is one and one |
---|
0:14:49 | so i don't completion officer and i |
---|
0:14:53 | we assert that looks like |
---|
0:14:59 | now we one o is estimated probably these |
---|
0:15:04 | all their relations |
---|
0:15:10 | and |
---|
0:15:12 | we just apply bayes rule |
---|
0:15:15 | sure if you basically with a given set my impression that this implies that all |
---|
0:15:23 | day |
---|
0:15:24 | no context can be anything i story |
---|
0:15:28 | and i don't history i mean at the moment is the rule more data |
---|
0:15:35 | and |
---|
0:15:36 | we need like to ask for my i don't know so i want to make |
---|
0:15:40 | more complicated but |
---|
0:15:42 | imagine that are |
---|
0:15:44 | think that problem is formulated from i know |
---|
0:15:48 | so all then it is worth this problem |
---|
0:15:54 | the first one directly from the speech recognizer scores we use probabilities lose your number |
---|
0:16:01 | between zero and one |
---|
0:16:05 | parser generates parsers are real users probably e |
---|
0:16:10 | here |
---|
0:16:11 | we favour or simple interpretation sell the urinal the better |
---|
0:16:17 | and |
---|
0:16:19 | this is the more there are what we get the problem |
---|
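The scoring described above can be sketched as follows. This is a minimal illustration, assuming the probability of an interpretation is a product of a speech recognizer score, a parser probability, and a score for how well the interpretation fits the scene. The class name, field names, and all numbers are hypothetical and do not come from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate interpretation of a spoken description (hypothetical structure)."""
    label: str
    p_asr: float       # recognizer score for the text, treated as a probability in [0, 1]
    p_parse: float     # parser probability for the syntactic analysis
    p_semantic: float  # how well the grounded interpretation matches the scene


def score(candidate: Candidate) -> float:
    """Combine the stage probabilities by multiplication (an assumed independence simplification)."""
    return math.prod([candidate.p_asr, candidate.p_parse, candidate.p_semantic])


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates ordered from most to least probable."""
    return sorted(candidates, key=score, reverse=True)


# Made-up numbers: the recognizer prefers "plate", but the scene favours "plant".
candidates = [
    Candidate("plate1 on table1", p_asr=0.6, p_parse=0.5, p_semantic=0.2),
    Candidate("plant1 on table1", p_asr=0.3, p_parse=0.5, p_semantic=0.9),
]
best = rank(candidates)[0]
```

With these illustrative numbers the second candidate wins, which shows how a lower-ranked recognizer output can still yield the top interpretation once the later stages are taken into account.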
0:16:26 | so let's illustrate this so what we have this argument of j o |
---|
0:16:33 | this is a crime and what we want you know that |
---|
0:16:37 | is how well each of the prime i really am i and my |
---|
0:16:41 | the corresponding to my |
---|
0:16:44 | so in the first one |
---|
0:16:46 | we have a problem |
---|
0:16:49 | that it |
---|
0:16:50 | you will designate got three by that are not by the colour blue |
---|
0:16:56 | then it is |
---|
0:16:58 | well that's that relation location or could be designated by |
---|
0:17:02 | the provisional |
---|
0:17:04 | and whether or not goal table one |
---|
0:17:06 | that |
---|
0:17:09 | in addition |
---|
0:17:12 | one who assigned a probably be i mean on the well |
---|
0:17:18 | wow so we can see the models can you on the world and everybody these |
---|
0:17:24 | buttons them on kind of work |
---|
0:17:28 | over the table to be than the problem is |
---|
0:17:34 | shell |
---|
0:17:35 | i just a continuation of the problem but you make some simplifying assumption |
---|
0:17:41 | so |
---|
0:17:42 | the remote will eat corpus to the user and able to refer to |
---|
0:17:47 | it does and of are more or fess okay why this or something that all |
---|
0:17:52 | and it really ambitious |
---|
0:17:55 | and he thought would have a robot and the mobile |
---|
0:17:58 | be able to walk around the room and both |
---|
0:18:02 | and we won one whole the role of all you see a actions that the |
---|
0:18:07 | we you of the time i want to get a better |
---|
0:18:12 | so that's why we make this assumption |
---|
0:18:16 | in addition each object is |
---|
0:18:19 | in a more label |
---|
0:18:22 | and then his sound |
---|
0:18:24 | the next life and deletions will assume that each object region |
---|
0:18:30 | so it may be circumscribed by a block each object is a single and |
---|
0:18:36 | but we have another and that's no way to from the speakers in y because |
---|
0:18:43 | if an object is able to it |
---|
0:18:47 | the problem the speaker is referred to we explore the and or not |
---|
0:18:54 | so we calculate is probably e |
---|
0:18:58 | so this is all technology channel we got a doing the learning |
---|
0:19:05 | will improve |
---|
0:19:07 | so you |
---|
0:19:10 | the lexical similarity was calculated using a wordnet similarity function |
---|
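The talk does not say which WordNet similarity function was used. As one possible illustration, the sketch below computes Wu-Palmer similarity over a tiny hand-made hypernym tree. The taxonomy, the word list, and the choice of Wu-Palmer are all assumptions made for this example, and a real system would query WordNet itself.

```python
# Toy hypernym tree (child -> parent); hypothetical, not taken from WordNet.
PARENT = {
    "entity": None,
    "furniture": "entity",
    "container": "entity",
    "table": "furniture",
    "desk": "furniture",
    "bowl": "container",
}


def path_to_root(word: str) -> list[str]:
    """Return the chain of nodes from the word up to the root, word first."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def depth(word: str) -> int:
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(word))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lowest common ancestor) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # The first node on b's path that is also an ancestor of a is the lowest common ancestor.
    lowest_common = next(node for node in path_to_root(b) if node in ancestors_of_a)
    return 2 * depth(lowest_common) / (depth(a) + depth(b))
```

In this toy tree, "desk" scores higher against "table" than "bowl" does, so a speaker who says "desk" for an object labelled "table" would still receive a reasonable lexical match.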
0:19:18 | that are similar to what is calculated using a particular function you one i |
---|
0:19:26 | and |
---|
0:19:27 | exactly about ten percent you system or changing current system origins in |
---|
0:19:34 | similar |
---|
0:19:37 | so long as you probably you know what it was reported e |
---|
0:19:43 | how similar to you |
---|
0:19:45 | the |
---|
0:19:47 | but we are |
---|
0:19:50 | in this i |
---|
0:19:52 | we probably you got me |
---|
0:19:57 | dean you know |
---|
0:20:00 | and this was only by comparing the exercise for the bottom row |
---|
0:20:06 | we |
---|
0:20:07 | this is all |
---|
0:20:09 | a be consider the |
---|
0:20:13 | and if you're curious we used at a constant |
---|
0:20:18 | so |
---|
0:20:22 | we have a topological relations |
---|
0:20:25 | so the most interest while he's |
---|
0:20:30 | where we have a function that what the is nice |
---|
0:20:35 | represent we should for large |
---|
0:20:40 | i hope to continue for another way that's order to be in near each other |
---|
0:20:49 | so we have right |
---|
0:20:53 | i'm not sure |
---|
0:20:56 | that is done anything that they lack the thing like that and between the flower |
---|
0:21:01 | the baseline |
---|
0:21:03 | but what i say that these were in here |
---|
0:21:05 | these two are not |
---|
0:21:09 | so our function reflects this intuition |
---|
0:21:13 | and finally relations between your sentence frame of reference |
---|
0:21:19 | which means that |
---|
0:21:21 | you know there may be also |
---|
0:21:24 | we adopted it will be adopted the point of view that we are able he |
---|
0:21:29 | where interview speaker |
---|
0:21:32 | so this is the plan that means the right okay or speak |
---|
0:21:40 | so |
---|
0:21:41 | these where |
---|
0:21:43 | this is a short overview of what i |
---|
0:21:46 | so what can i don't think so far what we know |
---|
0:21:50 | so this is the case where we have audience participation |
---|
0:21:54 | so i'll |
---|
0:21:56 | therefore it play a little the microwave |
---|
0:21:58 | which one |
---|
0:22:06 | the |
---|
0:22:07 | you can sample |
---|
0:22:10 | the time course |
---|
0:22:12 | need only my yes can second guess we here |
---|
0:22:18 | but none of the missile |
---|
0:22:23 | okay about the case |
---|
0:22:33 | but |
---|
0:22:35 | the one okay again |
---|
0:22:38 | i mean in do you have three factors system |
---|
0:22:42 | that is |
---|
0:22:45 | now i really |
---|
0:22:47 | the label y is what we are some participants describe |
---|
0:22:51 | in this every the screen so what the intended it is actually one it is |
---|
0:22:57 | easy well i |
---|
0:23:02 | okay |
---|
0:23:03 | i want to find humour |
---|
0:23:06 | so well this is |
---|
0:23:09 | so the okay a |
---|
0:23:12 | this project is a few years all |
---|
0:23:14 | so i |
---|
0:23:15 | our speech recognizer was really giving us a lot of all |
---|
0:23:21 | we were using the microsoft speech api from before deep learning |
---|
0:23:27 | so what we decided we have some e |
---|
0:23:31 | about it and e so all error correction for the speech recognizer |
---|
0:23:38 | so what we need |
---|
0:23:40 | each we had some steps |
---|
0:23:43 | it is more like of course incorporated into are lower |
---|
0:23:49 | so we had to record speech recognition errors one but i think error correction |
---|
0:23:56 | it was a preprocessing step and robot error correction the possible across the things |
---|
0:24:01 | and yes |
---|
0:24:03 | now that you have been speech recognizer the impact of this it is floor |
---|
0:24:09 | but especially what |
---|
0:24:11 | marian discussed yesterday maybe kind of thing hand |
---|
0:24:16 | so that the semantic error correction |
---|
0:24:20 | in this was like every year |
---|
0:24:23 | we propose gently words ripley's or words that have expect i'm expect the boxes |
---|
0:24:32 | so you are described in all you get the bar in |
---|
0:24:37 | that can expect |
---|
0:24:38 | so use a generic were replayed |
---|
0:24:41 | however more than we replace the |
---|
0:24:45 | all of the problem you the new word i in a remote location so probably |
---|
0:24:53 | be a really planet |
---|
0:24:57 | the probability of those on a five of the problem you do not ever so |
---|
0:25:02 | we don't around just replacing work we don't lie you have to read to make |
---|
0:25:07 | a replace |
---|
0:25:09 | so this is the right for example here |
---|
0:25:12 | this is really a |
---|
0:25:14 | we will light on the back wall |
---|
0:25:18 | then we guess what the person actually |
---|
0:25:25 | but |
---|
0:25:28 | that's what they meant |
---|
0:25:30 | but that's what we're to build played the bus stop right interpretation |
---|
0:25:36 | so well |
---|
0:25:37 | if we |
---|
0:25:39 | me |
---|
0:25:41 | i five times in the end of that side of their own set |
---|
0:25:45 | so all |
---|
0:25:46 | we replace you that i don't think that this is really okay |
---|
0:25:52 | but you only have a few scenes on the cable |
---|
0:25:55 | it's better |
---|
0:25:57 | then |
---|
0:26:00 | okay so no |
---|
0:26:03 | this is what we start right now we have all these i |
---|
0:26:08 | in america okay i and say |
---|
0:26:12 | from one can i |
---|
0:26:14 | which one that models like late |
---|
0:26:18 | but only from this guy gonna different places |
---|
0:26:22 | so no okay |
---|
0:26:24 | it's play invented for their instead of everything that |
---|
0:26:30 | so |
---|
0:26:31 | i |
---|
0:26:35 | so that's what we've done |
---|
0:26:39 | and because one of my favourite sergeant's and she's performance me |
---|
0:26:44 | so first describe the corpus |
---|
0:26:50 | twenty six point six r d c back |
---|
0:26:53 | a native english speakers counter and it is but i will resonate adopted for images |
---|
0:27:01 | and we had a hundred and forty one descriptions |
---|
0:27:06 | no this is the asr performance |
---|
0:27:09 | and you would be split into a similar experiment we will a |
---|
0:27:14 | so you see they difference in what we head |
---|
0:27:20 | there but it hears signal |
---|
0:27:21 | and we will now |
---|
0:27:23 | so the word error rate was thirty percent okay |
---|
0:27:29 | keep in mind that this is an older version of the microsoft speech api |
---|
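For readers unfamiliar with the word error rate mentioned above, here is a minimal sketch of the standard definition: the word-level edit distance between the reference and the recognizer output, divided by the number of reference words. The function name and the example sentences are illustrative and are not taken from the talk's corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("plant" -> "plate") and one deletion ("the") over five reference words.
example_wer = word_error_rate("the plant on the table", "the plate on table")
```

Because insertions count as errors, this ratio can exceed one when the hypothesis is much longer than the reference.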
0:27:34 | and the only fourteen percent for the asr interpretations of the top around one for |
---|
0:27:40 | all right |
---|
0:27:41 | what is what will now where the rate of the top ranked interpretation thirteen and |
---|
0:27:46 | a |
---|
0:27:48 | but |
---|
0:27:49 | still a real |
---|
0:27:54 | so the resulting images that we shall i participants |
---|
0:28:00 | and some location for designed for example in this one |
---|
0:28:05 | each requires that all here it should i don't know that have anything it is |
---|
0:28:09 | there |
---|
0:28:10 | so we believe it uses and parts of speech |
---|
0:28:15 | in this work but we have seen as |
---|
0:28:19 | so okay |
---|
0:28:21 | we got the image and call it and we want and |
---|
0:28:26 | car |
---|
0:28:27 | as well as positions |
---|
0:28:31 | this one particular |
---|
0:28:33 | because they can use color size is it or |
---|
0:28:37 | basically you before loading a project you've relations |
---|
0:28:44 | and then just like real is i |
---|
0:28:47 | what |
---|
0:28:50 | where they had to describe the |
---|
0:28:55 | so no |
---|
0:28:57 | just some characterization of what people the |
---|
0:29:00 | in terms of known it |
---|
0:29:02 | there you know that were somewhere out of vocabulary |
---|
0:29:07 | so not just speech recognition error but words like that you words like model with |
---|
0:29:12 | the |
---|
0:29:14 | and they're gonna do not and then you will see |
---|
0:29:18 | we may |
---|
0:29:20 | is there |
---|
0:29:21 | we distinguish two types of one |
---|
0:29:25 | why are descriptions |
---|
0:29:27 | max at least one interpretation in every respect |
---|
0:29:32 | any perfect descriptions means max k |
---|
0:29:36 | so for a in prior description they come from multiple interpret it |
---|
0:29:46 | so these tasks i for our core well about three or four or eight |
---|
0:29:52 | and then apply that wordperfect in there was only one possible right side and that |
---|
0:29:57 | makes sense |
---|
0:29:58 | then sixty percent without which means that we're several reference mask perfectly |
---|
0:30:06 | and then we had to kind of thing accuracy |
---|
0:30:09 | and where only one object matches ending perfect remote one will do not depend |
---|
0:30:20 | now performance metrics |
---|
0:30:24 | again i'm going back to the ideal result how we wanna make explore the interpretation |
---|
0:30:32 | he's reasonable so yes but gold standard annotation |
---|
0:30:38 | by we my |
---|
0:30:39 | a perfect match |
---|
0:30:41 | like |
---|
0:30:43 | contrary to what |
---|
0:30:46 | this is a popular nowadays the screen |
---|
0:30:51 | not address yesterday you say okay i is all words in the list x and |
---|
0:30:57 | y |
---|
0:30:58 | sorry the object but the wall |
---|
0:31:01 | i don't care much percent of the request just retrieve the roles e |
---|
0:31:07 | so a perfect match not present such as |
---|
0:31:10 | it's a severe heart because at the end of the day |
---|
0:31:14 | you want all you know |
---|
0:31:17 | if you wanted and role |
---|
0:31:20 | so little or no but anyways for everything you want to understand perfect what |
---|
0:31:26 | well |
---|
0:31:29 | in addition |
---|
0:31:31 | we want to know if we probably their projects like you will see what problem |
---|
0:31:39 | if you use a live recording okay |
---|
0:31:43 | right of the roll can be a really no particular range |
---|
0:31:51 | she'll |
---|
0:31:52 | the roundness constantly as one unit profile of our systems that well |
---|
0:31:58 | so what you |
---|
0:32:01 | we have the right |
---|
0:32:02 | a two |
---|
0:32:03 | and |
---|
0:32:05 | and we have the probably the deceased in this kind of the this at the |
---|
0:32:09 | top right of the replace your |
---|
0:32:12 | matches |
---|
0:32:13 | the user's intention so this would be |
---|
0:32:16 | all day however the bottom right meaning it's wrong |
---|
0:32:21 | so it in this killer graph |
---|
0:32:24 | they refer the reader is referred by the system |
---|
0:32:28 | if at all |
---|
0:32:30 | and then we have a second one is the green one and then you have |
---|
0:32:33 | more probable one |
---|
0:32:35 | which one |
---|
0:32:36 | is small |
---|
0:32:38 | so for this for probable one |
---|
0:32:42 | you mean one and everybody three quarters of the brown |
---|
0:32:48 | not give a great |
---|
0:32:53 | so our main metrics |
---|
0:32:57 | are three core which is actually recall |
---|
0:33:00 | where we is not always fractional round balls location |
---|
0:33:05 | to do it would probably interpretations |
---|
0:33:08 | and ndcg which was defined by järvelin and kekäläinen |
---|
0:33:13 | a in the |
---|
0:33:16 | why does what side of the fraction that are reward |
---|
0:33:21 | you'd also or a discount lower right |
---|
0:33:27 | it right stand recognition does not have lower right but dct a |
---|
0:33:33 | the normalization component that i |
---|
0:33:38 | you divide whatever this is thinking about we'd like here |
---|
0:33:42 | by this score of an option |
---|
0:33:45 | where you're based on the beam was not the goal i think the situation where |
---|
0:33:50 | you are more advanced up right one |
---|
0:33:54 | so |
---|
0:33:55 | you by like the score of the option and then you |
---|
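The normalization described above, dividing the score of the system's ranking by the score of an optimal ranking, can be sketched with the commonly used form of normalized discounted cumulative gain. The logarithmic discount `log2(rank + 1)` is the widespread textbook variant and is an assumption here, since the talk does not spell out the exact formula. The relevance lists are made up.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain: rewards at lower ranks are discounted logarithmically."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the best ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# relevances[i] is 1 if the interpretation at rank i + 1 matches the speaker's intention.
top_ranked_correct = ndcg_at_k([1, 0, 0], k=3)
second_ranked_correct = ndcg_at_k([0, 1, 0], k=3)
```

Unlike plain recall at K, this measure gives less credit when the correct interpretation appears lower in the ranked list.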
0:34:02 | so how do |
---|
0:34:05 | and we did okay that's the short version but i |
---|
0:34:12 | syllable is not actually |
---|
0:34:16 | it's not like that or |
---|
0:34:18 | that in our money left labelled c is |
---|
0:34:23 | that's |
---|
0:34:23 | better than that will allow okay |
---|
0:34:28 | if we use their predictions that's not very interesting about that for all i k |
---|
0:34:34 | so we might better now there is a reasonable that we have more than three |
---|
0:34:41 | and e c g is not into one |
---|
0:34:45 | by one or two but with a prayer |
---|
0:34:49 | but this surprising is a use rc replacement but in a war |
---|
0:34:57 | that |
---|
0:34:58 | but it would be why the problem replacement pretty or does not |
---|
0:35:05 | that's certainly not second guessing |
---|
0:35:09 | so that a surprise |
---|
0:35:16 | okay |
---|
0:35:18 | let's go on to response generation |
---|
0:35:22 | this is more control |
---|
0:35:27 | a popular problem yees select part in particular that features such as a as a |
---|
0:35:35 | side so that okay |
---|
0:35:38 | for the current approach is used on the fact |
---|
0:35:42 | there is only one acceptable |
---|
0:35:46 | but the main more than one |
---|
0:35:47 | maybe we will and stuff |
---|
0:35:51 | so the goal of this last part of the result was first of all learn |
---|
0:35:56 | what context of a response to |
---|
0:35:59 | the weather instead we rely different schools this |
---|
0:36:05 | and whether we |
---|
0:36:07 | distinguish between what did you in but like our two |
---|
0:36:13 | i think we all on the reason like a microwave |
---|
0:36:17 | but you want your what you agree in that my there but we |
---|
0:36:23 | not sure maybe you want you're able to be more sources than you |
---|
0:36:29 | so the design of y |
---|
0:36:32 | we compare the refer to convert a relations in two ways |
---|
0:36:37 | so |
---|
0:36:38 | you just added over from |
---|
0:36:42 | we assume the ones that are based on the i it |
---|
0:36:48 | we have all been we want you did they are able but that's the robot |
---|
0:36:53 | can find at the end of that |
---|
0:36:57 | we consider for response i |
---|
0:37:00 | which means just what to do so on |
---|
0:37:05 | a tool which means a |
---|
0:37:08 | it is eager wire between v two or three k entries phrase by phrase level |
---|
0:37:16 | of a whole |
---|
0:37:18 | don't be a different way |
---|
0:37:21 | and we can see what we have conducted one experiment anywhere in the process of |
---|
0:37:26 | combat and the second experiment |
---|
0:37:30 | so far in the first experiment we got artist incorrect responses |
---|
0:37:36 | a silence of what's |
---|
0:37:40 | so well i guess i want to solve a |
---|
0:37:44 | because there are the asr |
---|
0:37:48 | we one relay |
---|
0:37:51 | well of the asr be |
---|
0:37:54 | people can guess really where would you are |
---|
0:37:59 | and i known |
---|
0:38:03 | we train the classifier to produce acceptable responses |
---|
0:38:10 | and okay you use a score |
---|
0:38:13 | you're the first experiment using a |
---|
0:38:17 | so all we |
---|
0:38:18 | thirty five participants some of which were still from the one experiment |
---|
0:38:23 | describe the same okay |
---|
0:38:26 | we got |
---|
0:38:28 | that and seventy five descriptions in to draw a little right |
---|
0:38:34 | so you see when it is likely by not nsu and well |
---|
0:38:43 | asr performance is all the previous slide |
---|
0:38:48 | word error rate was only thirteen percent and by |
---|
0:38:52 | jointly of the requested object at least are also asr errors in indulging section driver |
---|
0:39:00 | the landmark search |
---|
0:39:04 | so you have something that will enhance the back the |
---|
0:39:10 | the correct ones and also interesting |
---|
0:39:15 | and you can guess can you guess what people say |
---|
0:39:28 | yes |
---|
0:39:31 | and |
---|
0:39:36 | like |
---|
0:39:38 | larger |
---|
0:39:46 | okay then we got it |
---|
0:39:49 | a simple or false |
---|
0:39:51 | where p c where a |
---|
0:39:54 | how this all for a so i |
---|
0:39:59 | for someone else's lazily or max a |
---|
0:40:05 | based solely on l two |
---|
0:40:08 | the dialogue policy and the results |
---|
0:40:14 | and for this experiment with four participants again |
---|
0:40:18 | both with |
---|
0:40:20 | so this is still in the participants were show |
---|
0:40:25 | and or something but not all the objects on |
---|
0:40:30 | and that was all again mentioned about five |
---|
0:40:35 | for us |
---|
0:40:37 | and |
---|
0:40:41 | you can see that they're talking about |
---|
0:40:45 | yes |
---|
0:40:47 | yes |
---|
0:40:49 | it |
---|
0:40:51 | in this and then that would be used in |
---|
0:40:54 | four options to a value that is that for the purposes of this presentation participants |
---|
0:41:01 | were not so that it is that there were a total of four intraframe |
---|
0:41:07 | but what is it but it's a huge rooms one score |
---|
0:41:11 | and then |
---|
0:41:12 | for the first response is a number |
---|
0:41:17 | from |
---|
0:41:19 | so if you are going to fix |
---|
0:41:21 | the request at all |
---|
0:41:23 | in which all |
---|
0:41:30 | so |
---|
0:41:31 | we don't sell all and |
---|
0:41:34 | we train some classifiers |
---|
0:41:36 | we the trained and you are able just database and two side guy |
---|
0:41:43 | it side |
---|
0:41:45 | indeed it is not bad because there wasn't enough |
---|
0:41:51 | so |
---|
0:41:54 | influential features where they can see that the third problem efficiently |
---|
0:42:02 | if you know that the performance you have one about nine percent of your updated |
---|
0:42:09 | is okay |
---|
0:42:12 | so the eventual users use percent of |
---|
0:42:16 | wrong words in the asr how do we know words are we have a classifier |
---|
0:42:22 | that |
---|
0:42:23 | that's which works well |
---|
0:42:28 | and you will be sold disease |
---|
0:42:30 | not all right their predictions that this you are scored |
---|
0:42:35 | so it would someday |
---|
0:42:37 | i se |
---|
0:42:39 | score all |
---|
0:42:41 | locations already i meaning the task force between requires in all day |
---|
0:42:48 | and the number of out-of-vocabulary words |
---|
0:42:54 | so this is also |
---|
0:42:57 | what we consider all the board but |
---|
0:43:01 | to is dangerous here january english native new this |
---|
0:43:07 | where |
---|
0:43:08 | useful and recall and f-score of seventy four |
---|
0:43:12 | so we were coming from |
---|
0:43:14 | what the participants were common |
---|
0:43:16 | this is |
---|
0:43:17 | then |
---|
0:43:19 | see here i all the data |
---|
0:43:23 | and |
---|
0:43:24 | we got |
---|
0:43:26 | and the score of nine two |
---|
0:43:28 | so that can be something here with the system so this is not fair |
---|
0:43:35 | but i is the |
---|
0:43:38 | or what you from his this is user rate and preferences are the big |
---|
0:43:46 | so what is the main insight yes people based on the differently in fact |
---|
0:43:53 | this is an extreme example because if you know more participants in this experiment in |
---|
0:43:59 | the previous call also we had used very able to work on the exact same |
---|
0:44:05 | so |
---|
0:44:07 | again |
---|
0:44:12 | any other |
---|
0:44:17 | yes i placed on the right of the right |
---|
0:44:21 | and |
---|
0:44:25 | this is |
---|
0:44:26 | what are participants |
---|
0:44:29 | we saw what parts and say okay the ones that come from |
---|
0:44:35 | one possible scores phrase |
---|
0:44:37 | no you have the sack part |
---|
0:44:40 | this is what the user was described |
---|
0:44:46 | so |
---|
0:44:47 | being courses is not about |
---|
0:44:51 | okay so i o k v c r challenge is |
---|
0:44:57 | is the bottom right |
---|
0:45:01 | so we need to do |
---|
0:45:03 | first of all we need to deal with real scenes |
---|
0:45:06 | our case where there were constructed using all three tool |
---|
0:45:14 | it sounds great but their sin |
---|
0:45:17 | and eighty somewhere so at least i hereby are |
---|
0:45:23 | i can be this work but we re scenes but that were causing some problem |
---|
0:45:28 | got its own problems |
---|
0:45:30 | because it can be very frustrating that kind of all |
---|
0:45:36 | car is being |
---|
0:45:40 | so that are and so that an |
---|
0:45:42 | have a paper addresses some of the other problem |
---|
0:45:47 | then that i |
---|
0:45:49 | and i like one of the texture |
---|
0:45:54 | she |
---|
0:45:59 | that's |
---|
0:45:59 | about okay so |
---|
0:46:02 | frames of reference |
---|
0:46:05 | there are lots of frames of reference speaker oriented hearer oriented or the absolute |
---|
0:46:11 | in c |
---|
0:46:12 | but in the basic frame of reference in the fate |
---|
0:46:16 | the front of your lips easily the front of your data doesn't matter course there |
---|
0:46:23 | so |
---|
0:46:24 | also you can be all frames of reference s b one seen that and incorporated |
---|
0:46:30 | into interpretation |
---|
0:46:34 | and context positional relation |
---|
0:46:37 | the left of the front of the table doesn't something that somebody is |
---|
0:46:44 | linguistic phenomena hold it is the white or by nicole all the weak lexical stimuli |
---|
0:46:52 | yes what's a presentation about out-of-vocabulary words |
---|
0:46:59 | and more work has to be done about inaccuracy in u e |
---|
0:47:05 | perceptual i a busy |
---|
0:47:09 | yes asr grammar scale a problem in something better problems in |
---|
0:47:14 | v error or |
---|
0:47:18 | i don't know in this is but you know that |
---|
0:47:22 | is still not there's no |
---|
0:47:27 | user adaptation |
---|
0:47:30 | which all the different people to use right reference s |
---|
0:47:34 | and this adaptation is to be |
---|
0:47:38 | but what are trying to understand what people say |
---|
0:47:42 | in this case and the way people the or there are so a sign a |
---|
0:47:46 | nation |
---|
0:47:47 | it also response generation |
---|
0:47:50 | before this is why in different ways |
---|
0:47:54 | some people prefer the system should be able just seeing something record |
---|
0:48:04 | we need to integrate all i and |
---|
0:48:08 | i the overall view all the interpretation rules to not |
---|
0:48:14 | if you while seeing |
---|
0:48:18 | we know how are preferred interpretation right context of other of c e |
---|
0:48:26 | evaluation we need a system is reasonable |
---|
0:48:32 | and |
---|
0:48:33 | what i |
---|
0:48:36 | because lack of trust |
---|
0:48:38 | these you |
---|
0:48:41 | we perform human evaluations yes we don't like a mass |
---|
0:48:48 | and |
---|
0:48:50 | we must do not based once the result here |
---|
0:48:55 | so we need to be quite different interpretations are closing the need to you swatting |
---|
0:49:01 | italians in different interpretations on can ask |
---|
0:49:05 | appropriate questions |
---|
0:49:07 | and is used in this i will tell when it does not know wow |
---|
0:49:13 | in a just |
---|
0:49:14 | you see response |
---|
0:49:17 | so |
---|
0:49:18 | that's about i'm thinking all the people |
---|
0:49:23 | ever worked on this problem |
---|
0:49:28 | and then you |
---|
0:50:13 | with |
---|
0:50:18 | i'm going to disappoint either |
---|
0:50:21 | just looking around |
---|
0:50:25 | there was no okay so you just look around |
---|
0:50:29 | what meanwhile |
---|
0:50:30 | but it is very minimal |
---|
0:50:33 | we want it then all singing all bands and rowboat that |
---|
0:50:37 | so we also there are |
---|
0:50:40 | and we had these make them where we would match access to say for example |
---|
0:50:46 | where |
---|
0:50:47 | you can extra exam seldom in a |
---|
0:50:52 | right of the ball or i might i heard correctly |
---|
0:50:57 | so what but that one by the board because |
---|
0:51:01 | reality check and we start the referring expressions are mainly |
---|
0:51:05 | looking for things around a |
---|
0:51:09 | i |
---|
0:51:10 | okay a |
---|
0:51:12 | the standard names of the |
---|
0:51:14 | what you are and you would the |
---|
0:51:17 | goal for |
---|
0:51:19 | rock and category is if you one but we were very low just to name |
---|
0:51:25 | and then one side of the wordnet for see now i might |
---|
0:51:30 | but that was the idea of done |
---|
0:51:35 | there's a turn |
---|
0:52:08 | right one |
---|
0:52:11 | i'm like i'm not in the kitchen |
---|
0:52:15 | why don't like okay |
---|
0:52:18 | and if i didn't or anything like a and by or not |
---|
0:52:23 | like |
---|
0:52:24 | there i think my house and i one and then use them |
---|
0:52:29 | so yes i mean it's |
---|
0:52:31 | you would contextual i |
---|
0:52:34 | but |
---|
0:52:36 | what if i want the flow and identically |
---|
0:52:44 | exactly but i would be one i mean one of the sound while we are |
---|
0:52:48 | appropriate |
---|
0:52:50 | what we're not appropriate |
---|
0:52:53 | so |
---|
0:52:54 | where context and i mean exactly what |
---|
0:52:57 | we will now that |
---|
0:52:59 | however in this case it you are |
---|
0:53:02 | model |
---|
0:53:03 | on work like |
---|
0:53:05 | i was actually haven't all possible problem i was saying is that star flower like |
---|
0:53:10 | flower |
---|
0:53:12 | or |
---|
0:53:14 | our car phone or things like i a lot of normally i want to anything |
---|
0:53:21 | other than flower |
---|
0:53:23 | so there is |
---|
0:53:25 | in my contextual i think that kind of like second guessing the person towards the |
---|
0:53:29 | call |
---|
0:53:32 | the commentary |
---|
0:53:36 | i mentioned it is something that training with how much context relative scale |
---|
0:53:55 | for i mean we can prove our |
---|
0:54:00 | a slider direction problem h |
---|
0:54:03 | hasn't been the used by lee |
---|
0:54:06 | at the moment thing to get this unit but that are instantly |
---|
0:55:04 | well at some point of that is |
---|
0:55:07 | long or you have phone |
---|
0:55:12 | and |
---|
0:55:14 | i mean that we know why people thinking |
---|
0:55:18 | only when they were not restrict that would just a |
---|
0:55:22 | whatever why the point that about twenty percent of the time or |
---|
0:55:27 | there are |
---|
0:55:29 | so that we are going to be point |
---|
0:55:33 | they tend to become more me now we can get that |
---|
0:55:41 | but definitely i mean whatever right okay |
---|
0:55:48 | that's |
---|
0:55:49 | why didn't yourself |
---|
0:55:51 | and that goes to the definition part in fact the there was a paper yesterday |
---|
0:55:57 | but |
---|
0:55:58 | an hour |
---|
0:56:02 | the ones |
---|
0:56:04 | kind of limited in the interpretation for by already spoken the colour |
---|
0:56:10 | and then we using your also |
---|
0:56:15 | is that |
---|
0:56:18 | five around one |
---|
0:56:48 | the that if you need for every |
---|
0:57:02 | about that a but there are a |
---|
0:57:05 | so i have to do this in the problem doesn't performance for |
---|
0:57:11 | so that doesn't surprise me some point i mean |
---|
0:57:15 | but maybe we should |
---|
0:57:17 | the fees we are now whether we have several problems a minus right and i'll |
---|
0:57:22 | and their be assigned to me |
---|
0:57:26 | so how much for down within therefore it is |
---|
0:57:30 | exactly |
---|
0:57:34 | it's all could have an and |
---|
0:57:38 | that there probably but when we saw those with a in can see that the |
---|
0:57:44 | the main aim at ever or is in your in great deal in you don't |
---|
0:57:50 | get much mileage out three |
---|
0:57:55 | they |
---|
0:57:56 | i think |
---|
0:57:57 | you are looking at the fourth basically |
---|
0:58:02 | it was somebody |
---|
0:59:24 | okay like the first five better |
---|
0:59:26 | because we try the that the dean at the beginning very ambitious constraints on the |
---|
0:59:35 | object of the accent so we had the and |
---|
0:59:40 | well we had a |
---|
0:59:43 | actually or the actions for a particular case the what i think each other |
---|
0:59:50 | and all that weighted by the board when we had every and six of the |
---|
0:59:54 | i-th class |
---|
0:59:55 | in some but once |
---|
0:59:58 | yes definitely the four |
---|
1:00:00 | one of ten |
---|
1:00:02 | vol |
---|
1:00:04 | and |
---|
1:00:04 | and likewise if you have particular we are not sure whether they're the syllable or |
---|
1:00:11 | goal |
---|
1:00:13 | then |
---|
1:00:14 | you will go back and constraint of our off |
---|
1:00:19 | but as i said we had to know where r |
---|
1:00:24 | and okay what the user is embedded in the very large one the |
---|
1:00:46 | i don't know what to say to make a |
---|
1:01:06 | what |
---|
1:01:10 | well but the way we can design cation that we listened my only there |
---|
1:01:16 | so estimate only relative the thing mean segmentation for |
---|
1:01:20 | and it was incorrect hundred percent of the anybody problems with the problem and better |
---|
1:01:26 | than that of |
---|
1:01:27 | right |
---|
1:01:30 | so the only thing a lot of there is if you live semantic role labeling |
---|
1:01:35 | and you and that the thing that only or did |
---|
1:01:37 | you really don't they can be more |
---|
1:01:41 | this is what the you know |
---|
1:01:44 | if there is still are a bit not like war |
---|
1:01:50 | band |
---|
1:01:52 | you know that c |
---|
1:01:53 | if you |
---|
1:01:54 | at some point get to know that you don't know |
---|
1:01:59 | the things that the |
---|
1:02:58 | well as the semantic in our case the semantic role labeler was trained on |
---|
1:03:03 | a referring expression with the various don't expect even when it's all of our paper |
---|
1:03:10 | segment mostly in the right place but you have a |
---|
1:03:15 | very briefly that saying it's and the expectations would be much better |
---|
1:03:21 | i cannot |
---|
1:03:22 | i denote better success there but for referring expressions was quite well |
---|
1:03:45 | you mean just for the five or |
---|
1:03:53 | well for the parse tree we got indicted from what they were trying |
---|
1:03:57 | three |
---|
1:04:04 | it wasn't from portals like to thank you but if one of them somebody sitting |
---|
1:04:09 | or whatever |
---|
1:04:11 | it is reached their maxima this work the lexical my |
---|
1:04:16 | at all of the sixteen year and by |
---|
1:04:19 | no like can go like |
---|
1:04:22 | i plan |
---|
1:04:24 | and that it is are then you get the pay to get like the score |
---|
1:04:29 | of a second we don't like little recall |
---|
1:04:33 | it's time for mapping but you get the very low score for that matter |
---|
1:04:43 | that that's why we don't think that environment and that's why at home or two |
---|
1:04:49 | two we review fire and |
---|
1:05:00 | the slogan of efficiency |
---|
1:05:02 | so |
---|
1:05:04 | you know that a framework |
---|
1:05:07 | okay let's call it could have a coffee breaks into |
---|