0:00:16 | Thank you very much for the introduction. |
---|
0:00:18 | uh |
---|
0:00:20 | I will talk about anomaly detection, |
---|
0:00:24 | which is a topic that has been around for a long time. |
---|
0:00:28 | The reason why I'm interested in this topic is that |
---|
0:00:34 | so we have a national |
---|
0:00:37 | project |
---|
0:00:38 | the major object |
---|
0:00:40 | which is addressing the following issue: suppose you have a computer vision system |
---|
0:00:47 | and |
---|
0:00:51 | the domain of application slightly changes. |
---|
0:00:54 | What do you do? Do you have to start from scratch, |
---|
0:00:58 | or can you reuse some of the modules? One of |
---|
0:01:04 | the issues that one faces |
---|
0:01:07 | in this context |
---|
0:01:08 | is anomaly detection, because |
---|
0:01:11 | the system has to know, |
---|
0:01:14 | if it is a fully automatic system, |
---|
0:01:16 | that it cannot cope with |
---|
0:01:21 | the input because it has no competence to interpret |
---|
0:01:28 | the data. So that's the context. And because it's a [unclear] project, |
---|
0:01:38 | we in the groove in psychology of the community college london |
---|
0:01:46 | So the plan is to |
---|
0:01:48 | start with |
---|
0:01:50 | the background, then we move on to anomaly detection. |
---|
0:01:55 | uh |
---|
0:01:57 | We review the state of the art in anomaly detection, and then |
---|
0:02:03 | a little bit on the deficiencies |
---|
0:02:04 | of the conventional |
---|
0:02:08 | approaches, |
---|
0:02:10 | and that will then lead to a proposition |
---|
0:02:16 | yeah |
---|
0:02:18 | of an anomaly detection |
---|
0:02:20 | system architecture, |
---|
0:02:23 | and then |
---|
0:02:24 | we apply |
---|
0:02:27 | it to |
---|
0:02:29 | the problem |
---|
0:02:30 | of the |
---|
0:02:34 | interpretation system. |
---|
0:02:37 | So that's the plan. |
---|
0:02:39 | so if you |
---|
0:02:43 | this on vision system we present system the difficult to a stage is and the |
---|
0:02:50 | first of all the to the remote modules |
---|
0:02:54 | solving lost six |
---|
0:02:56 | do about the if you are not just like to do not basically problem but |
---|
0:03:02 | uh |
---|
0:03:04 | image processing vision i want to see you developing a system that actually application and |
---|
0:03:12 | many other issue |
---|
0:03:14 | think about the channel |
---|
0:03:16 | you need to collect a lot of training data because the existing systems uh i |
---|
0:03:24 | let me |
---|
0:03:25 | observations |
---|
0:03:28 | we do not know what is |
---|
0:03:31 | indicating that |
---|
0:03:33 | and the optimized system so it's like that |
---|
0:03:36 | uh nobles that's the goal go through an image that is why convolving |
---|
0:03:44 | And just as an example, we are talking about tennis video analysis. |
---|
0:03:53 | you |
---|
0:03:56 | for some |
---|
0:04:00 | school |
---|
0:04:02 | yeah so uh and that was just a very few men version of this is |
---|
0:04:08 | that the linear |
---|
0:04:11 | and uh so it's at a G is to you |
---|
0:04:16 | an application and then |
---|
0:04:19 | all services about i |
---|
0:04:37 | okay so um |
---|
0:04:41 | The conference here is concerned with advanced concepts, and in a way, |
---|
0:04:46 | when you develop an interpretation system, then in a sense that system is |
---|
0:04:52 | advanced in its own right. So I could be just talking about the |
---|
0:04:56 | tennis video annotation system, but my focus will be more on the second |
---|
0:05:02 | bullet point, |
---|
0:05:04 | as I already mentioned. So suppose you want to adapt the system to some |
---|
0:05:09 | other domain, even quite a close domain. You will see that the applications I will |
---|
0:05:14 | be talking about are very simple indeed, nevertheless raising quite interesting |
---|
0:05:22 | issues and challenges. |
---|
0:05:24 | If you want to |
---|
0:05:28 | benefit from many years of effort and try to use what you have |
---|
0:05:35 | to develop a new competence, a new capability, then first of all you have to |
---|
0:05:42 | identify that you have a problem, that you cannot cope with some input, and |
---|
0:05:47 | then you have to modify the system in an appropriate way. There are of course |
---|
0:05:52 | other communities, also within computer vision, that work on |
---|
0:05:59 | transfer learning, and so |
---|
0:06:06 | I will not be addressing those issues. But in the end, once you have |
---|
0:06:11 | adapted the system to some new application |
---|
0:06:15 | (when I say adapt, |
---|
0:06:17 | I really mean develop a new capability), then the system needs yet another functionality: |
---|
0:06:24 | it needs to know in which situation it is operating, and it |
---|
0:06:33 | should be able to classify the context in which it operates, so that |
---|
0:06:38 | it can automatically select the appropriate domain knowledge for its operation. |
---|
0:06:47 | This is the system that we developed. Basically it can analyze tennis video. |
---|
0:06:53 | Later |
---|
0:06:56 | I will describe what the system looks like, but in principle |
---|
0:07:02 | the objective is that, from the video input, completely automatically, you are able |
---|
0:07:09 | to interpret what's going on, to the point of awarding points and generating the |
---|
0:07:16 | score from the process. Now, |
---|
0:07:19 | I'm not talking about [unclear]; we developed a system which |
---|
0:07:26 | works from standard 2D broadcast video, okay, so that makes the |
---|
0:07:33 | problem more difficult. But anyway, in principle, when you break the video |
---|
0:07:38 | into shots, you want to know what's happening in each shot, |
---|
0:07:43 | which lasts so many seconds, and that is not only who actually |
---|
0:07:49 | wins the rally and who should be awarded a point. |
---|
0:07:57 | Now, |
---|
0:07:59 | probably unless you are young and have very good eyes, you will |
---|
0:08:03 | not be able to see the detail, but this is just to illustrate the |
---|
0:08:07 | complexity of the system. |
---|
0:08:09 | It has quite a few levels of processing. Initially the |
---|
0:08:18 | video is broken into shots, and then each shot is processed |
---|
0:08:24 | separately, basically. The low-level |
---|
0:08:29 | processing deals with the foreground-background separation. |
---|
0:08:34 | Then the key components of the content are extracted, which is the motion of the |
---|
0:08:40 | ball and the players, and then the system determines important |
---|
0:08:48 | events. |
---|
0:08:50 | One important event is when the ball changes direction, and the way |
---|
0:08:56 | it changes direction. |
---|
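
The direction-change event mentioned above can be sketched in a few lines. This is only an illustration, assuming the ball has already been tracked as one (x, y) position per frame; the function name `direction_changes` and the 30-degree threshold are hypothetical and are not taken from the talk.

```python
import math

def direction_changes(track, min_angle_deg=30.0):
    """Return (frame index, turn angle) pairs where the ball's heading turns sharply.

    `track` is a list of (x, y) ball positions, one per frame.
    """
    events = []
    for i in range(1, len(track) - 1):
        # Velocity into and out of frame i.
        vx1, vy1 = track[i][0] - track[i - 1][0], track[i][1] - track[i - 1][1]
        vx2, vy2 = track[i + 1][0] - track[i][0], track[i + 1][1] - track[i][1]
        n1, n2 = math.hypot(vx1, vy1), math.hypot(vx2, vy2)
        if n1 == 0 or n2 == 0:
            continue  # ball not moving: heading is undefined
        cos_a = (vx1 * vx2 + vy1 * vy2) / (n1 * n2)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle > min_angle_deg:
            events.append((i, angle))
    return events

# Toy trajectory: the ball rises, then bounces back down at frame 2.
track = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
events = direction_changes(track)
```

In a real system the track would be noisy, so some smoothing before the angle test would probably be needed.
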
0:08:58 | Then eventually there is some high-level interpretation process of these events. So this |
---|
0:09:05 | is a more digestible summary of the system. Basically the ball tracking |
---|
0:09:13 | is the most important; you need to know where the court is, you need to |
---|
0:09:17 | detect these important events, and there is a high-level interpretation part which is |
---|
0:09:23 | basically hidden Markov model based. |
---|
0:09:27 | Now, most of the modules that the system has use context in some way, okay. |
---|
0:09:36 | When I talk about context here, it's not the domain |
---|
0:09:40 | where the system operates, but the local context, which is temporal or |
---|
0:09:44 | spatial. So when you want to interpret, for instance, what's going on, you need to |
---|
0:09:50 | know not only where the ball is but also where the players are, so there is |
---|
0:09:54 | the interaction between objects in the video. In principle you are interested in |
---|
0:10:01 | interpreting every object in each frame, but the neighboring objects have |
---|
0:10:12 | measurement information which is very important, and you want to use |
---|
0:10:15 | this information jointly. They provide contextual information, and you want to use |
---|
0:10:22 | this information jointly to make an interpretation. In principle you have some sort of |
---|
0:10:27 | domain knowledge, which is acquired in some way, either through learning or |
---|
0:10:31 | partly through [unclear]. |
---|
0:10:33 | So you build in the prior knowledge, and you are then comparing |
---|
0:10:39 | observations with your model to make an interpretation. So this is a very generic indication that most |
---|
0:10:46 | of the modules are dealing with contextual information: many modules use contextual information over |
---|
0:10:53 | time, okay, but other modules deal with spatial contextual information, |
---|
0:11:00 | and some of them with both. |
---|
0:11:02 | So the first one, for instance, is a module which is separating foreground |
---|
0:11:10 | from background. You may wonder what happened here, |
---|
0:11:17 | because the players disappear, but basically it's the module which builds the mosaic. |
---|
0:11:22 | You take video frames from a shot and relate them to each other, |
---|
0:11:30 | and basically that allows you to build a mosaic, and anything that's moving |
---|
0:11:35 | in the frame is wiped out because it is not persistent information. So you |
---|
0:11:42 | have basically a background, and then you can use the background to separate the foreground |
---|
0:11:48 | properly. |
---|
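
As a rough illustration of why moving players vanish from the background, the sketch below assumes the frames of a shot have already been registered to a common coordinate frame (the alignment step is not shown) and takes a per-pixel temporal median. The median is an assumption chosen for illustration; the talk does not say which estimator the system uses, and the function names and threshold are hypothetical.

```python
from statistics import median

def estimate_background(frames):
    """Per-pixel temporal median over aligned grayscale frames (lists of rows)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(width)]
            for r in range(height)]

def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Toy shot: a static 1x5 "court" of brightness 100, with a bright "player"
# (value 250) that moves one pixel to the right in each frame.
frames = []
for t in range(5):
    row = [100] * 5
    row[t] = 250
    frames.append([row])

background = estimate_background(frames)      # the player is wiped out
mask = foreground_mask(frames[2], background)  # the player reappears as foreground
```

Because the player occupies each pixel in only one of the five frames, the median at every pixel is the court value, which is the "not persistent" argument made above.
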
0:11:51 | So that's one example of the type of functionality that the modules perform. |
---|
0:11:58 | The most important one: once you have |
---|
0:12:02 | the players and the ball extracted, you can use them to detect the events. So you can |
---|
0:12:08 | see this is the ball tracking process, and it is |
---|
0:12:14 | also detecting when the ball is changing direction. You know where the |
---|
0:12:21 | court is; that has been detected automatically. The big picture is that it's a fully automatic system. |
---|
0:12:25 | You can also detect the players, and from that you can derive the |
---|
0:12:30 | interpretation. |
---|
0:12:32 | this is uh |
---|
0:12:35 | So these are the events that we have extracted in time, and |
---|
0:12:41 | the sequence of these events and the positions where they happen [unclear] |
---|
0:12:46 | determine what's going on. You have a hidden |
---|
0:12:51 | Markov model which models the temporal structure of ball games in |
---|
0:12:57 | general, and so of tennis, which allows you to interpret what's going on, |
---|
0:13:03 | and you can then decide who should be awarded the point at the end. |
---|
0:13:10 | Okay, and this is an example of what the system would produce. Here |
---|
0:13:17 | on the left-hand side it actually tells you what's going on and who was awarded |
---|
0:13:22 | the point [unclear]. |
---|
0:13:26 | Okay, so as I said, we spent three years developing the system, and we |
---|
0:13:32 | were just working with one video, and it happened to be a video of singles. |
---|
0:13:36 | Then somebody asked the question: what would happen if you actually applied it |
---|
0:13:41 | to doubles? It's a very simple, small transition, but nevertheless |
---|
0:13:48 | a |
---|
0:13:51 | significant enough transition for the system to fail. |
---|
0:13:57 | So that's one thing. But |
---|
0:14:02 | it's not only a question of whether the system fails; you also would like to know, |
---|
0:14:08 | when it fails, why it fails, and whether you can learn something from it. |
---|
0:14:12 | Anyway, so the question is: what are the mechanisms that are needed for the system |
---|
0:14:18 | to realise that it's actually no longer competent to perform a certain functionality, |
---|
0:14:25 | and how can this functionality be extended? |
---|
0:14:31 | I already mentioned this is the project that has been |
---|
0:14:36 | motivating the work in this area, and I think I already alluded |
---|
0:14:43 | to the mechanisms that we need: we need to detect anomalies, we need to |
---|
0:14:48 | transfer knowledge, and we need to adapt interpretation processes and acquire new competences |
---|
0:14:56 | that way. |
---|
0:14:57 | okay so |
---|
0:15:02 | these are the mechanisms, and what I'm going to focus |
---|
0:15:06 | on is anomaly detection. So I have already talked for twenty minutes and I haven't started the |
---|
0:15:11 | topic of the lecture. Okay, so these are the mechanisms that would |
---|
0:15:16 | normally be needed, but one of them is anomaly detection. |
---|
0:15:22 | If you look at |
---|
0:15:24 | the |
---|
0:15:26 | definition of anomaly to start with, it is |
---|
0:15:31 | normally understood as something deviating from normality, but how the |
---|
0:15:38 | normality is defined is very general: it can be some sort of order, |
---|
0:15:44 | it can be a statistical norm, it can be a rule, whatever. So |
---|
0:15:48 | it's very general. There are also many synonyms, and interestingly some of these |
---|
0:15:55 | synonyms in general mean |
---|
0:15:58 | deviation from normality, but sometimes they have some additional nuance, |
---|
0:16:04 | and they may mean, for instance, [unclear], |
---|
0:16:08 | irregularity, okay, innovation. So there is a |
---|
0:16:12 | difference between anomaly and innovation, because innovation usually implies a change, |
---|
0:16:21 | a conscious change: you are moving to some other model of [unclear] |
---|
0:16:27 | experience. |
---|
0:16:30 | Now, what is the conventional model? I think everybody knows that when you look for |
---|
0:16:35 | anomalies you are normally thinking in terms of outliers of some distribution. So |
---|
0:16:45 | you have a Gaussian, for instance, and if I am |
---|
0:16:52 | making observations far away here, then I say well, it must be an anomalous |
---|
0:16:58 | observation, because it's not really consistent with my model of the data, the experience |
---|
0:17:04 | that I made in the past. |
---|
0:17:11 | So basically the mathematical model is a statistical one in principle. |
---|
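
The conventional outlier model just described can be written down in a few lines. This is a minimal sketch under stated assumptions: a one-dimensional Gaussian fitted to past observations, and a "k standard deviations" rule with k = 3, which is a common convention rather than something specified in the talk. The function names are hypothetical.

```python
from statistics import mean, stdev

def fit_gaussian(samples):
    """Estimate the mean and standard deviation of past (normal) observations."""
    return mean(samples), stdev(samples)

def is_outlier(x, mu, sigma, k=3.0):
    """Flag x when it lies more than k standard deviations from the mean."""
    return abs(x - mu) / sigma > k

# Past experience: measurements clustered around 10.
past = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mu, sigma = fit_gaussian(past)

print(is_outlier(10.4, mu, sigma))  # consistent with the model
print(is_outlier(12.0, mu, sigma))  # far in the tail: treated as anomalous
```
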
0:17:17 | the uh |
---|
0:17:23 | Sometimes you work not only with a single observation but with multiple |
---|
0:17:29 | observations, and then you may be interested in whether the distribution of all the observations |
---|
0:17:35 | is different from the distribution of your model. So you |
---|
0:17:41 | could also be talking about anomaly in terms of the |
---|
0:17:45 | shape of the distribution. |
---|
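
One possible way to compare a batch of observations with reference data is a two-sample Kolmogorov-Smirnov statistic, sketched below. The talk does not name a particular test, so this choice, the function name `ks_statistic`, and the toy data are all assumptions made for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
same_shape = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]
shifted = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]

print(ks_statistic(reference, same_shape))  # small gap: distributions agree
print(ks_statistic(reference, shifted))     # large gap: the distribution has moved
```

No single value in `shifted` needs to be an extreme outlier for the statistic to be large; the anomaly is in the shape and location of the whole batch.
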
0:17:49 | As I said, anomaly detection has been of interest for a long time; |
---|
0:17:54 | it goes back to the nineteenth century. People have been interested in |
---|
0:18:00 | developing normal models, so Gaussian models, for modelling various sets of |
---|
0:18:08 | data observations, and anomalies have been detected by the model |
---|
0:18:16 | when the observation is inconsistent with that model. So over the hundred years, I |
---|
0:18:22 | suppose, most of the work has been focusing on this type of concept of |
---|
0:18:27 | anomaly, and there are excellent surveys which make life quite easy. |
---|
0:18:33 | Recently quite a lot of work on anomaly detection comes from the |
---|
0:18:37 | security and surveillance communities, as they are very much interested in formulating the |
---|
0:18:43 | problem of detecting something unusual as an anomaly detection |
---|
0:18:49 | problem. But although they may be using quite complex systems, most of the |
---|
0:18:55 | notions of anomaly in these papers are very close to the statistical |
---|
0:19:00 | notion. So even if you have complex systems with multiple layers of interpretation, very |
---|
0:19:06 | often people still look at anomaly through these models. |
---|
0:19:14 | So the state of the art can be represented in a very simple way here. This |
---|
0:19:17 | is your basic system, which is performing some task: you have sensors, you have some |
---|
0:19:22 | usually single-hypothesis model, |
---|
0:19:26 | so a distribution, and from there you derive some action, |
---|
0:19:34 | and you are interested to know whether there is any |
---|
0:19:40 | anomaly. So you need some sort of anomaly detector, and usually it would be |
---|
0:19:44 | some sort of outlier detector, and if it is an outlier then hopefully it will |
---|
0:19:47 | affect the action, so you will not perform what you would normally perform. |
---|
0:19:56 | Now, in a complex system like a tennis video system, you need |
---|
0:20:02 | a much bigger model, okay. |
---|
0:20:07 | Many of these modules are dealing with multiclass problems, so you don't have just |
---|
0:20:12 | a single |
---|
0:20:14 | hypothesis, you have multiple hypotheses, which also introduces interesting complexity |
---|
0:20:21 | into the equation. You have |
---|
0:20:26 | many levels of processing, and some of these modules are dealing with |
---|
0:20:33 | high-level information, top-down, using contextual information. So although |
---|
0:20:40 | they may be interpreting the same sort of event, they will be using |
---|
0:20:45 | different sources of information, and all these complexities are somehow not accommodated |
---|
0:20:53 | by this conventional anomaly detection model. So, as I already mentioned, this is the list |
---|
0:21:02 | of things. We have multiple models, not just a single- or two-hypothesis |
---|
0:21:07 | model. |
---|
0:21:09 | Importantly, in machine perception |
---|
0:21:13 | very often we use discriminative approaches rather than generative. If you are using a discriminative approach you cannot |
---|
0:21:20 | really talk about outliers, because you just know whether things are on the right side of |
---|
0:21:24 | the boundary or not, but you completely lose the idea of |
---|
0:21:32 | whether the observation which you are trying to classify is an outlier or not; |
---|
0:21:38 | that is lost to the system. So if you wanted to detect |
---|
0:21:44 | anomaly, |
---|
0:21:46 | you would need to use both: discriminative models to get better performance, but also maintain a |
---|
0:21:52 | generative model to know what's going on, whether you are actually competent to make that |
---|
0:21:57 | decision. |
---|
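
The point about pairing a discriminative decision with a generative model can be sketched as follows. Everything here is a toy assumption: a one-dimensional feature, two hypothetical classes with hand-set Gaussian models, a fixed decision boundary, and a density floor `MIN_DENSITY` below which the system declares it is not competent to decide.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional models: label -> (mean, std).
CLASS_MODELS = {"ball": (2.0, 0.5), "player": (6.0, 1.0)}
BOUNDARY = 3.5       # assumed discriminative decision boundary
MIN_DENSITY = 1e-3   # assumed floor: below this, no class explains x

def classify(x):
    """Return (label, competent): a boundary decision plus a generative check."""
    label = "ball" if x < BOUNDARY else "player"
    support = max(gaussian_pdf(x, mu, s) for mu, s in CLASS_MODELS.values())
    return label, support >= MIN_DENSITY

print(classify(2.2))   # well inside the "ball" model
print(classify(40.0))  # the boundary still answers, but no model supports it
```

The boundary alone always returns a label, which is exactly why the outlier information is lost without the second, generative check.
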
0:21:59 | Very often you have areas in the observation space where you have a genuine |
---|
0:22:07 | ambiguity. Now, if you have a genuine ambiguity, then with the decisions you make |
---|
0:22:14 | you have to be very careful, and you cannot necessarily interpret them |
---|
0:22:18 | as any kind of anomaly, because you have an ambiguous situation; you cannot have |
---|
0:22:23 | confidence that it's going to be an anomalous observation. |
---|
0:22:28 | Contextual reasoning I already mentioned; |
---|
0:22:32 | existing systems are not ready yet to deal with that, and with hierarchical representation. |
---|
0:22:39 | But two more things. Data quality: you need to know whether the observation |
---|
0:22:47 | data you want to interpret is of the same quality as the data for |
---|
0:22:53 | which the system has been designed. You know that you make certain assumptions about the |
---|
0:22:58 | quality of the data, and if that quality changes, then |
---|
0:23:02 | the system has to be able to differentiate between that situation, because |
---|
0:23:10 | it would be starting to make errors, okay, and the anomalous situation, where, if you |
---|
0:23:17 | have good quality data, you can be pretty confident that if something deviates then |
---|
0:23:22 | it's going to be an anomalous observation. |
---|
0:23:27 | and uh |
---|
0:23:29 | model pruning, because very often one |
---|
0:23:34 | introduces |
---|
0:23:36 | a potential anomalous situation |
---|
0:23:40 | by the |
---|
0:23:44 | interpretation process, because you want to make that process as fast as possible. |
---|
0:23:48 | So, for instance, if I am interested in object recognition and I know there are, |
---|
0:23:53 | I don't know, half a million objects, |
---|
0:23:57 | or a hundred thousand objects if you look at the various nouns in a dictionary, whatever, it |
---|
0:24:03 | would be completely foolish to have a system which tries to interpret every single object |
---|
0:24:09 | from that hundred thousand in one place. So you would prune that list to something |
---|
0:24:14 | manageable, and hopefully you will deal with just a small number of hypotheses |
---|
0:24:19 | on the list rather than a hundred thousand. If you do that, |
---|
0:24:23 | then you may observe something which is anomalous, but by your own decision, because you |
---|
0:24:28 | have actually simplified the system's processing strategy by making the assumption |
---|
0:24:36 | that the object will come only from this subset. If it doesn't, |
---|
0:24:41 | then you should be able to detect it and recognise it and do something |
---|
0:24:46 | about it. So you can then inject more hypotheses into the system if none |
---|
0:24:51 | of the existing hypotheses is adequate. |
---|
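
The pruning-then-injection strategy described above might look like the sketch below. The scoring function, feature sets, labels, and the 0.5 acceptance threshold are all hypothetical; the only idea taken from the talk is that the system first tries a short list of hypotheses and brings in more only when none of them fits.

```python
def interpret(observation, shortlist, full_vocabulary, score, min_score=0.5):
    """Try the pruned hypothesis list first; inject the remaining ones if none fits."""
    best = max(shortlist, key=lambda h: score(observation, h))
    if score(observation, best) >= min_score:
        return best, "shortlist"
    # No active hypothesis explains the observation: inject the pruned ones.
    others = [h for h in full_vocabulary if h not in shortlist]
    if others:
        best = max(others, key=lambda h: score(observation, h))
        if score(observation, best) >= min_score:
            return best, "injected"
    return None, "anomaly"

# Toy object models: each label is described by a set of features.
FEATURES = {
    "ball": {"round", "small", "yellow"},
    "racket": {"long", "strung"},
    "player": {"tall", "moving"},
    "umpire": {"tall", "seated"},
}

def overlap(observation, label):
    """Fraction of the label's features present in the observation."""
    feats = FEATURES[label]
    return len(observation & feats) / len(feats)

shortlist = ["ball", "racket"]
print(interpret({"round", "small", "yellow"}, shortlist, FEATURES, overlap))
print(interpret({"tall", "seated"}, shortlist, FEATURES, overlap))
print(interpret({"furry"}, shortlist, FEATURES, overlap))
```

The second call is the case discussed above: the observation only looks anomalous because of the pruning decision, and it is resolved once the missing hypotheses are injected.
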
0:24:56 | so |
---|
0:25:00 | I talked about the deficiencies of the conventional anomaly concepts, and just to show |
---|
0:25:06 | you more examples of the different nature of anomalous situations: |
---|
0:25:12 | very often |
---|
0:25:14 | one is asked to solve the problem of spotting the difference, okay, so you |
---|
0:25:19 | can consider it also as an anomaly detection problem. In this particular |
---|
0:25:24 | situation we have a nice little object, and I think everybody can spot |
---|
0:25:31 | the difference: there is a head of a cat, hopefully, or something, in the |
---|
0:25:38 | second picture. Are there any other anomalies? |
---|
0:25:44 | Very good, yeah. |
---|
0:25:46 | So this object has a slightly different angle. Any other? |
---|
0:25:54 | Yeah, and a little bit shifted. Very good. So we are very good |
---|
0:25:58 | anomaly detectors. |
---|
0:26:02 | But the first instance was not all that difficult; all |
---|
0:26:07 | these other anomalies just require a very simple comparison, |
---|
0:26:13 | and for that computer systems are extremely good, able to detect |
---|
0:26:19 | the differences. Okay, so that's one |
---|
0:26:27 | example. We already talked about distribution drift, we talked about model innovations. |
---|
0:26:35 | Anyway, what about this case? |
---|
0:26:41 | Are there any anomalies? |
---|
0:26:52 | Well, actually there are no differences. The only difference, which maybe you would |
---|
0:26:57 | observe if you have very acute vision, is the |
---|
0:27:02 | difference in information: the second image has been compressed, okay, so you |
---|
0:27:10 | lose a little bit of the high-frequency information. So obviously the compression |
---|
0:27:16 | introduces noise, and if I have an anomaly detection system which is to |
---|
0:27:21 | detect differences based on some assumed noise distribution, and suddenly the |
---|
0:27:28 | noise characteristics change, then that difference is just noise, so this should not |
---|
0:27:34 | be detected as anomalous. So data quality is an extremely important concept |
---|
0:27:40 | in the process. |
---|
0:27:43 | I already talked about |
---|
0:27:47 | contextual information and hierarchical representation, which also exploits contextual information. |
---|
0:27:55 | So here, |
---|
0:27:58 | every object in this image, which is a famous painting, |
---|
0:28:04 | makes sense and is perfectly fine, but the relationship of these objects is obviously |
---|
0:28:11 | unusual, because you would not expect a locomotive to be jumping out of a fireplace. |
---|
0:28:17 | So |
---|
0:28:20 | it's another example of the type of anomaly that you would like to be |
---|
0:28:25 | able to detect and |
---|
0:28:27 | explore, and the system should exploit it. So this is the conventional system that |
---|
0:28:35 | people have been using for almost a hundred years, |
---|
0:28:40 | and this is probably what we need, okay. |
---|
0:28:46 | The difference between them... well, this is the actual functioning system, which is implemented |
---|
0:28:50 | in some applications. This is just the same thing as the blue box, |
---|
0:28:55 | which has the sensors and the action elements. |
---|
0:29:01 | when ten okay |
---|
0:29:04 | The difference between this and that is that we probably have multiple |
---|
0:29:10 | hypotheses for each module, okay, and also we probably have several |
---|
0:29:18 | layers of interpretation, not just a single layer. |
---|
0:29:24 | The higher layers would be using context, so there is the |
---|
0:29:28 | relationship between those layers. If you want to detect |
---|
0:29:34 | anomaly in a sensible way, you then need the following. You need something |
---|
0:29:39 | that deals with the differences between contextual and non-contextual processing, |
---|
0:29:44 | and that's a sort of incongruence detector, okay. So if |
---|
0:29:52 | you have an object, if I go back to my |
---|
0:30:01 | [unclear], if I go here, |
---|
0:30:07 | and this is my scene or something, then in principle I'm |
---|
0:30:13 | trying to interpret every object, okay. But when I am interpreting one |
---|
0:30:17 | object, I'm using the contextual information provided |
---|
0:30:23 | by other objects. So in principle you are interpreting that object in |
---|
0:30:28 | two different ways: first, just using the measurement information relating to that object, |
---|
0:30:34 | and secondly, you use the measurement information and possibly prior knowledge about the configuration of |
---|
0:30:42 | objects, or contextual information provided by the neighbours, which will have an impact on the |
---|
0:30:48 | interpretation of the object. So we have a sort of contextual and non-contextual |
---|
0:30:53 | interpretation, and you can measure the incongruence between those two. |
---|
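
One way to put a number on the incongruence between the two interpretations is to compare their posterior distributions over labels. The sketch below uses the Kullback-Leibler divergence for this; that choice, the threshold of 1.0, and the toy posteriors are assumptions made for illustration, since the talk only says that the incongruence between the two can be measured.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete posteriors over the same labels."""
    return sum(p[k] * math.log((p[k] + eps) / (q[k] + eps))
               for k in p if p[k] > 0)

def incongruence(non_contextual, contextual, threshold=1.0):
    """Return (score, flag): how strongly the two interpretations disagree."""
    score = kl_divergence(non_contextual, contextual)
    return score, score > threshold

# The two interpretations broadly agree.
agree = incongruence({"ball": 0.9, "player": 0.1},
                     {"ball": 0.85, "player": 0.15})

# The measurement says "ball", the context says "player".
clash = incongruence({"ball": 0.95, "player": 0.05},
                     {"ball": 0.05, "player": 0.95})
```

A check like this is cheap compared with running a full outlier analysis, which fits the idea of using incongruence as a trigger for the more expensive modules.
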
0:30:59 | uh |
---|
0:31:03 | But we need two other things. |
---|
0:31:05 | We need to assess, for the non-contextual one and for the contextual one, |
---|
0:31:12 | whether we are dealing with ambiguity, so how much |
---|
0:31:16 | confidence we actually have in the interpretation that we are making. So that's one |
---|
0:31:23 | of the things that needs to be added in addition to incongruence. |
---|
0:31:27 | We need a module which is assessing data quality, because that module |
---|
0:31:32 | tells us whether we really should be |
---|
0:31:36 | looking for anomalies or, even if we detect something spurious, whether |
---|
0:31:41 | we should consider it as an anomaly, because if the data quality has changed then |
---|
0:31:47 | we should not be |
---|
0:31:50 | simply saying, well, it's an anomalous situation, because we would be making |
---|
0:31:56 | incorrect decisions. So about the change that will be induced by data of |
---|
0:32:02 | different quality, well, we should know. |
---|
0:32:09 | And in addition to all that, we still need the |
---|
0:32:14 | anomaly detection process, the outlier detection process, because even if my |
---|
0:32:22 | non-contextual and contextual decision-making processes are |
---|
0:32:28 | functioning well, and to function well they would probably be based on discriminative |
---|
0:32:33 | models, then I will need |
---|
0:32:37 | some way of deciding whether the observations are anomalous or not, whether |
---|
0:32:43 | they are outliers. So I still need the conventional model of anomaly. So |
---|
0:32:48 | you can see that these two blocks are attached to the non-contextual and contextual |
---|
0:32:54 | processes. |
---|
0:32:56 | But hopefully I will not be using them very often, because if I did, then |
---|
0:33:01 | the system would just be computationally complex. So |
---|
0:33:07 | ideally what you would like to do is to |
---|
0:33:12 | trigger processing in these modules, looking for anomalies, only when you are prompted |
---|
0:33:19 | to do so, and this triggering can be done quite efficiently |
---|
0:33:22 | by this incongruence detection process. |
---|
0:33:27 | now |
---|
0:33:28 | you can see that one of the mechanisms, and only one, there are others, in |
---|
0:33:34 | uh |
---|
0:33:35 | the system that we need for detecting anomalies in perception systems is the incongruence |
---|
0:33:41 | detector. Interestingly, one of the original works |
---|
0:33:49 | in this area was in the speech area. |
---|
0:33:55 | I don't know whether actually Brno was involved in this or not. Was it... |
---|
0:34:00 | was it just one of yours? |
---|
0:34:02 | Okay, yeah, so you worked with Hynek Hermansky and worked on the problem |
---|
0:34:09 | of out-of-vocabulary word detection, which is exactly a typical |
---|
0:34:15 | example of the problem we are dealing with. You have |
---|
0:34:18 | a layered speech system which is processing data, detecting phonemes, so |
---|
0:34:27 | we have non-contextual interpretation, and contextual, which combines the phonemes into words, and you |
---|
0:34:33 | may be interested in detecting whether there is any anomaly. That would be |
---|
0:34:37 | an anomaly if, for instance, the phoneme detector functions very well and gives |
---|
0:34:42 | you very strong confidence in the interpretation, but the |
---|
0:34:48 | word-level interpretation produces garbage, and it produces garbage simply because the word |
---|
0:34:53 | doesn't exist in the dictionary. |
---|
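
The out-of-vocabulary situation can be reduced to a simple rule of thumb, sketched below. The function name, the confidence values, and both thresholds are hypothetical; the sketch only encodes the pattern described above, namely confident phoneme-level output combined with a poor word-level fit.

```python
def is_out_of_vocabulary(phoneme_confidences, word_confidence,
                         min_phoneme=0.8, max_word=0.3):
    """Flag a word when phonemes are recognised confidently but no word fits."""
    phonemes_ok = min(phoneme_confidences) >= min_phoneme
    return phonemes_ok and word_confidence <= max_word

# Confident phonemes, but the dictionary offers no good match.
print(is_out_of_vocabulary([0.90, 0.95, 0.88], word_confidence=0.1))

# Confident phonemes and a good dictionary match: an ordinary word.
print(is_out_of_vocabulary([0.90, 0.95, 0.88], word_confidence=0.9))

# Weak phonemes: more likely noisy or ambiguous input than a new word.
print(is_out_of_vocabulary([0.40, 0.50, 0.30], word_confidence=0.1))
```
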
0:34:56 | So this is the example of the situation that we would like |
---|
0:35:01 | to detect, and there was a five-year project, the DIRAC project, funded by the |
---|
0:35:08 | EU, which has been extending this basic idea to the image domain, |
---|
0:35:16 | and also continued with applications in speech. So that was |
---|
0:35:21 | but also by will get which it was then uh extending this work uh and |
---|
0:35:28 | the most of the other work which are the definitely want role is uh |
---|
0:35:34 | this name it yet the publications was published in the subsequent about two thousand and |
---|
0:35:39 | i two thousand and well so |
---|
0:35:45 | This is a little bit on the background, but as I say, it was |
---|
0:35:50 | focused mainly on incongruence detection: how do you detect that |
---|
0:35:58 | there is a difference between a sort of generic and the specific classifiers, the generic being |
---|
0:36:04 | the non-contextual one, well, depending on the application, |
---|
0:36:12 | and what is the implication of detecting such incongruence. So that's |
---|
0:36:18 | what DIRAC has produced. But when we actually tried to use this in our |
---|
0:36:25 | work on the tennis video interpretation it was not you know what the very citizen |
---|
0:36:30 | fine mention be a very open dealing with situations where the decisions but ambiguous and |
---|
0:36:35 | then you would not a bit on from that come from that you want but |
---|
0:36:40 | and with a normal situation we dealt with situations and we'll see that in a |
---|
0:36:45 | minute that uh we had several videos of tennis and
---|
0:36:53 | even several videos of tennis singles they all had a different court they were from
---|
0:36:58 | different tournaments so uh they were uh recorded in different conditions and uh
---|
0:37:05 | some of them were noisier than others and it was pretty clear that you
---|
0:37:09 | need to know something about data quality if you want uh to make a sensible
---|
0:37:15 | uh decisions about anomaly we still needed basically the original uh
---|
0:37:23 | technology so to speak of anomaly detection so outlier detection processes and uh
---|
0:37:28 | so i think these were all right and monitoring also is needed
---|
0:37:33 | to measure whether distributions have shifted
---|
0:37:39 | no wit is uh |
---|
0:37:42 | the architecture of the system that is quite interesting because you can then
---|
0:37:49 | based on the various uh
---|
0:37:52 | uh
---|
0:37:54 | on the outcomes or on the analysis of the uh the various modules in that
---|
0:37:59 | anomaly detection system you can then classify your anomalies or situations yeah and
---|
0:38:06 | recognise different states so we can definitely recognise the state when you have no anomaly
---|
0:38:11 | but you can also uh identify situations when you are dealing with uh
---|
0:38:17 | with noisy measurements you can uh detect situations where you have unknown objects uh
---|
0:38:23 | when you have an incongruent or congruent labeling so all the various aspects of
---|
0:38:29 | uh anomaly can be detected and you get a much better idea of what's going
---|
0:38:33 | on |
---|
0:38:34 | so ideally actually what we want to do is to start with tennis and
---|
0:38:40 | move on to badminton and uh detect or identify the modules that
---|
0:38:49 | will not have competence to work on the input data and uh try to correct
---|
0:38:57 | the modules or adapt them or inject knowledge so that we can actually
---|
0:39:02 | use the system for a new application
---|
0:39:05 | but the
---|
0:39:09 | we started with something very simple and as i said just switching from singles
---|
0:39:15 | tennis to doubles so very simple situation so if you consider that problem then
---|
0:39:21 | what would you expect |
---|
0:39:24 | first of all |
---|
0:39:26 | in doubles there are twice as many players
---|
0:39:29 | that's yeah but the court that is being used for the game is wider
---|
0:39:36 | so you have also the tramlines which can uh
---|
0:39:41 | which are illegal basically in the case of singles but in the case of doubles
---|
0:39:46 | uh they are not and but everything else stays the same the rules
---|
0:39:53 | are the same so that was quite a nice
---|
0:39:57 | uh |
---|
0:39:59 | challenge because it was not too complicated but at the same time quite
---|
0:40:04 | interesting to see what's going on and uh okay now in principle you would say
---|
0:40:09 | well it's obvious we can just count the players and the job is done but
---|
0:40:14 | in practice anybody who works or is working on images or video
---|
0:40:21 | knows that detecting and counting objects is not as simple as
---|
0:40:25 | that uh partly because
---|
0:40:31 | the vision processes are not perfect but partly because the uh application domain allows
---|
0:40:41 | basically |
---|
0:40:43 | uh |
---|
0:40:44 | well it is not as black and white so to speak
---|
0:40:49 | it's not either two or four in the game but there are other
---|
0:40:53 | moving objects so you have line judges for instance and normally they stand
---|
0:40:57 | still and when you uh do the mosaicing then uh they
---|
0:41:02 | stay in the image but sometimes they move okay and if they move they
---|
0:41:07 | suddenly become moving objects and uh then unless you have some sophisticated mechanism of distinguishing
---|
0:41:15 | between players and other moving objects then you are stuck with a different count then
---|
0:41:21 | you have ball boys okay so the serve is played and it goes out and
---|
0:41:28 | the ball boy runs to collect the ball and uh so you have suddenly five
---|
0:41:34 | objects detected so if you actually look at the statistics of a video
---|
0:41:41 | okay uh this is what you would observe
---|
0:41:46 | for singles okay so most of the time you would detect just two players
---|
0:41:51 | two agents moving but on many occasions uh you detect a
---|
0:41:57 | few more and uh sometimes up to five so we have a distribution and equally
---|
0:42:03 | for doubles uh you have a distribution so you have two sets of these
---|
0:42:06 | uh |
---|
0:42:08 | you look for anomaly on the basis of distributions rather than single observations but
---|
0:42:15 | anyway so we are basically trying to differentiate between uh
---|
0:42:21 | two distributions one which is a model distribution and one which is the observed distribution
---|
0:42:26 | and look for differences and anyway so that's uh what we have done
---|
0:42:31 | which is a sort of standard approach and here we have some uh
---|
0:42:37 | not the results but the data that we used so as you can see we
---|
0:42:41 | have five videos uh of different length so they are not necessarily complete matches
---|
0:42:47 | but uh they are all of a different situation so we
---|
0:42:52 | have uh australian uh japan tournament and us women and men singles doubles and these
---|
0:43:03 | are the numbers of the place and um |
---|
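The comparison of object-count distributions described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the talk does not say which divergence was used, so the Jensen-Shannon divergence is swapped in here as one common choice, and the per-frame counts, the `MAX_COUNT` cap and the smoothing constant `eps` are all made up for the example.

```python
import math
from collections import Counter

MAX_COUNT = 6  # hypothetical cap: larger counts share the last bin


def count_distribution(counts, eps=1e-3):
    """Smoothed distribution over the number of moving objects per frame."""
    hist = Counter(min(c, MAX_COUNT) for c in counts)
    total = len(counts) + eps * (MAX_COUNT + 1)
    return [(hist[k] + eps) / total for k in range(MAX_COUNT + 1)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), bounded above by log(2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up per-frame counts, for illustration only (not the talk's data).
singles_train = [2] * 80 + [3] * 12 + [4] * 5 + [5] * 3
singles_test = [2] * 75 + [3] * 15 + [4] * 7 + [5] * 3
doubles_test = [4] * 70 + [5] * 15 + [3] * 10 + [6] * 5

model = count_distribution(singles_train)  # model distribution (normal data only)
d_same = js_divergence(model, count_distribution(singles_test))
d_shift = js_divergence(model, count_distribution(doubles_test))
print(f"singles model vs singles video: {d_same:.4f}")
print(f"singles model vs doubles video: {d_shift:.4f}")
```

The model distribution is estimated from singles only, in line with the point that training uses normal data alone. A domain change would then show up as a divergence that stays large when accumulated over many shots, rather than as a single frame with an unusual count.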
0:43:10 | and here we have some results okay so what we show here body to you |
---|
0:43:18 | as we are comparing distributions if you are using information just from one
---|
0:43:23 | shot then this will give you the performance that you would get
---|
0:43:28 | for various scenarios okay and the uh basically uh here we are talking about the
---|
0:43:36 | detection of anomaly so uh we train on singles and when i talk about
---|
0:43:43 | anomaly i always assume that any training that is done is
---|
0:43:50 | or was done in the nominal normal situation there are many cases where people
---|
0:43:54 | are actually trying to synthetically uh generate anomalies create anomalies and
---|
0:44:01 | uh but i think it's a fundamentally wrong approach because if you uh design
---|
0:44:08 | a system you cannot possibly collect data for abnormal situations uh
---|
0:44:14 | and well then they would just become sort of new classes and so really
---|
0:44:20 | the appropriate way of thinking about it is that you can train
---|
0:44:24 | the system only with the nominal the normal data and so all the training was
---|
0:44:30 | done only on singles we measured the level of noise and you can see for
---|
0:44:35 | instance that the was thirty and uh men single pay that much lower high noise |
---|
0:44:42 | then uh the other two and uh and that uh |
---|
0:44:47 | uh |
---|
0:44:49 | has a serious implication because if you look at the data |
---|
0:44:54 | you can see that the if you train or no uh so here we have |
---|
0:44:59 | information okay here we trained on the uh australian women singles and japan single okay |
---|
0:45:06 | so you can see that the |
---|
0:45:10 | if you train on the uh good quality data and then you try to uh
---|
0:45:17 | test the system with the data of different quality then you have problems and you
---|
0:45:21 | can see that from this data because this is basically the anomaly detection output for
---|
0:45:27 | the singles so we should not be detecting any anomalies because we are
---|
0:45:31 | dealing with the same domain the system was trained to uh to recognise
---|
0:45:36 | and interpret the tennis singles and here we are actually having a problem because
---|
0:45:43 | of the noise condition uh we are uh detecting false anomalies uh whereas
---|
0:45:50 | when we actually trained on data which is a little bit
---|
0:45:54 | more noisy than that
---|
0:45:57 | none of the test uh singles throws any anomalies but uh then we
---|
0:46:02 | have to do a little bit more integration to get actually the results uh the
---|
0:46:08 | anomaly detection done correctly so that also shows you that uh
---|
0:46:14 | one has to be very careful about data quality and its implications on the
---|
0:46:18 | anomaly detection process
---|
0:46:21 | uh the second task was well the second anomaly that
---|
0:46:27 | we can analyse is that the ball goes out in the tramlines and
---|
0:46:32 | okay and uh
---|
0:46:37 | so the game should terminate
---|
0:46:40 | but it doesn't it just carries on and uh
---|
0:46:44 | again we have developed a so what do we have well we use
---|
0:46:51 | uh had to be very careful to make sure that we rule out
---|
0:46:58 | as anomalies uh situations which may be genuinely ambiguous and because
---|
0:47:05 | of the data and the system itself anything very close to the boundary line
---|
0:47:10 | between the tramline and the singles court was ambiguous but the further away
---|
0:47:16 | you get from the boundary line the more confidence you have
---|
0:47:19 | so we have built this into a confidence measure
---|
0:47:23 | which we use as a filter to make sure that we are not trying to make uh
---|
0:47:28 | decisions about anomaly uh on data which is by its very nature
---|
0:47:33 | ambiguous
---|
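The confidence filter described above might look something like the sketch below. Everything specific here is an assumption made for illustration: the talk gives no formula, so the Gaussian-shaped confidence curve, the `sigma_m` localisation scale, the `min_conf` threshold and the function names are all hypothetical.

```python
import math


def boundary_confidence(offset_m, sigma_m=0.15):
    """Confidence in [0, 1) that grows with distance from the boundary line.

    offset_m is the signed distance (metres) of the ball bounce from the
    singles sideline; sigma_m is a hypothetical localisation uncertainty.
    """
    d = abs(offset_m)
    return 1.0 - math.exp(-0.5 * (d / sigma_m) ** 2)


def label_bounce(offset_m, min_conf=0.9, sigma_m=0.15):
    """Label a bounce, or abstain (None) when it is too close to the line."""
    if boundary_confidence(offset_m, sigma_m) < min_conf:
        return None  # genuinely ambiguous: do not feed the anomaly detector
    # Assumed convention: positive offsets lie outside the singles court.
    return "tramline" if offset_m > 0 else "singles_court"


for offset in (-0.5, 0.05, 0.5):
    print(offset, label_bounce(offset))
```

The design point is that abstaining is a separate outcome from either label, so bounces near the line never contribute evidence for or against an anomaly.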
0:47:38 | coming back again to my point that we are always using only the information that |
---|
0:47:45 | you acquire obtain in the local a problem but uh normal source norm and the |
---|
0:47:50 | model souls and uh so basically |
---|
0:47:54 | and the interpretation and the interpretation process associated with it so we have not really |
---|
0:48:00 | designed the system simply to detect specifically anomalies it is doing normal
---|
0:48:06 | processing and uh detecting anomalies as a result of that and
---|
0:48:12 | anyway |
---|
0:48:14 | this is a just um an illustration of uh of the interpretation process in the |
---|
0:48:21 | so when there is a perceived |
---|
0:48:24 | uh error okay well when the system should terminate and
---|
0:48:28 | actually the game continues we are uh following all the possible interpretations uh all
---|
0:48:34 | the possible interpretations of what may happen and uh on the basis of
---|
0:48:40 | that we are able to make a decision whether uh there is an
---|
0:48:44 | anomaly because the game continues uh without uh basis and
---|
0:48:53 | the detection is based on measuring incongruence between
---|
0:48:57 | uh contextual and non contextual uh classifiers basically so we have our event detection which
---|
0:49:04 | is uh giving you non contextual labels and we have the contextual
---|
0:49:08 | reasoning which takes into account the sequences uh of events over time so as
---|
0:49:15 | uh
---|
0:49:17 | is normally the case you have basically as i already explained you have
---|
0:49:21 | two interpretations one which is contextual one non contextual and you have to measure whether they
---|
0:49:27 | are incongruent |
---|
0:49:28 | and one possible way of measuring it is using sort of a bayesian surprise measure which
---|
0:49:34 | is a form of divergence on discrete distributions of labels but the problem
---|
0:49:41 | with that measure is that uh it's very sensitive if you have a uh
---|
0:49:47 | a probability which moves from point ninety five to one then suddenly you move into
---|
0:49:55 | infinity and it becomes unstable and uh so we have actually adapted
---|
0:49:59 | that measure and used something which was practically much more efficient
---|
0:50:06 | so we chose the top label the most uh the best supported label for each
---|
0:50:13 | of the contextual and non contextual hypotheses and just measured the difference between those two
---|
0:50:18 | and you can actually show that in the two class case that we consider in
---|
0:50:23 | this particular uh application whether the ball was out or not uh we ended
---|
0:50:30 | up with a very simple way of measuring an incongruence between the states and when
---|
0:50:36 | we did that on the videos that we trained with so we trained on singles
---|
0:50:42 | and on the singles we had no anomalies detected so no problem as you
---|
0:50:47 | would expect and then on doubles uh well with the current system whatever limitations it
---|
0:50:54 | has we certainly detected some anomalies
---|
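The contrast between the divergence-based surprise measure and the simpler top-label comparison can be illustrated with a small sketch. It rests on assumptions: Bayesian surprise is commonly defined as a Kullback-Leibler divergence, but the direction used in the talk is not stated, and `top_label_incongruence` is only one plausible reading of "measure the difference between the top labels", not necessarily the speaker's exact formula. The label names and probabilities are made up.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for label distributions given as dicts with the same keys."""
    total = 0.0
    for label, p_i in p.items():
        if p_i == 0.0:
            continue
        if q[label] == 0.0:
            return math.inf  # q rules out a label that p still supports
        total += p_i * math.log(p_i / q[label])
    return total


def top_label_incongruence(p, q):
    """Average disagreement on each classifier's best supported label.

    Bounded by 1; in the two-class case it reduces to |p(label) - q(label)|.
    """
    top_p = max(p, key=p.get)
    top_q = max(q, key=q.get)
    return 0.5 * (abs(p[top_p] - q[top_p]) + abs(p[top_q] - q[top_q]))


# Hypothetical posteriors for one bounce: the event detector says "out",
# the contextual reasoning (the rally carries on) says "in".
noncontextual = {"in": 0.10, "out": 0.90}
for p_in in (0.95, 0.999, 1.0):
    contextual = {"in": p_in, "out": 1.0 - p_in}
    print(p_in,
          kl_divergence(noncontextual, contextual),
          top_label_incongruence(noncontextual, contextual))

contextual_sure = {"in": 1.0, "out": 0.0}
kl_val = kl_divergence(noncontextual, contextual_sure)
delta = top_label_incongruence(noncontextual, contextual_sure)
```

As the contextual probability approaches one, the divergence grows without bound while the top-label difference stays within [0, 1], which is the stability argument made above.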
0:50:58 | many were undetected uh not many but a small number of false positives and then
---|
0:51:07 | you associate the anomalies with the area of the court where they happen
---|
0:51:13 | they identified the tramlines so it was very nice and that was very easy then
---|
0:51:19 | to use that association and we have another paper elsewhere uh which uh
---|
0:51:26 | then takes the output of this uh of this module of this anomaly detection
---|
0:51:32 | module and through this association is able to uh redefine the rule base basically
---|
0:51:38 | say well to remove the anomalies the court size has to change and
---|
0:51:45 | it has to use the tramlines uh to be able to interpret
---|
0:51:50 | this content successfully so you know
---|
0:51:54 | i think i |
---|
0:51:56 | talked about i gave you examples of all the mechanisms that are needed for
---|
0:52:02 | anomaly detection and as exemplified by the application uh in principle you need this context detection which
---|
0:52:08 | is about domain detection a rather than a uh real or complex uh for system
---|
0:52:15 | to acquire new competence and once it has then it has to be able to
---|
0:52:19 | pick out which uh domain it is operating in and
---|
0:52:25 | take the appropriate knowledge base and uh this is the basic system used
---|
0:52:30 | in the interpretation that way role of a high level and the this is the
---|
0:52:34 | anomaly detection mechanism that's the module that uh is still needed and
---|
0:52:40 | that would be added to the system to
---|
0:52:44 | work successfully so that brings me to conclusion i hope that i have persuaded
---|
0:52:50 | you that uh anomaly detection in machine perception requires more mechanisms than
---|
0:52:56 | what is normally what is offered by the conventional model and what
---|
0:53:02 | these mechanisms are and how useful they are in practical applications thank you very much for your attention
---|
0:53:40 | yeah |
---|
0:53:50 | the use a system |
---|
0:53:52 | well i think the you know what goes into the anomaly detection system i think |
---|
0:53:57 | it's generic but the application was specific okay so obviously our solutions will not work
---|
0:54:04 | for your problem but i think one uh the notion of data quality
---|
0:54:10 | is very important and also the approach to the problem that one should
---|
0:54:17 | be trying to train the system just with the normal data but uh you are
---|
0:54:25 | also shooting yourself in the foot because if you have examples of
---|
0:54:28 | anomaly then it would help you to improve the design of the system nevertheless uh
---|
0:54:33 | you know the system then will be able just to detect what you presented
---|
0:54:38 | to it in training and uh and so there is a little bit of
---|
0:54:42 | a dilemma yeah
---|
0:54:53 | i |
---|
0:55:12 | okay uh |
---|
0:55:14 | basically i think all the videos that we used are
---|
0:55:18 | uh from professional matches and the cameras were fixed but panning okay this is why
---|
0:55:24 | we needed to do the most like uh detection with section um |
---|
0:55:32 | in principle |
---|
0:55:36 | at least we always use the prior information that this is the ground plane so you
---|
0:55:41 | based on that information you can sort of calibrate the camera with respect
---|
0:55:47 | to the scene of the pitch and uh so uh it doesn't have to it
---|
0:55:52 | can move in it is not the solution is not just for a single position
---|
0:55:57 | of the camera uh you can always uh calibrate the system for any position and
---|
0:56:02 | this is what actually happens when a remote uses them |
---|
0:56:19 | yeah |
---|
0:56:32 | i think it was more to do with access uh i think the video speech |
---|
0:56:38 | we go around the through internet ordering to internet maybe unique go but we didn't |
---|
0:56:43 | look into it uh but we knew that uh it would be difficult to get
---|
0:56:47 | the copies of the same but on broadcast |
---|
0:56:50 | although we have one or two from the B B C so
---|
0:56:56 | a game and uh yeah |
---|
0:57:10 | it that uh it's not regulate and i think that would say that we have |
---|
0:57:15 | maybe we are losing probably for the confidence measure we are probably losing half of
---|
0:57:19 | the tramline
---|
0:57:21 | uh where we are not making decisions because of
---|
0:57:27 | the ambiguity and because of the accuracy of the system and it actually gets
---|
0:57:31 | worse for the far part of the court okay because the further away from the
---|
0:57:37 | camera the more degraded the accuracy
---|
0:57:42 | information |
---|
0:58:03 | well i'm i hope it will generate some other one is but uh that's an |
---|
0:58:07 | interesting proposition |
---|
0:58:35 | thus |
---|
0:58:41 | for |
---|
0:58:44 | which |
---|
0:58:47 | and |
---|