0:00:15 | i |
---|
0:00:15 | with and |
---|
0:00:16 | present them at that is and you know |
---|
0:00:19 | i |
---|
0:00:20 | to detect you mouth |
---|
0:00:21 | just |
---|
0:00:22 | so i |
---|
0:00:23 | of my presentation is just for that |
---|
0:00:26 | first i will explain why from our point of view the detection of humans |
---|
0:00:29 | in video streams |
---|
0:00:31 | is a binary silhouette classification problem |
---|
0:00:34 | and then we will design a new robust probabilistic |
---|
0:00:37 | pixel-based approach |
---|
0:00:38 | to classify |
---|
0:00:38 | silhouettes |
---|
0:00:40 | and i will present the datasets we used to assess our method |
---|
0:00:44 | the results |
---|
0:00:45 | and i will give a short conclusion |
---|
0:00:49 | so let's start with the definition of the problem of detection |
---|
0:00:53 | of humans in video streams |
---|
0:00:56 | so detecting humans in video streams is useful for many applications |
---|
0:01:00 | such as video surveillance |
---|
0:01:01 | but the at we just send an is |
---|
0:01:04 | oh to ten man |
---|
0:01:06 | are |
---|
0:01:06 | the goal is to detect only humans in the scene |
---|
0:01:10 | so the cornerstone of such |
---|
0:01:12 | an application |
---|
0:01:13 | is the ability to classify the observations |
---|
0:01:17 | into two classes |
---|
0:01:18 | that are human and non-human |
---|
0:01:22 | a first approach to detect humans has been proposed for still images |
---|
0:01:27 | however such an approach has a lot of drawbacks |
---|
0:01:30 | it's based on the appearance |
---|
0:01:32 | and usually the appearance |
---|
0:01:33 | of humans |
---|
0:01:34 | is unpredictable because |
---|
0:01:36 | colours and textures |
---|
0:01:38 | uh |
---|
0:01:40 | uh |
---|
0:01:41 | vary |
---|
0:01:43 | also a lot of detection windows have to be considered |
---|
0:01:46 | per image because we have to look for humans at a lot of positions and at a lot of |
---|
0:01:50 | scales |
---|
0:01:52 | so um |
---|
0:01:53 | the consequence is that it is difficult to obtain a low false alarm rate per image |
---|
0:01:59 | a better approach consists in using a background subtraction algorithm |
---|
0:02:04 | this one |
---|
0:02:05 | uh |
---|
0:02:06 | extracts the silhouettes of the users |
---|
0:02:07 | and the moving objects in the scene |
---|
0:02:10 | thus |
---|
0:02:11 | we take advantage of the temporal information present in the video streams |
---|
0:02:17 | and um |
---|
0:02:20 | the decision is based on geometric information |
---|
0:02:24 | also only a few silhouettes have to be considered |
---|
0:02:26 | so it's possible to obtain a lower false alarm rate |
---|
0:02:30 | per image than with the approach based on |
---|
0:02:33 | still images |
---|
0:02:35 | so now we will design |
---|
0:02:37 | a probabilistic pixel-based approach |
---|
0:02:39 | to classify silhouettes |
---|
0:02:43 | when we |
---|
0:02:44 | designed our method we took into account |
---|
0:02:46 | the errors |
---|
0:02:47 | because we want to |
---|
0:02:48 | get a robust method |
---|
0:02:51 | so when a background subtraction is followed by a connected components analysis |
---|
0:02:55 | silhouettes may present defects |
---|
0:02:58 | first |
---|
0:02:59 | they may present |
---|
0:03:00 | um |
---|
0:03:01 | noisy contours or holes |
---|
0:03:04 | but |
---|
0:03:05 | also um |
---|
0:03:07 | the silhouettes of |
---|
0:03:09 | several users or moving objects |
---|
0:03:12 | could be merged |
---|
0:03:13 | and last but not least um |
---|
0:03:15 | carried objects and shadows could also be detected in the foreground |
---|
0:03:19 | so we need to have a robust |
---|
0:03:22 | description technique |
---|
0:03:25 | a lot of silhouette description techniques exist |
---|
0:03:28 | and there are mainly two criteria that can be used to choose the |
---|
0:03:33 | description technique most adapted |
---|
0:03:35 | to what we need |
---|
0:03:38 | so the first criterion |
---|
0:03:40 | is related to the use of |
---|
0:03:43 | the contour points or the entire interior of the silhouette |
---|
0:03:48 | but as you can see |
---|
0:03:50 | in every image the noise affects the contours |
---|
0:03:53 | so a region-based method |
---|
0:03:55 | is |
---|
0:03:56 | less sensitive to noise and is preferable |
---|
0:03:59 | another criterion is |
---|
0:04:01 | related to |
---|
0:04:03 | the use of global or local |
---|
0:04:06 | attributes |
---|
0:04:08 | with global methods |
---|
0:04:09 | the attributes |
---|
0:04:11 | are related to the whole shape |
---|
0:04:13 | therefore |
---|
0:04:14 | um |
---|
0:04:16 | uh |
---|
0:04:17 | all the attributes are affected |
---|
0:04:19 | by |
---|
0:04:20 | the presence of defects |
---|
0:04:21 | in the silhouette |
---|
0:04:22 | local methods |
---|
0:04:23 | split shapes into smaller components |
---|
0:04:26 | with the hope to limit the influence of defects to a few components |
---|
0:04:30 | so |
---|
0:04:31 | in our case a region-based local description method is preferable |
---|
0:04:39 | our silhouette description method is the simplest region-based local description method that one |
---|
0:04:44 | can imagine it's the set of all pixels included in the silhouette |
---|
0:04:47 | but the question is how can you classify such a set |
---|
0:04:51 | unfortunately there doesn't exist any machine learning method designed for this task |
---|
0:04:55 | so we will do this by ourselves |
---|
0:05:02 | in our approach each pixel plays the role of an expert |
---|
0:05:06 | in the first stage |
---|
0:05:07 | each expert |
---|
0:05:08 | decides |
---|
0:05:09 | if it |
---|
0:05:10 | the |
---|
0:05:11 | if |
---|
0:05:11 | its related pixel is part of a human silhouette or not |
---|
0:05:15 | and we also assume that it gives the probability for its decision to be correct |
---|
0:05:21 | um um a matter of |
---|
0:05:22 | the experts can be implemented by machine learning methods |
---|
0:05:27 | in the second stage the opinions given by the experts |
---|
0:05:30 | are merged |
---|
0:05:31 | by a weighted vote |
---|
0:05:34 | and the weight given to an expert |
---|
0:05:35 | depends on the probability for its decision to be correct |
---|
0:05:39 | indeed |
---|
0:05:40 | intuitively we want that |
---|
0:05:42 | a confident expert |
---|
0:05:44 | has an important weight in the final decision |
---|
0:05:49 | so here is a small example to understand the method |
---|
0:05:53 | so |
---|
0:05:54 | it's |
---|
0:05:54 | X |
---|
0:05:55 | is a pixel and the value it produces |
---|
0:05:57 | is the probability for it to |
---|
0:06:00 | be part of a human silhouette |
---|
0:06:02 | that's the information given by the expert |
---|
0:06:06 | so |
---|
0:06:06 | in this example six experts are used to classify the silhouette |
---|
0:06:11 | each of them takes a look around itself and based on this information |
---|
0:06:16 | it gives the probability for it to be part of a human silhouette |
---|
0:06:21 | and then all this information is merged into the final decision that is |
---|
0:06:26 | that |
---|
0:06:27 | the silhouette is human |
---|
0:06:31 | so to implement the experts we have used a powerful machine learning method |
---|
0:06:37 | named |
---|
0:06:37 | ExtRaTrees |
---|
0:06:40 | it's a method that doesn't require you to optimize any parameter |
---|
0:06:45 | and that avoids |
---|
0:06:46 | intrinsically overfitting |
---|
0:06:50 | it's like |
---|
0:06:52 | an ensemble of decision trees |
---|
0:06:54 | and in our case each tree votes for one class human or non-human |
---|
0:06:59 | so we consider the proportion of trees voting for the class human |
---|
0:07:04 | and chris and then my there's |
---|
0:07:06 | respectively the total amount of human and non-human silhouettes |
---|
0:07:09 | in the learning set |
---|
0:07:12 | and |
---|
0:07:13 | we propose the following estimator for the probability |
---|
0:07:17 | of |
---|
0:07:19 | the |
---|
0:07:19 | pixel |
---|
0:07:20 | to be drawn from a human silhouette |
---|
0:07:23 | so um |
---|
0:07:24 | when the learning set |
---|
0:07:25 | is balanced |
---|
0:07:27 | the probability is |
---|
0:07:29 | approximately equal to the proportion of trees voting for the class human |
---|
0:07:34 | however when the learning database is not balanced |
---|
0:07:38 | there is a bias in the decisions taken by the trees |
---|
0:07:41 | and this bias has to be cancelled so that's what we do in our probability estimator |
---|
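The estimator itself is not recoverable from the transcript, so the following is only a minimal sketch of one standard way to cancel a class-imbalance bias. It assumes that the vote proportion behaves like a posterior computed under the class frequencies of the learning set, and it rescales that posterior to equal priors. The function name and parameter names are hypothetical.

```python
def human_probability(vote_fraction, n_human, n_non_human):
    """Bias-corrected probability that a pixel is drawn from a human silhouette.

    vote_fraction: proportion of trees voting for the class human (0..1).
    n_human, n_non_human: class counts in the learning set.
    Assumption: a standard prior correction, not the equation shown in the talk.
    """
    if not 0.0 <= vote_fraction <= 1.0:
        raise ValueError("vote_fraction must lie in [0, 1]")
    if n_human <= 0 or n_non_human <= 0:
        raise ValueError("both classes must be present in the learning set")
    # Divide each class's share of the votes by its frequency in the learning set.
    human_term = vote_fraction / n_human
    non_human_term = (1.0 - vote_fraction) / n_non_human
    return human_term / (human_term + non_human_term)
```

With a balanced learning set the two counts cancel and the result equals the vote proportion, which matches the behaviour described in the talk. With an unbalanced set the under-represented class is boosted.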
0:07:48 | so |
---|
0:07:49 | once we have a probability for each pixel |
---|
0:07:53 | we can compute |
---|
0:07:55 | the decision taken by the expert |
---|
0:07:58 | and the probability for this decision to be correct |
---|
0:08:01 | using Bayes' rule |
---|
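As an illustration of this step, here is a small sketch under the assumption that each expert simply picks the more probable class, so that the probability of being correct is the larger of the two class probabilities. The name `expert_opinion` is hypothetical and the exact derivation used in the talk is not given in the transcript.

```python
def expert_opinion(p_human):
    """Return (decision, probability that the decision is correct) for one pixel.

    p_human: probability that the pixel is part of a human silhouette.
    Assumption: the expert chooses the most probable class.
    """
    if not 0.0 <= p_human <= 1.0:
        raise ValueError("p_human must lie in [0, 1]")
    decision = "human" if p_human >= 0.5 else "non-human"
    # The chosen class is the more probable one, so its probability is the maximum.
    p_correct = max(p_human, 1.0 - p_human)
    return decision, p_correct
```

An expert with a probability close to 0.5 is therefore the least confident one, whichever class it chooses.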
0:08:04 | also um |
---|
0:08:06 | as i said earlier |
---|
0:08:08 | we use a weighted |
---|
0:08:11 | a weighted um voting rule |
---|
0:08:14 | to give the class |
---|
0:08:15 | to the silhouette |
---|
0:08:17 | and that's |
---|
0:08:18 | equation number four |
---|
0:08:20 | where w is the weight given |
---|
0:08:22 | to the expert |
---|
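The weighted voting rule can be sketched as follows. Equation four is not visible in the transcript, so the weighting function used here (`2 * p_correct - 1`, which gives no weight to an undecided expert) is an assumption chosen for illustration, and `classify_silhouette` is a hypothetical name.

```python
def classify_silhouette(pixel_probabilities, weight=None):
    """Merge per-pixel expert opinions into one label with a weighted vote.

    pixel_probabilities: one probability of being human per pixel of the silhouette.
    weight: maps the probability of a correct decision to a vote weight
            (assumed default: 2 * p_correct - 1).
    Returns (label, score) where a positive score means human.
    """
    if not pixel_probabilities:
        raise ValueError("at least one pixel expert is required")
    if weight is None:
        weight = lambda p_correct: 2.0 * p_correct - 1.0
    score = 0.0
    for p_human in pixel_probabilities:
        vote = 1.0 if p_human >= 0.5 else -1.0   # expert decision
        p_correct = max(p_human, 1.0 - p_human)  # expert confidence
        score += weight(p_correct) * vote
    return ("human" if score > 0.0 else "non-human"), score
```

For the probabilities `[0.02, 0.55, 0.55]` a plain majority vote would answer human, whereas this rule lets the single confident expert win, which is the intuition given in the talk.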
0:08:26 | so now i will present |
---|
0:08:28 | the datasets we used to assess our method |
---|
0:08:31 | the results |
---|
0:08:32 | and give |
---|
0:08:33 | a short conclusion |
---|
0:08:35 | both our learning set and testing set |
---|
0:08:37 | contain human and non-human silhouettes |
---|
0:08:40 | they are resized to one hundred by one hundred pixels |
---|
0:08:44 | this means that our method is |
---|
0:08:47 | scale invariant |
---|
0:08:49 | and also that it can be used with low resolution images |
---|
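The resizing step could look like the sketch below. The talk does not say which interpolation is used, so nearest-neighbour sampling on a binary mask stored as a list of rows is an assumption, and `resize_mask` is a hypothetical helper. In practice the mask would probably be cropped to the bounding box of the silhouette first.

```python
def resize_mask(mask, size=100):
    """Resize a binary silhouette mask to size x size with nearest-neighbour sampling.

    mask: non-empty list of equally long rows containing 0 or 1.
    """
    rows, cols = len(mask), len(mask[0])
    return [
        # Map each target pixel back to the source pixel that covers it.
        [mask[(r * rows) // size][(c * cols) // size] for c in range(size)]
        for r in range(size)
    ]
```

Because every silhouette ends up on the same grid, a large and a small version of the same shape produce the same input for the pixel experts.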
0:08:54 | also note that |
---|
0:08:56 | our learning set is not balanced |
---|
0:08:58 | but this is not a problem as our probability estimator cancels the bias |
---|
0:09:03 | to by |
---|
0:09:05 | so here are the results |
---|
0:09:07 | the images show the probability computed in each pixel |
---|
0:09:13 | um |
---|
0:09:14 | these are probability maps |
---|
0:09:16 | a white pixel corresponds |
---|
0:09:18 | to a probability of one |
---|
0:09:20 | whereas a black pixel corresponds to a probability of zero |
---|
0:09:25 | as you can see |
---|
0:09:26 | human silhouettes are whiter |
---|
0:09:28 | and that means that our method works very well |
---|
0:09:33 | uh_huh hmmm |
---|
0:09:36 | once |
---|
0:09:37 | we have computed the probability maps |
---|
0:09:39 | with ExtRaTrees |
---|
0:09:41 | we also have to assign a weight |
---|
0:09:44 | to each pixel |
---|
0:09:46 | in fact we tried three different weighting strategies |
---|
0:09:49 | one of them being to learn automatically the weighting function |
---|
0:09:54 | and |
---|
0:09:54 | all these strategies lead to similar results |
---|
0:10:00 | and then we have a correct classification rate of around ninety percent for both human and non-human |
---|
0:10:05 | silhouettes |
---|
0:10:07 | however this is only a starting point |
---|
0:10:09 | because |
---|
0:10:11 | uh |
---|
0:10:11 | we did not yet try |
---|
0:10:13 | to optimize |
---|
0:10:14 | the set of attributes |
---|
0:10:16 | used to describe a pixel |
---|
0:10:18 | and also to describe it we have to define a neighbourhood |
---|
0:10:22 | and we did not try to optimize the neighbourhood shape and the neighbourhood size |
---|
0:10:27 | so i believe that better results |
---|
0:10:29 | can be obtained with our method |
---|
0:10:33 | so in conclusion we have proposed a new system for the detection of humans |
---|
0:10:38 | well suited for video streams |
---|
0:10:40 | our approach has been designed to rely on geometric information |
---|
0:10:45 | and to be robust to noise |
---|
0:10:47 | so in a first step we apply a background subtraction |
---|
0:10:51 | to extract the silhouettes |
---|
0:10:53 | of |
---|
0:10:53 | persons and moving objects in the scene |
---|
0:10:56 | then a probabilistic information |
---|
0:10:58 | is computed for each pixel in the foreground |
---|
0:11:03 | and finally |
---|
0:11:04 | this information is used to decide whether the silhouette is that of a human or not |
---|
0:11:08 | the results show that our approach is promising for the detection of humans in video streams |
---|
0:11:14 | but finding the optimal neighbourhood used |
---|
0:11:17 | for the description of a pixel is left for future work |
---|
0:11:20 | thank you |
---|
0:11:27 | thank you sebastian |
---|
0:11:29 | any questions |
---|
0:11:38 | what about a comparison with the HOG-based |
---|
0:11:42 | pedestrian detection |
---|
0:11:45 | but is so that |
---|
0:11:47 | my first play |
---|
0:11:49 | with a but is |
---|
0:11:50 | you have really uh |
---|
0:11:52 | a huge number of |
---|
0:11:54 | detection windows |
---|
0:11:55 | to be considered |
---|
0:11:57 | um |
---|
0:11:58 | and this is |
---|
0:11:59 | about |
---|
0:12:00 | twelve thousand images |
---|
0:12:02 | i mean windows per image |
---|
0:12:04 | therefore the |
---|
0:12:06 | false alarm rate |
---|
0:12:08 | should be multiplied by |
---|
0:12:10 | oh what i on |
---|
0:12:12 | to obtain the false alarm rate |
---|
0:12:14 | per image |
---|
0:12:15 | so it gives a really high false alarm rate per image |
---|
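The arithmetic behind this remark can be sketched as follows. The figure of about twelve thousand windows per image comes from the answer above. The per-window false alarm rate of `1e-4` used in the usage note is a hypothetical value for illustration only, and the second function additionally assumes that windows produce false alarms independently.

```python
def false_alarms_per_image(per_window_rate, windows_per_image):
    """Expected number of false alarms in one image."""
    return per_window_rate * windows_per_image


def prob_at_least_one_false_alarm(per_window_rate, windows_per_image):
    """Probability of one or more false alarms in an image.

    Assumption: windows raise false alarms independently of each other.
    """
    return 1.0 - (1.0 - per_window_rate) ** windows_per_image
```

With the hypothetical rate of `1e-4` and 12000 windows, `false_alarms_per_image(1e-4, 12000)` gives roughly 1.2 expected false alarms per image, so even a per-window rate that looks small leads to about one false detection in every frame.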
0:12:20 | also there are techniques to |
---|
0:12:23 | keep only a small amount of responses in the images |
---|
0:12:27 | but at least you have a false |
---|
0:12:30 | detection per image |
---|
0:12:32 | this is not |
---|
0:12:33 | uh |
---|
0:12:34 | acceptable for applications |
---|
0:12:36 | such as video surveillance |
---|
0:12:39 | but you can apply the HOG detector |
---|
0:12:43 | or descriptor only |
---|
0:12:45 | on the uh um |
---|
0:12:47 | on the moving mask |
---|
0:12:49 | on the moving object |
---|
0:12:50 | okay but |
---|
0:12:51 | in this case |
---|
0:12:53 | uh |
---|
0:12:55 | HOG is computed using colours |
---|
0:12:59 | and |
---|
0:12:59 | um |
---|
0:13:00 | the appearance of humans in videos |
---|
0:13:03 | is |
---|
0:13:04 | unpredictable you can have |
---|
0:13:06 | a lot of different colours and textures |
---|
0:13:08 | and from our point of view it's preferable to use only the geometric information |
---|
0:13:14 | and the temporal information that we have |
---|
0:13:16 | in the video streams |
---|
0:13:17 | i does as |
---|
0:13:18 | to do this |
---|
0:13:20 | that's why we have |
---|
0:13:21 | chosen |
---|
0:13:26 | yes but |
---|
0:13:32 | really |
---|
0:13:33 | a funny question but |
---|
0:13:34 | uh uh so uh |
---|
0:13:36 | i mean based on the shape but uh you you said you want to distinguish between humans and the rest |
---|
0:13:41 | so when you put next to this a monkey |
---|
0:13:44 | i mean |
---|
0:13:46 | would you detect this as human |
---|
0:13:48 | because uh for me like |
---|
0:13:50 | something that has like uh |
---|
0:13:53 | the same shape |
---|
0:13:55 | will be detected as human right |
---|
0:13:57 | right |
---|
0:13:58 | okay so the monkey will be human |
---|
0:14:02 | um um |
---|
0:14:03 | in fact you can learn |
---|
0:14:06 | with |
---|
0:14:06 | uh |
---|
0:14:07 | monkeys in the negative |
---|
0:14:09 | set |
---|
0:14:10 | so the set of non-human silhouettes |
---|
0:14:13 | and probably if you don't have |
---|
0:14:16 | too much |
---|
0:14:17 | too much noise in your images |
---|
0:14:19 | this will work |
---|
0:14:20 | but |
---|
0:14:20 | in real applications |
---|
0:14:22 | there is noise |
---|
0:14:23 | and moreover we took small images |
---|
0:14:26 | one hundred by one hundred |
---|
0:14:28 | that's why i guess you are right and the monkey will be detected as human |
---|
0:14:33 | but with synthetic images |
---|
0:14:36 | without noise |
---|
0:14:37 | it is possible to distinguish |
---|
0:14:39 | well then you have |
---|
0:14:40 | problems with |
---|
0:14:42 | the clothes |
---|
0:14:43 | and so on |
---|
0:14:44 | okay |
---|
0:14:45 | thanks |
---|
0:14:47 | other question |
---|
0:14:50 | have you got any assumptions on how the camera should be |
---|
0:14:54 | placed compared to |
---|
0:14:56 | people |
---|
0:14:57 | um |
---|
0:14:58 | yes indeed |
---|
0:15:00 | when you build your learning set |
---|
0:15:03 | you should um |
---|
0:15:05 | build it with |
---|
0:15:07 | silhouettes taken from the same point of view |
---|
0:15:10 | as in the real application |
---|
0:15:12 | for example if |
---|
0:15:13 | in the |
---|
0:15:13 | real application your camera |
---|
0:15:15 | is |
---|
0:15:16 | above the persons |
---|
0:15:17 | then you should |
---|
0:15:18 | place in your learning set |
---|
0:15:20 | some |
---|
0:15:20 | silhouettes taken |
---|
0:15:21 | from |
---|
0:15:22 | the same point of view |
---|
0:15:24 | but |
---|
0:15:25 | um |
---|
0:15:28 | this is |
---|
0:15:29 | in practice |
---|
0:15:30 | not a problem |
---|
0:15:31 | the human silhouettes in the learning set can be generated with |
---|
0:15:36 | a human avatar |
---|
0:15:38 | and changing the point of view |
---|
0:15:40 | takes only a few minutes |
---|
0:15:42 | to compute |
---|
0:15:44 | and uh |
---|
0:15:45 | related to the first question |
---|
0:15:47 | have you got a sense of how this would perform compared to |
---|
0:15:51 | a trained cascade of classifiers |
---|
0:15:54 | i mean approaches that look at |
---|
0:15:56 | humans |
---|
0:15:57 | like humans or |
---|
0:15:59 | humans |
---|
0:15:59 | as a set of parts of the body |
---|
0:16:01 | using a cascade classifier |
---|
0:16:04 | and that it takes a long time to plane but it's not source will we |
---|
0:16:07 | at this point we didn't compare |
---|
0:16:10 | because uh we |
---|
0:16:13 | we hope to have better results with our method |
---|
0:16:16 | and also our method has some |
---|
0:16:20 | positive points |
---|
0:16:21 | for example you have information computed in each pixel |
---|
0:16:25 | which means that for example if i |
---|
0:16:28 | hold a guitar in my hand |
---|
0:16:30 | it will be detected as being in the foreground by the background subtraction |
---|
0:16:34 | but |
---|
0:16:34 | the probability maps |
---|
0:16:36 | allow |
---|
0:16:37 | to erase the guitar if i want for example or to do pose recovery as well |
---|
0:16:42 | like this |
---|
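The idea of erasing a carried object with the probability map can be sketched as below. The threshold of 0.5 and the name `keep_human_pixels` are assumptions made for illustration, and both inputs are taken to be lists of rows with identical dimensions.

```python
def keep_human_pixels(foreground, probability_map, threshold=0.5):
    """Keep only the foreground pixels that look human.

    foreground: binary mask from the background subtraction (rows of 0 or 1).
    probability_map: per-pixel probability of belonging to a human silhouette.
    threshold: assumed cut-off below which a foreground pixel is erased.
    """
    return [
        [1 if fg and p >= threshold else 0 for fg, p in zip(fg_row, p_row)]
        for fg_row, p_row in zip(foreground, probability_map)
    ]
```

In the guitar example, pixels of the instrument are in the foreground but receive a low probability, so they are removed while the pixels of the person are kept.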
0:16:44 | so |
---|
0:16:46 | i think or or middle well |
---|
0:16:48 | uh |
---|
0:16:49 | steve |
---|
0:16:53 | well last question from you again |
---|
0:16:59 | you mentioned that this can be used for video sequences |
---|
0:17:01 | have you thought about how to use the temporal information |
---|
0:17:04 | because your analysis is |
---|
0:17:05 | frame by frame |
---|
0:17:06 | yes the temporal information is used |
---|
0:17:09 | in fact by the background subtraction |
---|
0:17:12 | okay i |
---|
0:17:13 | but my question was whether |
---|
0:17:15 | you uh |
---|
0:17:16 | could use the temporal information on the |
---|
0:17:18 | successive detections |
---|
0:17:20 | in successive frames to improve the result |
---|
0:17:23 | what about that |
---|
0:17:24 | uh |
---|
0:17:25 | yeah |
---|
0:17:26 | if you want you |
---|
0:17:27 | can apply tracking |
---|
0:17:29 | for example |
---|
0:17:30 | and |
---|
0:17:31 | if you |
---|
0:17:31 | track and number |
---|
0:17:33 | each |
---|
0:17:34 | connected component |
---|
0:17:34 | in the foreground |
---|
0:17:36 | you can |
---|
0:17:36 | uh |
---|
0:17:37 | improve the results |
---|
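The answer only suggests that tracking could improve the results, without giving a procedure. One simple possibility, shown here purely as an assumption, is to average the per-frame vote scores of a tracked connected component and to decide once for the whole track. The name `classify_track` and the convention that a positive score means human are hypothetical.

```python
def classify_track(frame_scores):
    """Decide the class of a tracked component from its per-frame scores.

    frame_scores: one signed score per frame, positive meaning human.
    Assumption: the track is classified by the sign of the mean score.
    """
    if not frame_scores:
        raise ValueError("a track needs at least one frame")
    mean_score = sum(frame_scores) / len(frame_scores)
    return "human" if mean_score > 0.0 else "non-human"
```

A single frame with a wrong decision, for example a score of -0.1 between frames scoring 0.4 and 0.3, no longer flips the label of the track.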
0:17:40 | i know if it's really did |
---|
0:17:42 | on no |
---|
0:17:43 | the that depends on the application |
---|
0:17:46 | i could for example to you of the arms |
---|
0:17:49 | and the movements of the like |
---|
0:17:50 | B |
---|
0:17:51 | one possible feature |
---|
0:17:52 | can |
---|
0:17:53 | looking |
---|
0:17:54 | okay and um |
---|
0:17:56 | just take a temporal window |
---|
0:17:59 | um |
---|
0:18:00 | just stack the silhouettes |
---|
0:18:02 | and you would have a |
---|
0:18:03 | uh |
---|
0:18:04 | 3D shape |
---|
0:18:06 | and then you can adapt such a method |
---|
0:18:09 | uh |
---|
0:18:10 | but |
---|
0:18:11 | in the place of |
---|
0:18:13 | describing pixels |
---|
0:18:14 | we will describe voxels |
---|
0:18:18 | that's all right |
---|
0:18:19 | thank you very much for all the answers |
---|