0:00:15 | okay |
---|
0:00:17 | welcome to my presentation i will speak about the project i did for my |
---|
0:00:22 | master's thesis in norway in collaboration with [inaudible] and worked on what |
---|
0:00:28 | i |
---|
0:00:30 | the project was about applying particle swarm optimization which has nothing to do with particle |
---|
0:00:37 | filtering |
---|
0:00:39 | to human pose tracking |
---|
0:00:43 | so the tracking process is that you have a three D model of the |
---|
0:00:47 | human and match it optimally to the observed image in every video frame |
---|
0:00:56 | and because this three D model has over thirty parameters we have to |
---|
0:01:02 | divide the optimization into two stages |
---|
0:01:05 | and in the first stage we only optimize the most important parameters of the model |
---|
0:01:11 | which are the global position and orientation of the model and then in the |
---|
0:01:16 | second stage we use a global optimisation of the full model with the arms |
---|
0:01:23 | and legs but we constrain the previously optimized position parameters to a smaller space |
---|
0:01:30 | to just allow correcting small errors made in the first stage that's what we |
---|
0:01:35 | call the soft partitioning |
---|
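The optimizer used in both stages is particle swarm optimization, which the talk does not walk through. As context only, a generic PSO update step is sketched below; this is not the speaker's implementation, and the inertia and acceleration coefficients `w`, `c1`, `c2` are assumed textbook defaults.

```python
# Generic particle swarm optimization step (a sketch for context, not the
# speaker's code). Each particle is one candidate pose parameter vector.
import numpy as np

def pso_step(pos, vel, pbest, gbest, fitness_fn,
             w=0.72, c1=1.49, c2=1.49, rng=None):
    """One PSO iteration over a swarm; higher fitness is better."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = pos.shape
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    # Velocity update: inertia plus pull towards personal and global bests.
    vel = w * vel + c1 * r1 * (pbest["x"] - pos) + c2 * r2 * (gbest["x"] - pos)
    pos = pos + vel
    # Evaluate fitness for every particle and update the best positions.
    f = np.array([fitness_fn(p) for p in pos])
    improved = f > pbest["f"]
    pbest["x"][improved], pbest["f"][improved] = pos[improved], f[improved]
    best = int(np.argmax(pbest["f"]))
    if pbest["f"][best] > gbest["f"]:
        gbest["x"], gbest["f"] = pbest["x"][best].copy(), pbest["f"][best]
    return pos, vel, pbest, gbest
```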
0:01:42 | the starting point of the project was the |
---|
0:01:46 | lee walk dataset that was put out by balan et al in two thousand |
---|
0:01:51 | and five along with their paper that describes a tracking algorithm based on the |
---|
0:01:58 | annealed particle filter |
---|
0:02:02 | and |
---|
0:02:03 | this data set includes gray scale video from four different views of a |
---|
0:02:08 | single subject walking in a circle and also a foreground-background segmentation |
---|
0:02:16 | that is used for the fitness function |
---|
0:02:20 | they also published their complete algorithm in matlab and their body model and we also |
---|
0:02:26 | used that in a modified form |
---|
0:02:33 | so the goal will be to track this person with a three D model throughout |
---|
0:02:38 | the whole sequence |
---|
0:02:40 | you see the tracked model in colours and also if you look closely the |
---|
0:02:47 | ground truth model in black and white this ground truth model was obtained by balan |
---|
0:02:53 | et al using a commercially available motion capture system as is used for |
---|
0:02:59 | movies and such |
---|
0:03:06 | the actual problem our algorithm is dealing with is pose tracking |
---|
0:03:11 | it relies on an initialization of the model |
---|
0:03:16 | in the first frame and then tracks the model and does not do any recognition |
---|
0:03:22 | of actions or anything like that that would be an application of the algorithm for example in |
---|
0:03:27 | surveillance videos where you could classify what people are doing but it's just dealing with |
---|
0:03:34 | the tracking |
---|
0:03:39 | the challenges the main challenges are mostly ambiguities from the three D to two |
---|
0:03:44 | D mapping for example if you just look at the silhouette this silhouette and that |
---|
0:03:49 | silhouette look exactly the same |
---|
0:03:53 | but you can overcome this by using multiple camera views and so we use the |
---|
0:03:57 | four camera views of this dataset and the most important problem is the high dimensionality |
---|
0:04:04 | of the body model |
---|
0:04:07 | we use a body model with a kinematic tree with over thirty degrees of |
---|
0:04:14 | freedom |
---|
0:04:15 | to model the kinematic structure of the human |
---|
0:04:19 | and to model the shape we use a simple model with ten truncated cones |
---|
0:04:26 | it's a very coarse model but as you see |
---|
0:04:30 | it approximates the human shape |
---|
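To make the body model concrete, here is a rough sketch of a kinematic-tree model built from truncated cones; the segment names, parents, dimensions, and per-joint degrees of freedom are illustrative assumptions, not the exact model published by Balan et al.

```python
# Illustrative kinematic-tree body model with truncated-cone limb segments.
# Names, dimensions and DoF counts are assumptions for this sketch only.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Segment:
    name: str
    parent: Optional[str]        # parent segment in the kinematic tree
    dof: int                     # degrees of freedom of the joint to the parent
    radii: Tuple[float, float]   # top and bottom radius of the truncated cone (m)
    length: float                # segment length (m)

body = [
    Segment("torso", None, 6, (0.15, 0.20), 0.60),            # global position + orientation
    Segment("head", "torso", 3, (0.08, 0.10), 0.25),
    Segment("upper_arm_l", "torso", 3, (0.05, 0.06), 0.30),
    Segment("lower_arm_l", "upper_arm_l", 1, (0.04, 0.05), 0.28),
    # ... remaining limbs defined analogously, giving ten cones
    #     and over thirty degrees of freedom in total
]

total_dof = sum(s.dof for s in body)
```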
0:04:37 | so to match the model to the observation in each frame you need |
---|
0:04:41 | to define a fitness function and we use a similar one as used in |
---|
0:04:46 | the two thousand ten publication of sigal and black |
---|
0:04:51 | with two parts the first part is a silhouette fitness where we take the foreground-background segmentation and match |
---|
0:04:59 | it to the model silhouette |
---|
0:05:03 | important here is that it has to be bidirectional and what is meant by |
---|
0:05:09 | this is that it has to look at how much of the model is inside the observation |
---|
0:05:16 | and how much of the observation is inside the model |
---|
0:05:19 | because you have to penalize the case where |
---|
0:05:23 | for example here |
---|
0:05:25 | the leg like the one in red is outside the model but the model is almost |
---|
0:05:28 | completely inside the observation |
---|
0:05:32 | so this is important |
---|
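A minimal sketch of such a bidirectional silhouette term, assuming binary masks for the projected model and the observed segmentation; the exact weighting and normalization used by the speaker are not given in the talk.

```python
# Sketch of a bidirectional silhouette overlap measure on binary masks.
import numpy as np

def silhouette_fitness(model_sil: np.ndarray, obs_sil: np.ndarray) -> float:
    """model_sil, obs_sil: boolean images of the projected model silhouette
    and the foreground-background segmentation."""
    overlap = np.logical_and(model_sil, obs_sil).sum()
    model_in_obs = overlap / max(model_sil.sum(), 1)  # how much of the model lies inside the observation
    obs_in_model = overlap / max(obs_sil.sum(), 1)    # how much of the observation is covered by the model
    # Checking both directions penalizes e.g. a leg of the observation that
    # the model misses, even when the model is fully inside the observation.
    return 0.5 * (model_in_obs + obs_in_model)
```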
0:05:39 | and then the second part of the fitness function is an edge fitness function |
---|
0:05:46 | humans produce strong edges in the images and so they are easy to get |
---|
0:05:54 | but we divide the edge fitness function for the two steps of our |
---|
0:05:59 | optimisation in the first step we just look at the |
---|
0:06:05 | coarse position of the person without looking at the arms and legs and so we |
---|
0:06:08 | only use torso edges |
---|
0:06:11 | and in the second optimisation stage we look at all the edges including the |
---|
0:06:15 | arms and legs |
---|
0:06:24 | this is just an overview of the fitness computation |
---|
0:06:29 | you get the observed image and the projected candidate pose |
---|
0:06:34 | and then you produce the silhouettes and the edges of both and we additionally mask |
---|
0:06:41 | the edge image with the silhouette to get rid of spurious |
---|
0:06:48 | edges in the background |
---|
0:06:51 | and then we match both pairs of images |
---|
0:06:55 | and the silhouette fitness and the edge fitness are normalized separately and summed up to |
---|
0:07:00 | form a final fitness value that quantifies how well a candidate pose matches an image |
---|
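Putting the pieces together, here is a sketch of the fitness computation as described: silhouettes and edge maps for the observation and the projected candidate pose, the observed edge image masked with its silhouette, and the two terms combined. It reuses `silhouette_fitness` from the earlier sketch; `render_candidate` and `edge_detector` are hypothetical callables, and the normalization details are assumptions.

```python
# Sketch of the per-candidate fitness pipeline described in the talk.
def pose_fitness(obs_img, obs_sil, render_candidate, edge_detector):
    model_sil, model_edges = render_candidate()       # projected pose -> boolean masks
    obs_edges = edge_detector(obs_img) & obs_sil      # mask out spurious background edges
    sil = silhouette_fitness(model_sil, obs_sil)      # bidirectional silhouette term
    edge = (model_edges & obs_edges).sum() / max(model_edges.sum(), 1)
    # Both terms are normalized (here already in [0, 1]) and summed into
    # one fitness value for the candidate pose.
    return sil + edge
```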
0:07:11 | then comes the optimization with soft partitioning as i said |
---|
0:07:16 | first in image a you have the initialization that is the model from |
---|
0:07:24 | the previous frame |
---|
0:07:26 | and then you get the image here you see the foreground-background segmentation of the next |
---|
0:07:32 | frame |
---|
0:07:33 | and in image b the result of the first optimisation stage is that you shift the model |
---|
0:07:39 | without changing the arms or legs you shift it to the new position of the person |
---|
0:07:46 | and in the second stage in image c we adapt the position of the arms and |
---|
0:07:51 | legs in a global optimisation |
---|
0:07:55 | where all parameters are allowed to change even the position parameters that have been optimized previously |
---|
0:08:01 | but constrained to a narrower range |
---|
0:08:11 | this is to illustrate and to contrast the soft partitioning concept here would be |
---|
0:08:17 | a hard partitioning with two variables |
---|
0:08:21 | in two steps so in the first step you optimise |
---|
0:08:25 | the first |
---|
0:08:28 | parameter x one and keep it fixed and in the second stage optimize parameter two |
---|
0:08:35 | and you see the optimum would be here but you can't get there because you are |
---|
0:08:39 | not allowed to correct errors made in the first stage |
---|
0:08:43 | so we allow small variations |
---|
0:08:47 | of the previously optimized parameter |
---|
0:08:50 | to open up the search space a little and correct errors we made and we |
---|
0:08:57 | saw in experiments that if you don't do that and you can also see it |
---|
0:09:01 | in the literature that you get drift in your model if |
---|
0:09:08 | you make a hard partitioning in such a way |
---|
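As a code sketch of the soft partitioning idea: in the second stage every parameter stays free, but the parameters already optimized in stage one are restricted to a narrow window around their stage-one values. The ten percent window used here is an illustrative assumption, not the value used in the thesis.

```python
# Sketch: shrink the second-stage search range of previously optimized parameters.
import numpy as np

def stage_two_bounds(full_lo: np.ndarray, full_hi: np.ndarray,
                     stage_one_result: np.ndarray, stage_one_idx,
                     window: float = 0.1):
    """Return (lo, hi) bounds for stage two: full range for new parameters,
    a narrow window around the stage-one result for the others."""
    lo, hi = full_lo.copy(), full_hi.copy()
    span = (full_hi - full_lo) * window
    lo[stage_one_idx] = stage_one_result[stage_one_idx] - span[stage_one_idx]
    hi[stage_one_idx] = stage_one_result[stage_one_idx] + span[stage_one_idx]
    return lo, hi
```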
0:09:16 | then to evaluate our algorithm we use the standard error measure proposed by balan |
---|
0:09:23 | et al |
---|
0:09:25 | that is just the mean distance of fifteen marker joints |
---|
0:09:29 | between the ground truth model and the tracked model |
---|
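The error measure can be written down directly; a small sketch, assuming both poses are given as arrays of the fifteen marker positions in the same coordinate frame and units:

```python
# Mean Euclidean distance over fifteen joint markers between ground truth
# and tracked pose, as described in the talk.
import numpy as np

def joint_error(gt_markers: np.ndarray, tracked_markers: np.ndarray) -> float:
    """gt_markers, tracked_markers: arrays of shape (15, 3)."""
    return float(np.mean(np.linalg.norm(gt_markers - tracked_markers, axis=1)))
```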
0:09:38 | in this plot you see the results of five tracking runs and the mean error for |
---|
0:09:45 | every frame |
---|
0:09:47 | for our algorithm in black and the apf in green the apf is the annealed |
---|
0:09:54 | particle filter that was implemented by balan et al and proposed as a benchmark algorithm |
---|
0:10:01 | both algorithms use the same number of fitness evaluations which is the time |
---|
0:10:07 | consuming part of the algorithm and exactly the same fitness functions |
---|
0:10:13 | and you can see that our algorithm performs better than the apf |
---|
0:10:20 | and |
---|
0:10:22 | this peak |
---|
0:10:24 | is caused you can see it in the video later by a lost leg but |
---|
0:10:29 | this leg is then reacquired in later frames of the video so it's |
---|
0:10:35 | quite robust |
---|
0:10:39 | this is the video for the previous graph it shows one tracking run |
---|
0:10:49 | again you see the ground truth in black and the tracking results in colour |
---|
0:10:54 | and it loses an arm frequently and a leg frequently but this is |
---|
0:11:00 | at twenty frames per second and the original dataset is sixty frames so it's easier |
---|
0:11:07 | to track at higher frame rates because of course you have smaller distances between |
---|
0:11:14 | the poses |
---|
0:11:17 | between the frames |
---|
0:11:19 | so |
---|
0:11:21 | it tracks better at sixty frames |
---|
0:11:35 | so in conclusion particle swarm optimization can be applied successfully to pose tracking and |
---|
0:11:43 | it |
---|
0:11:45 | can even perform better than the annealed particle filter without all the |
---|
0:11:52 | probabilistic |
---|
0:11:56 | overhead |
---|
0:11:58 | and you have to do something to overcome the high dimensionality problem of such a |
---|
0:12:04 | body model and the soft partitioning approach works |
---|
0:12:10 | and in our eyes it works better than hard partitioning because hard partitioning approaches |
---|
0:12:17 | imply drift as shown in the illustration |
---|
0:12:21 | and of course the body model for future approaches should be a little more |
---|
0:12:25 | detailed because for example you can't model arm twists and such and this |
---|
0:12:32 | gives some problems |
---|
0:12:36 | so i wanna thank the university for the funding and [inaudible] |
---|
0:12:42 | for the good collaboration and their help |
---|
0:12:48 | thank you for your attention and i will be happy to answer questions |
---|
0:13:07 | the model has constraints so only natural bendings of the joints are allowed |
---|
0:13:15 | yep |
---|
0:13:25 | yes |
---|
0:13:34 | yes |
---|
0:13:40 | that's an empirical value we just |
---|
0:13:44 | allow only a small amount of variation in the second stage for the |
---|
0:13:48 | parameters from the first stage |
---|
0:13:54 | of course the optimal setting will probably be different but i mean it's a |
---|
0:13:59 | general principle that you can get a coarse alignment of the body in the first |
---|
0:14:04 | step and then |
---|
0:14:06 | just the arm positioning in the second step |
---|
0:14:19 | we just used the ground truth model so we didn't think about initialization |
---|
0:14:26 | you could use any human detector and try to initialize it with it |
---|