okay
welcome to my presentation i will speak about the project that i did for my
masters thesis up in norway in collaboration with [names unclear]
the project was about applying particle swarm optimization which has nothing to do with particle
filtering
to human pose tracking
so the tracking process is that you have a three D model of the
human and match it optimally to the observed image in every video frame
and because this three D model has over thirty parameters we have to
divide the optimization into two stages
and in the first stage we only optimize the most important parameters of the model
which are the global position and orientation of the model and then in the
second stage we use a global optimisation of the model with all parameters including the arms
and legs but we constrain the previously optimized position parameters to a smaller space
just to allow correcting small errors made in the first stage that's what we
call the soft partitioning
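
The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function names `pso` and `soft_partitioned_search`, the `shrink` parameter, the PSO coefficients, and the assumption that the global position parameters come first in the parameter vector are all hypothetical choices made for the example.

```python
import random

def pso(cost, lower, upper, n_particles=20, n_iters=100, seed=0):
    """Plain global-best PSO that minimises cost inside the box [lower, upper]."""
    rng = random.Random(seed)
    dim = len(lower)
    w, c1, c2 = 0.7, 1.5, 1.5  # common textbook coefficients (assumption)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # clamp to the box so the range constraints always hold
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = cost(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def soft_partitioned_search(cost, start, lower, upper, n_global, shrink=0.1):
    """Stage 1: optimise only the first n_global parameters, rest fixed at start.
    Stage 2: optimise all parameters, but confine the first n_global ones
    to a narrow window around their stage-1 values."""
    rest = list(start[n_global:])
    stage1, _ = pso(lambda g: cost(list(g) + rest),
                    lower[:n_global], upper[:n_global])
    lo2, hi2 = list(lower), list(upper)
    for d in range(n_global):
        half = shrink * (upper[d] - lower[d]) / 2.0  # narrowed half-width
        lo2[d] = max(lower[d], stage1[d] - half)
        hi2[d] = min(upper[d], stage1[d] + half)
    return pso(cost, lo2, hi2, seed=1)
```

In a tracker, `start` would be the pose from the previous frame and `cost` the image-based fitness. Setting `shrink` to zero would turn this into a hard partitioning, where stage-1 errors can no longer be corrected.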
the starting point of the project was
the lee walk dataset that was put out by balan et al in two thousand
and five along with their paper that described a tracking algorithm based on the
annealed particle filter
this dataset includes grayscale video from four different views of a
single subject walking in a circle and also a foreground-background segmentation
that's used for the fitness function
they also published their complete algorithm in matlab and their body model and we also
used that and modified it
so the goal will be to track this person with a three D model throughout
the whole sequence
you see the tracked model in colours and also if you look closely the
ground truth model in black and white this ground truth model was obtained by balan
et al using a commercially available motion capture system as is used for
the movies and such
the actual problem our algorithm is dealing with is pose tracking
it relies on an initialization of the model
in the first frame and then tracks the model and does not do any recognition
of actions or something like that which would be an application of the algorithm for example in
surveillance videos where you could classify what people are doing but it's just dealing with
the tracking
the main challenges are mostly ambiguities from the three D to two
D mapping for example if you just look at the silhouette this silhouette and this
silhouette look exactly the same
but you can overcome this by using multiple camera views and so we use the
four camera views of this dataset and the most important problem is the high dimensionality
of the body model
we use a body model with a kinematic tree with over thirty degrees of
freedom
to model the kinematic structure of a human
and to model the shape we use a simple model with ten truncated cones
it's a very coarse model but as you see
it approximates the human shape
so to match the model to the observation in each frame you need
to define a fitness function and we use one similar to the one used in
the two thousand ten publication of sigal and black
it has two parts the first part is the silhouette fitness where we take the foreground-background segmentation and match
it to the model silhouette
important here is that it has to be bidirectional and what is meant by
this is that it has to look at how much of the model is inside the observation
and how much of the observation is inside the model
because you have to penalize the case where
the leg here in red is outside the model but the model is almost
completely inside the observation
so this is important
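
A small sketch of why the measure has to look in both directions. The function name `bidirectional_silhouette_error`, the equal weighting of the two directions, and the representation of silhouettes as nested lists of 0 and 1 are assumptions for illustration; the talk does not give the exact formula.

```python
def bidirectional_silhouette_error(observed, model):
    """Compare two binary masks of equal size (lists of rows of 0/1).

    Returns the mean of two miss fractions: the share of the model that lies
    outside the observation and the share of the observation that lies
    outside the model. 0.0 means perfect overlap, 1.0 means no overlap.
    """
    obs_area = model_area = overlap = 0
    for obs_row, model_row in zip(observed, model):
        for o, m in zip(obs_row, model_row):
            obs_area += o
            model_area += m
            overlap += o & m
    if obs_area == 0 or model_area == 0:
        return 1.0  # nothing to compare, treat as the worst case
    model_outside_obs = 1.0 - overlap / model_area
    obs_outside_model = 1.0 - overlap / obs_area
    # equal weighting of both directions is an assumption of this sketch
    return 0.5 * (model_outside_obs + obs_outside_model)
```

For an observation `[[1, 1], [1, 1]]` and a model `[[1, 1], [0, 0]]` the model lies completely inside the observation, so a one-directional check would report no error, while this measure returns 0.25 because half of the observation is not covered by the model.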
and then the second part of the fitness function is an edge fitness function
humans produce strong edges in the images and so they are easy to get
but we divide the edge fitness function for the two steps of our
optimisation in the first step we just look at the
coarse position of the person without looking at the arms and legs and so we
only use torso edges
and in the second optimisation stage we look at all the edges including the legs
and all the limbs
this is just an overview of the fitness computation
you get the observed image and the projected candidate pose
and then you produce the silhouettes and the edges of both and we additionally mask
the edge image with the silhouette to get rid of spurious
edges in the background
and then we match both pairs of images
and the silhouette fitness and the edge fitness are normalized separately and summed up to
form a final fitness value that quantifies how well a candidate pose matches an image
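
The masking and the final sum could look roughly like this. It is a sketch under stated assumptions: edge strengths are taken to be already scaled to the range 0 to 1, the edge term is computed as the average missing edge strength at the projected model edge pixels, and the names `mask_edges`, `edge_error` and `combined_fitness` are hypothetical. The talk only states that edges are masked with the silhouette and that the two normalized terms are summed.

```python
def mask_edges(edge_image, silhouette):
    """Keep edge responses only where the foreground silhouette is set."""
    return [[e if s else 0.0 for e, s in zip(e_row, s_row)]
            for e_row, s_row in zip(edge_image, silhouette)]

def edge_error(masked_edges, model_edge_pixels):
    """Average missing edge strength at the model's projected edge pixels.

    masked_edges holds values in [0, 1]; model_edge_pixels is a list of
    (row, column) positions. Returns a value in [0, 1], lower is better.
    """
    if not model_edge_pixels:
        return 1.0  # no model edges visible, treat as the worst case
    total = sum(1.0 - masked_edges[r][c] for r, c in model_edge_pixels)
    return total / len(model_edge_pixels)

def combined_fitness(silhouette_error, edge_err):
    """Sum of the two separately normalized terms, each in [0, 1]."""
    return silhouette_error + edge_err
```

Because each term is normalized to the same range before the sum, neither the silhouette nor the edges dominate just because one image contains more pixels of that kind.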
then comes the optimization with soft partitioning as i said
first in image a you have the initialization that is the model from
the previous frame
and then you get the image here you see the foreground-background segmentation of the next
frame
and in b the result of the first optimisation stage is that you shift the model
without changing the arms or legs you shift it to the new position of the person
and in the second stage in image c we adapt the position of arms and
legs in a global optimisation
where all parameters are allowed to change even the position parameters that have been optimized previously
but those are constrained to a narrower range
this is to illustrate and to contrast the soft partitioning concept here would be
a hard partitioning with two variables
in two steps so in the first step you optimise
the first
parameter X one and keep it fixed and in the second stage you optimize parameter two
and you see the optimum would be here but you can't get there because you are
not allowed to correct errors made in the first stage
so we allow small variations
of the previously optimized parameter
to open up the search space a little and correct errors we made
we saw in experiments and you can also see it
in the literature that you get drift in your model if
you make a hard partitioning in such a way
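
The two-variable illustration can be reproduced with a toy example. Everything here is an assumption made for the sketch: the cost function (a narrow diagonal valley with its optimum at X one = X two = 2), the grid search used in place of PSO to keep the result deterministic, and the window size `delta`.

```python
def grid(lo, hi, n=201):
    """Evenly spaced sample points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def cost(x1, x2):
    # narrow diagonal valley: the two variables are strongly coupled
    return (x1 - x2) ** 2 + 0.1 * (x1 + x2 - 4.0) ** 2

def hard_partitioning(x2_start, lo=-1.0, hi=5.0):
    # stage 1: optimise x1 with x2 held at its starting value
    x1 = min(grid(lo, hi), key=lambda a: cost(a, x2_start))
    # stage 2: optimise x2 with x1 frozen at the stage-1 result
    x2 = min(grid(lo, hi), key=lambda b: cost(x1, b))
    return x1, x2

def soft_partitioning(x2_start, delta=0.5, lo=-1.0, hi=5.0):
    # stage 1 is identical to the hard variant
    x1 = min(grid(lo, hi), key=lambda a: cost(a, x2_start))
    # stage 2: search both variables, x1 only within +/- delta of stage 1
    candidates = [(a, b)
                  for a in grid(max(lo, x1 - delta), min(hi, x1 + delta), 41)
                  for b in grid(lo, hi)]
    return min(candidates, key=lambda p: cost(*p))
```

Starting from X two = 0, the soft variant ends at a lower cost than the hard variant because the second stage can move X one along the valley, although neither reaches the optimum in a single pass in this toy case.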
then to evaluate our algorithm we use the standard error measure proposed by balan
et al
that is just the mean distance of fifteen marker joints
between the ground truth model and the tracked model
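
The error measure is simple enough to write down directly. This sketch assumes the joints are given as three D coordinates in matching order and that the distance is Euclidean; the function name `mean_joint_error` is hypothetical and the result is in whatever unit the coordinates use.

```python
import math

def mean_joint_error(ground_truth, tracked):
    """Mean Euclidean distance between corresponding 3D joint positions."""
    if not ground_truth or len(ground_truth) != len(tracked):
        raise ValueError("need two non-empty joint lists of equal length")
    total = sum(math.dist(g, t) for g, t in zip(ground_truth, tracked))
    return total / len(ground_truth)
```

With fifteen joints that are each displaced by the vector (3, 4, 0), the function returns 5.0.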
in this plot you see the results of five tracking runs and the mean error for
every frame
for our algorithm in black and the apf in green apf is the annealed
particle filter that was implemented by balan et al and proposed as a benchmark algorithm
both algorithms use the same number of fitness evaluations which is the time
consuming part of the algorithm and exactly the same fitness functions
and you can see that our algorithm performs better than apf
this peak
is caused you can see it in the video later by a lost leg but
this leg is reacquired in further frames of the video so it's
quite robust
this is the video to the previous graph it shows one tracking run
again you see the ground truth in black and the tracking results in colour
and it loses the arm frequently and the leg frequently but this is
at twenty frames per second and the original dataset is at sixty frames so it's easier
to track at higher frame rates because of course you have smaller distances between
the poses
between the frames
so
it tracks better at sixty frames
so in conclusion particle swarm optimization can be applied successfully to pose tracking and
it can even perform better than the annealed particle filter without all the
probabilistic
overhead
and you have to do something to overcome the high dimensionality problem of such a
body model and the soft partitioning approach works
and in our eyes works better than hard partitioning because of the problem with hard partitioning approaches
shown in my illustration
and of course the body model for future approaches should be a little more
detailed because for example you can't model arm twists and such and this
gives some problems
so i want to thank [university name unclear] for the funding and [names unclear]
for the good collaboration and their help
thank you for your attention and i will be happy to answer questions
the model has constraints so only natural bendings of the joints are allowed
yep
yes
yes
that's an empirical value we just
allow only a fraction of the variation in the second stage for the
first parameters
of course the optimal setting will probably be different but i mean it's a
general principle that you can get a coarse alignment of the body in the first
step and then
adjust the arm positioning in the second step
we just used the ground truth model so we didn't think about initialization
you could use any human detector and try to initialize it with that