Speech Transcript - Particle Swarm Optimization with Soft Search Space Partitioning for Video-Based Markerless Pose Tracking

Okay, welcome to my presentation. I will speak about the project that I did for my master's thesis up in Norway, in collaboration with Ivar Austvoll and Bogdan Kwolek. The project was about applying particle swarm optimization, which has nothing to do with particle filtering, to human pose tracking.

So the tracking process is that you have a 3D model of the human and match it optimally to the observed image in every video frame. Because this 3D model has over thirty parameters, we have to divide the optimization into two stages. In the first stage we only optimize the most important parameters of the model, which are the global position and orientation of the model. Then in the second stage we use a global optimization of the model with all parameters, including the arms and legs, but we constrain the previously optimized position parameters to a smaller space, just to allow the correction of small errors made in the first stage. That's what we call soft partitioning.
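The two-stage scheme described here can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation from the thesis: the fitness is treated as a cost to be minimized, the swarm constants are common textbook values, and the names `pso`, `track_frame`, `n_global`, `wide` and `narrow` are hypothetical.

```python
import random


def pso(cost, start, lower, upper, n_particles=20, n_iters=50, seed=0):
    """Minimise `cost` inside the box [lower, upper] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(start)
    w, c1, c2 = 0.72, 1.49, 1.49  # common textbook constants, not taken from the talk
    # Particle 0 sits on the start pose, so the result is never worse than the start.
    pos = [list(start)] + [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                # Clamp to the box: a dimension with lower == upper stays fixed.
                pos[i][d] = min(upper[d], max(lower[d], pos[i][d] + vel[i][d]))
            val = cost(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def track_frame(cost, prev_pose, n_global, wide, narrow):
    """Two-stage search: global parameters first, then everything with soft bounds."""
    dim = len(prev_pose)
    # Stage 1: only the first n_global parameters (position, orientation) may move.
    lo = [prev_pose[d] - wide if d < n_global else prev_pose[d] for d in range(dim)]
    hi = [prev_pose[d] + wide if d < n_global else prev_pose[d] for d in range(dim)]
    stage1, _ = pso(cost, prev_pose, lo, hi)
    # Stage 2: all parameters move, but the stage-1 ones only within +/- narrow.
    lo = [stage1[d] - (narrow if d < n_global else wide) for d in range(dim)]
    hi = [stage1[d] + (narrow if d < n_global else wide) for d in range(dim)]
    final, final_val = pso(cost, stage1, lo, hi)
    return stage1, final, final_val
```

Calling `track_frame(cost, prev_pose, n_global=6, wide=..., narrow=...)` once per frame, with the previous result as `prev_pose`, would give the tracking loop. The actual ranges are not stated in the talk, so any values used here are placeholders.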

The starting point of the project was the Lee walk dataset that was put out by Balan et al. in 2005, along with their paper that described a tracking algorithm based on the annealed particle filter.

This dataset includes a grayscale video from four different views of a single subject walking in a circle, and also a foreground-background segmentation that's used for the fitness function. They also published their complete algorithm in MATLAB and their body model, and we also used that in a modified form.

So the goal is to track this person with a 3D model throughout the whole sequence. You see the tracked model in colour and also, if you look closely, the ground truth model in black and white. This ground truth model was obtained by Balan et al. using a commercially available motion capture system, as is used for movies and such.

The actual problem our algorithm is dealing with is pose tracking. It relies on an initialization of the model in the first frame and then tracks the model, and it does not do any recognition of actions or the like. That would be an application of the algorithm, for example in surveillance videos, where you could classify what people are doing, but it's just dealing with the tracking.

The main challenges are mostly ambiguities from the 3D-to-2D mapping. For example, if you just look at the silhouette, this silhouette and this silhouette look exactly the same. But you can overcome this by using multiple camera views, and so we use the four camera views of this dataset. The most important problem is the high dimensionality of the body model.

We use a body model with a kinematic tree with over thirty degrees of freedom to model the kinematic structure of a human, and to model the shape we use a simple model with ten truncated cones. It's a very coarse model, but it approximates the human shape.

To match the model to the observation in each frame you have to define a fitness function, and we use one similar to the one used in the 2010 publication of Sigal and Black. It has two parts. The first part is the silhouette fitness, where we take the foreground-background segmentation and match it to the model silhouette. Important here is that it has to be bidirectional. What is meant by this is that it has to look at how much of the model is inside the observation and how much of the observation is inside the model, because you have to penalize the case where, like here, the leg in red is outside the model but the model is almost completely inside the observation. So this is important.
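The bidirectional idea can be made concrete with a small example. This is a minimal sketch, assuming silhouettes are given as sets of (row, column) pixel coordinates and that the two directions are simply averaged; the function name and the equal weighting are assumptions, not details from the talk.

```python
def bidirectional_silhouette_error(model_px, observed_px):
    """Average of two ratios: model pixels outside the observation, and
    observation pixels not covered by the model. 0 means perfect overlap."""
    if not model_px or not observed_px:
        return 1.0  # nothing to compare: treat as the worst case
    model_outside = len(model_px - observed_px) / len(model_px)
    observed_uncovered = len(observed_px - model_px) / len(observed_px)
    return 0.5 * (model_outside + observed_uncovered)


# Observation: a 4x4 blob. Model: only the upper half of it (e.g. a missed leg).
observed = {(r, c) for r in range(4) for c in range(4)}
model = {(r, c) for r in range(2) for c in range(4)}

# One direction alone sees no problem: the model lies fully inside the observation.
one_way = len(model - observed) / len(model)
both_ways = bidirectional_silhouette_error(model, observed)
```

Here `one_way` is 0.0 although half of the observation is unexplained, while `both_ways` is 0.25, which is the penalty the bidirectional term is meant to provide.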

The second part of the fitness function is an edge fitness function. Humans produce strong edges in the images, so they are easy to get. But we divide the edge fitness function for the two steps of our optimization. In the first step we just look at the coarse position of the person without looking at the arms and legs, so we only use torso edges. In the second optimization stage we look at all the edges, including the legs and the other limbs.

This is just an overview of the fitness computation. You get the observed image and the projected candidate pose, and then you produce the silhouettes and the edges of both. We additionally mask the edge image with the silhouette to get rid of spurious edges in the background. Then we match both pairs of images. The silhouette fitness and the edge fitness are normalized separately and summed up to form a final fitness value that quantifies how well a candidate pose matches an image.
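The masking and summing steps of this overview can be sketched as follows. This is only an illustration under assumptions: edge strengths are taken to lie in [0, 1], each term is expressed as an error in [0, 1] where lower is better, and the function names are hypothetical. The exact normalization used in the project is not given in the talk.

```python
def mask_edges(edge_img, silhouette):
    """Zero out edge responses that fall outside the observed silhouette."""
    return [[e if s else 0.0 for e, s in zip(e_row, s_row)]
            for e_row, s_row in zip(edge_img, silhouette)]


def edge_error(masked_edges, model_edge_pixels):
    """1 minus the mean edge strength found under the projected model edges."""
    if not model_edge_pixels:
        return 1.0
    hits = sum(masked_edges[r][c] for r, c in model_edge_pixels)
    return 1.0 - hits / len(model_edge_pixels)


def total_error(silhouette_err, edge_err):
    """Both terms are already normalized to [0, 1], so they are simply summed."""
    return silhouette_err + edge_err


edge_img = [[0.9, 0.0, 0.8],   # strong background edges in the top row
            [0.0, 1.0, 0.0],
            [0.0, 0.5, 0.0]]
silhouette = [[0, 0, 0],
              [0, 1, 0],
              [0, 1, 0]]
masked = mask_edges(edge_img, silhouette)
err = total_error(0.25, edge_error(masked, [(1, 1), (2, 1)]))
```

After masking, the background responses in the top row are gone, so a candidate pose cannot score well by aligning its edges with background clutter.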

Then comes the optimization with soft partitioning, as I said. First, in image (a), you have the initialization, that is the model from the previous frame. Then you get the image, and here you see the foreground-background segmentation of the next frame. In (b), the result of the first optimization stage, you shift the model without changing the arms or legs; you shift it to the new position of the person. In the second stage, in image (c), we adapt the position of the arms and legs in a global optimization, where all parameters are allowed to change, even the position parameters that have been optimized previously, but constrained to a narrower range.

This is to illustrate and to contrast the soft partitioning concept. Here would be a hard partitioning with two variables in two steps. In the first step you optimize the first parameter x1 and keep it fixed, and in the second stage you optimize parameter x2. You see the optimum would be here, but you can't get there because you are not allowed to correct errors made in the first stage. So we allow small variations of the previously optimized parameter to open up the search space a little and correct errors we made. We saw in experiments, and you can also see it in the literature, that if you don't do that you get drift in your model when you make a hard partitioning in such a way.
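The two-variable contrast can be reproduced with a toy problem. This is a sketch under assumptions: the coupled cost function, the grid search (used instead of a swarm to keep the example deterministic) and the slack of 0.3 are all hypothetical choices made for illustration.

```python
def cost(x1, x2):
    # Hypothetical coupled cost with its optimum at (1, 1).
    return (x1 - x2) ** 2 + 0.1 * (x1 + x2 - 2.0) ** 2


GRID = [i / 100.0 for i in range(-200, 301)]  # candidate values from -2.00 to 3.00


def hard_partitioning(x2_start=0.0):
    # Stage 1: optimize x1 with x2 held at its start value, then freeze x1.
    x1 = min(GRID, key=lambda a: cost(a, x2_start))
    # Stage 2: optimize x2 only; the stage-1 error in x1 cannot be corrected.
    x2 = min(GRID, key=lambda b: cost(x1, b))
    return (x1, x2), cost(x1, x2)


def soft_partitioning(x2_start=0.0, slack=0.3):
    x1_stage1 = min(GRID, key=lambda a: cost(a, x2_start))
    # Stage 2: x2 is free and x1 may still move within +/- slack of its stage-1 value.
    steps = round(slack * 100)
    x1_candidates = [x1_stage1 + k / 100.0 for k in range(-steps, steps + 1)]
    best = min(((a, b) for a in x1_candidates for b in GRID),
               key=lambda p: cost(*p))
    return best, cost(*best)


hard_pose, hard_cost = hard_partitioning()
soft_pose, soft_cost = soft_partitioning()
```

On this toy cost the soft variant ends at a lower cost than the hard one, because the narrow window around the stage-1 value lets x1 move towards the joint optimum. Repeating the step with the result as the new starting point would move it further, which is the kind of correction that hard partitioning rules out.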

Then, to evaluate our algorithm we use the standard error measure proposed by Balan et al. That is just the mean distance of fifteen marker joints between the ground truth model and the tracked model.
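The error measure is straightforward to write down. This is a minimal sketch, assuming each model exposes its marker joints as a list of 3D coordinates in the same order; the function name and the example coordinates are made up for illustration.

```python
import math


def mean_joint_error(truth_joints, tracked_joints):
    """Mean Euclidean distance between corresponding 3D joint positions."""
    if len(truth_joints) != len(tracked_joints) or not truth_joints:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(t, p) for t, p in zip(truth_joints, tracked_joints))
    return total / len(truth_joints)


# Fifteen made-up joints; the tracked model is offset by (3, 4, 0) everywhere.
truth = [(float(i), 0.0, 0.0) for i in range(15)]
tracked = [(x + 3.0, y + 4.0, z) for x, y, z in truth]
error = mean_joint_error(truth, tracked)  # every joint is 5 units off
```

Evaluating this per frame gives an error curve over the sequence for each tracking run.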

In this plot you see the results of five tracking runs and the mean error for every frame, for our algorithm in black and the APF in green. APF is the annealed particle filter that was implemented by Balan et al. and proposed as a benchmark algorithm. Both algorithms use the same number of fitness evaluations, which is the time-consuming part of the algorithm, and exactly the same fitness functions. You can see that our algorithm performs better than APF. This peak is caused, you can see it in the video later, by a lost leg, but this leg is reacquired in further frames of the video, so it's quite robust.

This is the video to the previous graph; it shows one tracking run. Again you see the ground truth in black and the tracking result in colour. It loses an arm frequently and a leg frequently, but this is at twenty frames per second and the original dataset is at sixty frames. It's easier to track at higher frame rates because of course you have smaller distances between the poses between the frames. It tracks better at sixty frames.

So in conclusion, particle swarm optimization can be applied successfully to pose tracking, and it can even perform better than the annealed particle filter, without all the probabilistic overhead. You have to do something to overcome the high dimensionality problem of such a body model, and the soft partitioning approach works, and in our eyes it works better than hard partitioning, as in my illustration. Of course the body model for future approaches should be a little more detailed, because for example you can't model arm twists and such, and this gives some problems.

So I want to thank the university for the funding, and Ivar Austvoll and Bogdan Kwolek for the good collaboration and their help. Thank you for your attention, and I will be happy to answer questions.

The model has constraints, so only natural bendings of the joints are allowed.

yep

yes

That's an empirical value. In the second stage we just allow only a smaller variation for the parameters from the first stage. Of course the optimal setting will probably be different, but I mean it's a general principle that you can get a coarse alignment of the body in the first step and then just adjust the arm positioning in the second step.

We just used the ground truth model, so we didn't think about initialization. You could use any human detector and try to initialize it with that.

Particle Swarm Optimization with Soft Search Space Partitioning for Video-Based Markerless Pose Tracking

Object Tracking and Identification

Patrick Fleischmann, Ivar Austvoll, Bogdan Kwolek