Good morning everybody. My name is [inaudible], from the University of Strathclyde,
Glasgow, and I am here to present our work titled DSP embedded smart surveillance
sensor with SWAD-based tracker. If I speak too fast, just tell me to slow down. The
other co-authors of the paper are [inaudible] from Texas Instruments and Professor
[inaudible] from the University of Strathclyde.
So this is the outline of my presentation. First of all I will give a brief
introduction to set the scene, and then I will state the aim and objectives of our
work. After that I will show an overview of the entire system, and then I will talk
in detail about the video analytics within the system. Then I will show some results
to support our work, and after that I will draw the conclusions and show some
future work.
So video surveillance is monitoring by means of video cameras. This is convenient
because we can use multiple cameras to survey a wide area with only one person. As
you can see here, we have a person looking at fifteen videos at the same time.
Another advantage is that we can analyze and store the videos for future access.
But there's a problem when we have too many videos and the person is sleeping. So
what are we going to do in this case, where a suspicious individual is walking
around but the surveillance operator is sleeping?
So the problem is the level of attention, the reaction time and crime prevention.
We don't want to use the surveillance footage afterwards for a trial; what we want
is to take action straightaway.
So video analytics is the semantic analysis of video data through computer systems,
using image and video processing techniques. In this case we talk about smart
surveillance, because we have different video feeds and we apply smart algorithms
to analyze the videos. And when we have video analytics, what we want to achieve is
to have smart sensors: the video analytics is embedded on processors which are then
attached to the cameras, so we can create smart units and deploy the intelligence
at the edge of the network.
So when we have multiple smart surveillance sensors, like in this case, we can
survey a whole building in real time. We don't need to send all the video streams
to the central station; we just need to send the relevant information, for example
an object of interest or the person that has to be tracked.
So the aim of this work is to create a smart surveillance sensor for automatic
tracking using a PTZ camera. A PTZ camera is a camera that can pan, tilt and zoom
onto an object.
And the objectives are to implement the smart algorithms on a DSP board, to
automatically control the PTZ from the board, and to be able to activate and
deactivate the tracking algorithm remotely.
So this is an overview of the system. You can see the EVM, which is the DSP board,
and the PTZ, which is our camera. When they are connected together we can process
the video streams from the camera on the DSP; in this case we can talk about a
smart sensor.
So for the hardware we use a Texas Instruments DM647 EVM, which is a fixed-point
DSP. Then there is also an Ethernet connection between the EVM and the camera. The
camera is [inaudible], which can pan three hundred and sixty degrees [inaudible].
The software is implemented in C [inaudible]. We also have an HTTP server on the
EVM, so we can send commands to activate and deactivate the algorithm. We run in
real time at more than twenty-five frames per second. On the PTZ we have an HTTP
server, but this is proprietary, so we don't do anything with it; we just send
commands to drive the PTZ.
And this is the video analytics. Basically we acquire the video stream, we
de-interleave and decimate, so we have smaller frames to process, and then we apply
our tracking algorithm. The result of this tracking algorithm is used to control
the PTZ: we send commands to the camera so that we can follow the target with the
PTZ.
So this is the raw video stream, which is interleaved YCbCr. We de-interleave it,
discard the chrominance components and retain only the luminance component, so the
algorithm actually works on grayscale images.
And then we decimate, so we have smaller frames.
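The two preprocessing steps just described can be sketched as follows. This is a minimal illustration, not the code running on the board: the UYVY byte order, the decimation factor of 2 and the function names `extract_luma_uyvy` and `decimate` are all assumptions, since the talk only says that the stream is interleaved YCbCr and that the frames are made smaller.

```python
def extract_luma_uyvy(row_bytes):
    """Return the luminance samples of one row of interleaved 4:2:2 video.

    Assumes UYVY byte order (U0 Y0 V0 Y1 ...); the talk does not state it.
    """
    return list(row_bytes[1::2])


def decimate(frame, factor=2):
    """Keep every `factor`-th pixel in each direction (no low-pass filtering)."""
    return [row[::factor] for row in frame[::factor]]


# Two rows of four pixels each; chroma bytes are set to a neutral 128.
rows = [
    bytes([128, 10, 128, 20, 128, 30, 128, 40]),
    bytes([128, 50, 128, 60, 128, 70, 128, 80]),
]
luma = [extract_luma_uyvy(r) for r in rows]
small = decimate(luma, factor=2)
print(luma)   # [[10, 20, 30, 40], [50, 60, 70, 80]]
print(small)  # [[10, 30]]
```

Dropping the chrominance bytes halves the data before decimation reduces it further, which is what makes the later matching step cheaper.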
The tracking algorithm is based on template matching, and we use a sum of weighted
absolute differences, which is similar to the SAD, and then we have an adaptive
template update. More details about this algorithm are given in this paper.
So starting from a frame, we have a region of interest Ri and we try to find the
best match for this template Ti.
As you can see here, we have the region of interest and the template, and we try to
find the best match within this region of interest. So this is the basic concept of
template matching.
The region of interest is defined as the surrounding area around the best match, so
in that case we have Ri+1, and this is Ri, the initial region of interest.
So to find this best match we minimize the SWAD coefficient, as you can see here.
The SWAD coefficient is basically a sum of weighted absolute differences, where the
weighting kernel is a Gaussian kernel. This is because we want to give more weight
to the pixels in the center of the target and less weight to the pixels at the edge
of the template, which may belong to an occluding object or to the background.
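A minimal sketch of this matching step, assuming an exhaustive search over the region of interest and an isotropic Gaussian kernel centred on the template. The kernel width `sigma`, the function names and the toy data are illustrative assumptions; the exact SWAD formulation is given in the paper the speaker refers to, not in this transcript.

```python
import math


def gaussian_kernel(h, w, sigma):
    """Gaussian weights, largest at the template centre (sigma is assumed)."""
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def swad(roi, template, kernel, top, left):
    """Sum of weighted absolute differences at one candidate position."""
    total = 0.0
    for y, (t_row, k_row) in enumerate(zip(template, kernel)):
        for x, (t, k) in enumerate(zip(t_row, k_row)):
            total += k * abs(roi[top + y][left + x] - t)
    return total


def best_match(roi, template, kernel):
    """Return (score, top, left) of the candidate that minimises the SWAD."""
    h, w = len(template), len(template[0])
    candidates = [(swad(roi, template, kernel, top, left), top, left)
                  for top in range(len(roi) - h + 1)
                  for left in range(len(roi[0]) - w + 1)]
    return min(candidates)


template = [[10, 200, 10], [200, 255, 200], [10, 200, 10]]
roi = [[0] * 6 for _ in range(6)]
for y in range(3):          # paste an exact copy of the template at row 2, column 1
    for x in range(3):
        roi[2 + y][1 + x] = template[y][x]

kernel = gaussian_kernel(3, 3, sigma=1.0)
score, top, left = best_match(roi, template, kernel)
print(score, top, left)  # 0.0 2 1
```

Because the edge weights are smaller than the centre weight, a mismatch at the border of the template raises the score less than the same mismatch at the centre.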
So to update the template, once we have found the best match we compute the
template for the next frame. We start from the current template and the best match,
and we fuse them together using this equation, which is basically an IIR filter,
and alpha is the blending factor.
So in this way we can incorporate changes of the target in the template, which is
then used for the tracking in the next frames.
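The update can be sketched as a first-order IIR filter applied pixel by pixel. The slide with the actual equation is not part of the transcript, so the convention used here (alpha weights the best match and 1 - alpha weights the previous template) and the name `update_template` are assumptions.

```python
def update_template(template, best_match, alpha):
    """Blend the previous template with the best match, pixel by pixel.

    Assumed form: T_next = alpha * B + (1 - alpha) * T.
    """
    return [[alpha * b + (1.0 - alpha) * t for t, b in zip(t_row, b_row)]
            for t_row, b_row in zip(template, best_match)]


old = [[100.0, 100.0], [100.0, 100.0]]
match = [[200.0, 100.0], [0.0, 100.0]]
new = update_template(old, match, alpha=0.5)
print(new)  # [[150.0, 100.0], [50.0, 100.0]]
```

With alpha close to 0 the template is almost frozen; with alpha close to 1 it follows the latest best match almost entirely.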
So once we have the position of the target we can control the PTZ, which is the
camera, and we do this through HTTP requests sent to the server on the camera. Here
you can see some commands for the PTZ. Basically we have the address of the camera,
the user name and the command; the command is six bytes sent to the camera, and
this is done from the DSP on the board to the camera over the Ethernet network. To
decide when to move the PTZ, and whether to move it up or down, basically we detect
whether the best match is in the top region, the bottom region, the left region or
the right region. The idea is that if the best match is near the edge, it is likely
that the target is going out of the field of view, so we send a command to move the
PTZ either up or down, left or right. In this way we are able to control the PTZ
and follow the target.
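The edge test described here can be sketched as below. The margin width, the box format `(left, top, right, bottom)` in pixels with the origin at the top-left corner, and the function name `ptz_moves` are assumptions for illustration. The camera's actual HTTP command format is proprietary and is deliberately not reproduced.

```python
def ptz_moves(match_box, frame_size, margin):
    """Return the directions to move when the best match nears a frame edge."""
    left, top, right, bottom = match_box
    width, height = frame_size
    moves = []
    if left < margin:
        moves.append("left")
    elif right > width - margin:
        moves.append("right")
    if top < margin:
        moves.append("up")
    elif bottom > height - margin:
        moves.append("down")
    return moves


frame = (320, 240)
print(ptz_moves((150, 100, 190, 140), frame, margin=40))  # []
print(ptz_moves((290, 100, 318, 140), frame, margin=40))  # ['right']
print(ptz_moves((10, 5, 50, 45), frame, margin=40))       # ['left', 'up']
```

Each returned direction would then be translated into one request to the camera; a target in the central area produces no request at all.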
So these are some frames taken from the memory of the DSP. As you can see, the
black box is the region of interest and the other box is the target, that is the
best match, and on the top left-hand side you can see the template for the current
frame. As you can see, the target is moving, and at the top you can see that the
template is updated with any changes, so we can always find the best match.
And for the results we use accuracy and precision. Basically we have the position
given by the tracker and the position given by the ground truth, and we compute the
accuracy as the mean of the error and the precision as its standard deviation.
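One plausible reading of these two measures is sketched below: the error in each frame is taken as the Euclidean distance between the tracked position and the ground-truth position, the accuracy is the mean of that error and the precision is its standard deviation. The error definition, the use of the population standard deviation and the sample coordinates are assumptions, since the transcript is unclear at this point.

```python
import math
import statistics


def accuracy_precision(tracked, truth):
    """Mean and standard deviation of the per-frame position error."""
    errors = [math.dist(p, q) for p, q in zip(tracked, truth)]
    return statistics.mean(errors), statistics.pstdev(errors)


# Hypothetical positions over four frames (not data from the paper).
tracked = [(10, 10), (13, 14), (20, 20), (26, 28)]
truth = [(10, 10), (10, 10), (20, 20), (20, 20)]
acc, prec = accuracy_precision(tracked, truth)
print(acc)             # 3.75
print(round(prec, 3))  # 4.146
```

Under this reading, lower values are better for both measures.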
We compared the algorithms, with MATLAB implementations, on four sequences. For the
first sequence you can see that basically all the trackers follow the target, but
the SAD and the NCC, where NCC is the normalized cross-correlation, perform worse
because they are affected by the pixels at the edges of the template. As you can
see, in the middle there is a peak, and that is when the person in the video
[inaudible].
In the next sequence you can see that the normalized cross-correlation and the SAD
diverge, which means that they lose the target, while the mean shift and the SWAD
can still follow the target.
And these are the other two sequences. Again we see that the normalized
cross-correlation and the SAD, in the first graph, diverge, so again they lose the
target, while in this case, the last example, we have that the mean shift also
loses the target. So basically the SWAD-based tracker performs better than the SAD,
the NCC and the MS in these sequences.
And here there are some numerical results. We can see that the accuracy value of
the SWAD is always lower than that of all the other trackers in all the sequences,
and the precision is usually lower as well. So this proves once again that we have
good performance with our tracker.
As for the execution time, this algorithm is implemented on the DSP on the board,
and the main SWAD block takes seven milliseconds, while the whole processing takes
fifteen milliseconds per frame. So basically it is less than forty milliseconds,
which means much more than twenty-five frames per second, so we achieve our aim,
which is real-time execution. This efficiency is obtained through intrinsics, which
are C functions that are implemented for a particular architecture. In this case we
have the DSP fixed-point architecture, so we use intrinsics [inaudible] which work
on groups of four bytes, or four pixels. So basically in one cycle of the SWAD
matching block we analyze four pixels, so we basically cut down the computation by
four.
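The idea of handling four 8-bit pixels per operation can be illustrated by grouping each row into 32-bit words. This Python sketch only emulates the grouping and gives no speed-up by itself; on the DSP each group would be handled by a single packed instruction. The helper names `sad4` and `sad_row_packed` are hypothetical, and the row length is assumed to be a multiple of four.

```python
import struct


def sad4(word_a, word_b):
    """Sum of absolute differences of the four bytes packed in two 32-bit words."""
    a = struct.pack("<I", word_a)
    b = struct.pack("<I", word_b)
    return sum(abs(x - y) for x, y in zip(a, b))


def sad_row_packed(row_a, row_b):
    """Process a row four pixels at a time (length must be a multiple of 4)."""
    total = 0
    for i in range(0, len(row_a), 4):
        (wa,) = struct.unpack_from("<I", row_a, i)   # load 4 pixels as one word
        (wb,) = struct.unpack_from("<I", row_b, i)
        total += sad4(wa, wb)                        # one packed operation per group
    return total


row_a = bytes([10, 20, 30, 40, 50, 60, 70, 80])
row_b = bytes([12, 18, 30, 45, 50, 50, 80, 80])
print(sad_row_packed(row_a, row_b))  # 29
```

The loop body runs twice for eight pixels instead of eight times, which is the factor of four mentioned in the talk.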
As an example here, the non-optimized version of the same algorithm takes
sixty-three milliseconds, which is nine times more.
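The arithmetic behind these figures can be checked directly. Only numbers stated in the talk are used (twenty-five frames per second, seven milliseconds, sixty-three milliseconds); the variable names are illustrative.

```python
target_fps = 25
frame_budget_ms = 1000.0 / target_fps   # time available per frame at 25 fps

optimized_ms = 7.0      # optimized SWAD block, as stated in the talk
unoptimized_ms = 63.0   # non-optimized version, as stated in the talk

speedup = unoptimized_ms / optimized_ms
print(frame_budget_ms)                   # 40.0
print(speedup)                           # 9.0
print(unoptimized_ms > frame_budget_ms)  # True
```

The last line shows that the non-optimized block alone would already exceed the forty-millisecond budget.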
So this is a working example of our system. You can see the board and the PTZ. The
video stream from the PTZ goes into the board, and the board analyzes the video
stream and drives the PTZ to follow the target.
This is the video. It is taken from a remote display, the remote viewer.
As you can see, the PTZ camera is moving to follow the target as the target moves
left to right, or closer to or farther away from the camera.
In every case the algorithm is still able to track the target and control the PTZ,
so we can always keep the target in the field of view.
So in conclusion, I presented a DSP embedded smart surveillance sensor using a PTZ
camera to follow the target as it tries to move out of the field of view. The DSP
on the board is the DM647 fixed-point, and the tracker we use is the SWAD-based
tracker. The results show high accuracy and precision under partial occlusion.
So for future work we will try to include also complete occlusion handling.
This is taken from a paper that we have just published. Here you have a video with
the SWAD-based tracker without occlusion handling; you can see that the tracker
loses the target as it becomes occluded, while with the occlusion handling
technique we are able to recover the target as it comes out of the occlusion.
So for future work we will try to implement also this feature on the board.
So this concludes my presentation. Thank you for listening, and if you have any
questions I will be happy to answer.
right
uh_huh
At the moment we don't use the zoom feature of the camera. So yes, when the target
moves closer to the camera, the size of the target gets larger on the screen, and
we just don't handle that, for simplicity. But as you can see this is solved
basically by the template update. You see here the target is smaller [inaudible],
and then it moves closer to the camera, so we can incorporate the changes of the
target in the template. But at the moment we don't adjust the size of the target;
that's another thing to do in the future.
okay
right
Right. This tracker is not for face tracking or for any particular object; it is
generic target tracking, so it works as long as the target has, let's say, a good
texture, so that you can discriminate the target from the background. Here we start
from the face as an example, and then as he moves closer to the camera, obviously
the face is too big for the template and it gets [inaudible]. So the tracker is
generic, for any object, not only for faces.
yeah
well
a
right
But you mean for the future work I mentioned? Yes, okay. In that paper, to deal
with complete occlusion, basically what we do is we don't update when the target
goes under occlusion. We don't update the whole template at the same time with the
same weight; we have different weights for all the pixels in the template. So when
the target goes under occlusion, we don't update the side that is occluded; we
update only the visible one. And eventually, when you see that only a few pixels on
one side are left, it means that the target is going to disappear, so in the next
few frames it is occluded. In that case you don't update anymore and you say the
target is occluded. Since you have not updated the occluded side, when the target
comes out of the occlusion the template is preserved, so again you can find the
best match for your target.
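The idea in this answer can be sketched with a binary visibility mask, which is a simplification: the speaker mentions different weights per pixel, and how occluded pixels are detected is described in the referenced paper, not here. The mask, the `min_visible` threshold and the function name `masked_update` are all assumptions.

```python
def masked_update(template, best_match, visible, alpha, min_visible=0.2):
    """Update only the visible pixels; freeze the template when too few remain.

    Returns (new_template, occluded_flag).
    """
    flat = [v for row in visible for v in row]
    if sum(flat) / len(flat) < min_visible:
        return template, True            # target declared occluded, no update
    updated = [[alpha * b + (1.0 - alpha) * t if v else t
                for t, b, v in zip(t_row, b_row, v_row)]
               for t_row, b_row, v_row in zip(template, best_match, visible)]
    return updated, False


template = [[100.0, 100.0], [100.0, 100.0]]
match = [[200.0, 0.0], [200.0, 0.0]]      # right column covered by an occluder
half = [[True, False], [True, False]]
none = [[False, False], [False, False]]

print(masked_update(template, match, half, alpha=0.5))
# ([[150.0, 100.0], [150.0, 100.0]], False)
print(masked_update(template, match, none, alpha=0.5))
# ([[100.0, 100.0], [100.0, 100.0]], True)
```

Because the occluded column keeps its old values, the template still describes the target when it reappears.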
yeah
a yeah it can be adapted for
Yeah, well, usually in surveillance you have three components: the detection
algorithm, the tracking algorithm, and then [inaudible] or something like that.
This is only the tracking algorithm.
To select the target you can either do it manually or use an automatic algorithm.
Usually in surveillance systems you have a person driving the PTZ, trying to find
something, and then he locks the PTZ onto the target and lets this algorithm do the
tracking.
she
right
Okay, for the template, it depends on the blending factor.
yeah
In this case, as we process at more than twenty-five frames per second, we give
significant weight both to the previous template and to the best match, but in a
real application you can choose the blending according to which one you want to
give more weight. If you want to preserve the template, then you give more weight
to your previous template. If you want to adapt very fast, you give more weight to
the best match; for example, you give seventy percent to the best match, so you are
able to incorporate the changes in the template faster.