Speech Transcript - DSP Embedded Smart Surveillance Sensor with Robust SWAD-based Tracker

Good morning everybody. My name is Gaetano Di Caterina, from the University of Strathclyde, Glasgow, and I am going to present our work titled "DSP Embedded Smart Surveillance Sensor with Robust SWAD-based Tracker". If I speak too fast, just tell me to slow down. The other co-authors of the paper are Iain Hunter, from Texas Instruments, and Professor John Soraghan, from the University of Strathclyde.

So this is the outline of my presentation. First of all I will give a brief introduction to set the scene, and then I will state the aim and objectives of our work. After that I will show an overview of the entire system, and then I will talk in detail about the video analytics within the system. Then I will show some results to back up our work, and after that I will state the conclusions and show some future work.

So, video surveillance is monitoring by means of video cameras. This is convenient because we can use multiple cameras to survey a wide area, and with only one person, as you can see here, who is able to look at fifteen videos at the same time.

Another thing is that we can store the videos for future access.

But there is a problem when we have too many videos and the person is sleeping. So what are we going to do? In this case there is a suspicious individual walking around, but the person doing the surveillance is sleeping.

So the problems are the level of attention, the reaction time and crime prevention. We don't want to use the surveillance footage only afterwards, for a trial; what we want is to take action straight away.

So, video analytics is the semantic analysis of video data by computer systems, using image and video processing techniques. In this case we talk about smart surveillance, because we have the video and we apply smart algorithms to analyze this video. And with video analytics, what we want to achieve is to have smart sensors: we have the video analytics embedded on processors, which are then attached to the cameras, so we can create smart units and deploy the intelligence at the edge of the network.

So when we have multiple smart surveillance sensors, like in this case, we can survey a whole building in real time. We don't need to send all the video streams to the central station; we just need to send only the relevant information, for example that there is an object in this place, or that a person has to be tracked.

So the aim of this work is to create a smart surveillance sensor for automatic tracking using a PTZ camera. A PTZ camera is a camera that can be commanded to pan, tilt and zoom in order to follow some object.

And the objectives are to implement the smart algorithms on a DSP board, to have automatic, programmatic control of the PTZ from the board, and to be able to activate and deactivate the tracking algorithm remotely.

So this is an overview of the system. In the center you can see the EVM, which is the DSP board, and the PTZ, which is our camera. When they are connected together we can process the video streams from the camera on the DSP, and in this case we can talk about a smart sensor.

So for the hardware we use a Texas Instruments DM647 EVM, which is a fixed-point DSP. Then there is also an Ethernet connection between the board and the camera, and the camera is a PTZ unit which can pan three hundred and sixty degrees and tilt [unclear].

The software is implemented in C, with a minimal operating system. We also have a server on the EVM, so we can send commands to activate and deactivate the algorithm. We run in real time, at more than twenty-five frames per second. On the PTZ we have an HTTP server, but this is proprietary, so we don't change anything there; we just send commands to drive the PTZ.

And this is the video analytics. Basically we acquire the video stream, we de-interleave and decimate, so we have smaller frames to process, and then we apply our tracking algorithm. The result of this tracking algorithm is used to control the PTZ, and then we can send commands to the camera, so we can follow the target as it moves in the scene.

So we acquire the video stream, which is interleaved YCbCr. We de-interleave it and we discard the chrominance components, and retain only the luminance component, so the algorithm actually works on grayscale images.

And then we decimate, so we have smaller frames.
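
The two pre-processing steps described above can be sketched as follows. This is only an illustration under stated assumptions: the talk does not give the byte packing or the decimation factor, so the Cb Y Cr Y ordering, the factor of two, and the function names `extract_luma` and `decimate` are all hypothetical.

```python
def extract_luma(interleaved, y_offset=1):
    """Keep only the luminance bytes of one interleaved 4:2:2 row.

    Assumption: the bytes are ordered Cb Y Cr Y, so luma sits at odd
    indices (y_offset=1). Use y_offset=0 for a Y Cb Y Cr ordering.
    """
    return list(interleaved[y_offset::2])


def decimate(gray, factor=2):
    """Keep every `factor`-th pixel of every `factor`-th row (no filtering)."""
    return [row[::factor] for row in gray[::factor]]


# Two rows of four pixels each, stored as Cb Y Cr Y Cb Y Cr Y.
raw_rows = [
    bytes([128, 10, 128, 20, 128, 30, 128, 40]),
    bytes([128, 50, 128, 60, 128, 70, 128, 80]),
]
gray = [extract_luma(row) for row in raw_rows]   # chrominance discarded
small = decimate(gray, factor=2)                 # smaller frame to process
```

Dropping samples without a low-pass filter is the simplest possible choice here; the system in the talk may well decimate differently.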

The tracking algorithm is based on template matching, and we use the sum of weighted absolute differences, which is similar to the SAD and the NCC, and then we have another routine to update the template. More details about this algorithm are given in this paper.

So starting from a frame, we have a region of interest Ri, and we try to find the best match for this template Ti.

As you can see here, we have the region of interest, we have the template, and we try to find the best match in this region of interest. So this is the basic concept of template matching.

The region of interest is defined as the surrounding area around the best match, so in that case we have Ri+1, and this is our initial region of interest.

So to find this best match we minimize the SWAD coefficient, as you can see here. The SWAD coefficient is basically a sum of weighted absolute differences, where the weighting kernel is a Gaussian kernel. This is because we want to give more weight to the pixels in the center of the target, since the pixels at the edge of the template may belong to an occluding object or to the background.
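
As a rough illustration of the matching step, the sketch below searches a region of interest exhaustively for the position that minimizes a Gaussian-weighted sum of absolute differences. It is not the implementation from the talk: the kernel normalization, the `sigma` value, the exhaustive search and all function names are assumptions made for the example.

```python
import math


def gaussian_kernel(height, width, sigma):
    """Weights that peak at the template center and sum to 1."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    k = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
          for x in range(width)] for y in range(height)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def swad(roi, template, kernel, top, left):
    """Sum of weighted absolute differences at one candidate position."""
    return sum(
        kernel[y][x] * abs(roi[top + y][left + x] - template[y][x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def best_match(roi, template, kernel):
    """Exhaustive search: return the (top, left) that minimizes the SWAD."""
    th, tw = len(template), len(template[0])
    candidates = [
        (swad(roi, template, kernel, top, left), top, left)
        for top in range(len(roi) - th + 1)
        for left in range(len(roi[0]) - tw + 1)
    ]
    _, top, left = min(candidates)
    return top, left


# A 5x5 region of interest containing the 3x3 template at row 1, column 2.
template = [[10, 20, 10], [20, 90, 20], [10, 20, 10]]
roi = [[0] * 5 for _ in range(5)]
for y in range(3):
    for x in range(3):
        roi[1 + y][2 + x] = template[y][x]

kernel = gaussian_kernel(3, 3, sigma=1.0)
position = best_match(roi, template, kernel)
```

Because the kernel is largest at the center, a mismatch in the middle of the template costs more than the same mismatch at its border, which is the motivation given in the talk.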

So, to update the template: once we have found the best match, we compute the template for the next frame. We start from the current template, we have the best match, and then we fuse them together using this equation, which is basically an IIR filter, where alpha is the blending factor.

In this way we can incorporate changes of the target into the template, to carry on the tracking in the next frames.
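
A first-order IIR blend of this kind can be sketched in a few lines. The exact equation is on the slide and not in the transcript, so the form below is an assumption: here `alpha` is taken to be the weight of the new best match and `1 - alpha` the weight of the previous template, and the function name is hypothetical.

```python
def update_template(previous, best_match, alpha):
    """Per-pixel blend: next = alpha * best_match + (1 - alpha) * previous."""
    return [
        [alpha * b + (1.0 - alpha) * p for p, b in zip(prev_row, match_row)]
        for prev_row, match_row in zip(previous, best_match)
    ]


previous = [[100.0, 100.0], [100.0, 100.0]]
match = [[200.0, 100.0], [0.0, 100.0]]
updated = update_template(previous, match, alpha=0.5)
```

With this convention a larger `alpha` makes the template adapt faster to changes in the target, and a smaller `alpha` preserves the older appearance for longer.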

So once we have the position of the target we can control the PTZ, which is the camera, and we do this through HTTP requests: a single HTTP request to the server on the camera. Here you can see some commands for the PTZ. Basically we have the address of the camera, the user name and the command, and as you can see the command is six bytes sent to the camera. This is done from the DSP on the board to the camera over the Ethernet network. So to decide how to control the PTZ, whether to tell it to move up or down, basically we detect if the best match is in the top region, the left region, the bottom region or the right region. The idea is that if the best match is near the edge, it is likely that the target is going out of the field of view, so we send the command to move the PTZ either up or down, left or right. In this way we are able to control the PTZ and follow the target.
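
The edge test described above might look like the sketch below. The size of the border regions, the frame dimensions and the function name `ptz_moves` are assumptions for illustration, and the actual HTTP command bytes sent to the camera are deliberately left out because the transcript does not give them.

```python
def ptz_moves(match_box, frame_width, frame_height, margin):
    """Return the moves suggested by a best match close to a frame edge.

    match_box is (left, top, width, height) in pixels; margin is the
    thickness of the top, bottom, left and right border regions.
    """
    left, top, width, height = match_box
    moves = []
    if top < margin:
        moves.append("up")
    elif top + height > frame_height - margin:
        moves.append("down")
    if left < margin:
        moves.append("left")
    elif left + width > frame_width - margin:
        moves.append("right")
    return moves


centered = ptz_moves((150, 100, 40, 40), 320, 240, margin=20)    # []
leaving_right = ptz_moves((290, 100, 40, 40), 320, 240, margin=20)  # ["right"]
top_left = ptz_moves((5, 5, 40, 40), 320, 240, margin=20)        # ["up", "left"]
```

An empty list means the target is well inside the field of view and no command needs to be sent for that frame.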

So these are some frames taken from the memory of the DSP. As you can see, the black box is the region of interest, and the other box is the target, that is the best match. On the top left-hand side you can see the template for the current frame. As you can see, the target is moving, and at the top you can see that the template is incorporating the changes, so we can always find the best match.

For the results we also use accuracy and precision. Basically we have the position given by the tracker and the position from the ground truth, and we compute the error between them: the accuracy is the mean error and the precision is its standard deviation.
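
One plausible reading of these two measures is sketched below, purely as an assumption: the per-frame error is taken to be the Euclidean distance between the tracker position and the ground-truth position, accuracy is the mean of those errors and precision is their standard deviation. The transcript is garbled at this point, so both the distance measure and the use of the population standard deviation are guesses, and the function names are hypothetical.

```python
import math
import statistics


def tracking_errors(tracked, ground_truth):
    """Euclidean distance between tracker and ground truth, per frame."""
    return [
        math.hypot(tx - gx, ty - gy)
        for (tx, ty), (gx, gy) in zip(tracked, ground_truth)
    ]


def accuracy_and_precision(tracked, ground_truth):
    """Return (mean error, standard deviation of the error) in pixels."""
    errors = tracking_errors(tracked, ground_truth)
    return statistics.mean(errors), statistics.pstdev(errors)


tracked = [(3, 4), (0, 0)]
ground_truth = [(0, 0), (0, 0)]
accuracy, precision = accuracy_and_precision(tracked, ground_truth)
```

Under this reading lower values are better for both measures, since both are computed from an error.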

We applied the algorithm, with MATLAB implementations, to four sequences. For the first sequence you can see that basically all the trackers follow the target, but the SAD and the NCC (the NCC is the normalized cross-correlation) perform worse, because they are fooled by the pixels at the edges of the template. As you can see in the middle of the plot, that is when the person in the video [unclear].

In the second sequence you can see that the normalized cross-correlation and the SAD diverge, which means that they lose the target, while the mean shift and the SWAD can still follow the target.

And these are the other two sequences. Again we see that the normalized cross-correlation and the SAD, in the first graph, diverge, so again they lose the target, while in this case, the last example, we have that the mean shift loses the target. So basically the SWAD-based tracker performs better than the SAD, the NCC and the MS in these sequences.

And here there are some summary values. We can see that the accuracy value of the SWAD is always lower than all the others in all the sequences, and also the precision is usually lower. So this proves once again that we have good performance with our tracker.

For execution time: this algorithm is implemented on the DSP on the board, and the main SWAD block takes seven milliseconds, with the whole processing taking [unclear] milliseconds per frame. So basically it is less than forty milliseconds, and it is much more than twenty-five frames per second, so we achieve our aim, which is real-time efficiency. This is done through intrinsics, which are C functions that are implemented for a particular architecture. In this case we have the DSP fixed-point architecture, so we use intrinsics which work on groups of four bytes, that is four pixels. So basically in one cycle of the SWAD matching block we analyze four pixels, so we basically cut down the computation by four.

As an example, the non-optimized implementation of the same algorithm takes sixty-three milliseconds, which is nine times more.
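
The arithmetic behind these figures can be checked quickly. The sketch below uses only the numbers quoted in the talk (seven milliseconds for the optimized SWAD block, sixty-three milliseconds for the non-optimized version, twenty-five frames per second); the variable and function names are just for illustration.

```python
FRAME_RATE_FPS = 25
OPTIMIZED_MS = 7       # SWAD block with intrinsics, as quoted in the talk
UNOPTIMIZED_MS = 63    # same algorithm without intrinsics


def fits_frame_budget(time_ms, frame_rate_fps=FRAME_RATE_FPS):
    """True if a per-frame processing time leaves room within one frame period."""
    frame_budget_ms = 1000 / frame_rate_fps
    return time_ms < frame_budget_ms


frame_budget_ms = 1000 / FRAME_RATE_FPS      # 40.0 ms per frame at 25 fps
speedup = UNOPTIMIZED_MS / OPTIMIZED_MS      # 9.0
optimized_ok = fits_frame_budget(OPTIMIZED_MS)
unoptimized_ok = fits_frame_budget(UNOPTIMIZED_MS)
```

The non-optimized block alone would already exceed the forty-millisecond frame period, which is why the optimization matters for real-time operation.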

So this is a working example of our system. You can see the board and the PTZ: the video stream coming from the PTZ goes into the board, and the board analyzes the video stream and drives the PTZ to follow the target.

This is the video.

So this is taken from a remote display, the remote viewer.

As you can see, the PTZ camera is moving to follow the target as the target moves left to right, or closer to or farther away from the camera.

Even as the target moves around the camera, the algorithm is still able to track the target and control the PTZ, so we can always have the target in the field of view.

So, in conclusion, I presented a DSP embedded smart surveillance sensor using a PTZ camera to follow the target as it tries to move out of the field of view. The DSP is the DM647 fixed-point, and the tracker we use is the SWAD-based tracker. The results show high accuracy and precision under partial occlusion.

So for future work we will try to include also complete occlusion handling.

This is taken from a paper that we have just published. Here you have a video with the SWAD-based tracker without occlusion handling: you can see that the tracker loses the target as it becomes occluded, while with the occlusion handling technique we are able to recover the target as it comes out of the occlusion.

So for future work we will try to implement this feature on the board as well.

So this concludes my presentation. Thank you for listening. If you have any questions, I am happy to answer them.

right

uh_huh

At the moment we don't use the zoom feature of the camera. So yes, when the target moves close to the camera, the size of the target is larger on the screen, and we just don't adjust for it, for simplicity. But as you can see, this is solved basically by how we update the template.

You see here, the target is smaller, so we can see the face, and then he moves closer to the camera, so we incorporate the changes of the target in the template. But at the moment we don't adjust the size of the target; that's another thing to do in the future.

okay

right

Right. This tracker is not for face tracking or for any particular object; it is generic target tracking, so it works as long as the target has a good texture.

Okay, so that you can discriminate the target from the background. Here we start from the face as an example, and then as he moves closer to the camera, obviously the face is too big for the template, and it gets [unclear] my neck.

So the tracker is generic, for any object, not only for faces.

yeah

well

right

You mean for the future work I mentioned? Yes, okay. In this paper, to deal with complete occlusion, basically what we do is this: when the target goes under occlusion we don't update the whole template at the same time with the same weight, but we have different weights for all the pixels in the template. So when the target goes under occlusion, we don't update the side that is occluded; we update only the other one.

And eventually, when you are updating only a few pixels on one side, it means that the target is going to disappear, so in the next few frames it is occluded.

In that case you don't update any more, and you say the target is occluded. Then, since you have not updated the occluded side, when the target comes out of the occlusion the template is preserved, so again you can find the best match for your target.

yeah

Yeah, it can be adapted for that.

Well, as usual in surveillance you have three components: the detection algorithm, the tracking algorithm, and then [unclear] or something like that. This is only the tracking algorithm.

To select the target you can either do it manually or use an automatic algorithm.

Usually in surveillance systems you have a person driving the PTZ, trying to find something, and then they point the PTZ at the target

and then let this algorithm track.

she

right

Okay, the template.

How fast it adapts depends on the blending factor.

Yeah.

In this case, as we process at more than twenty-five frames per second, we give weight both to the previous template and to the best match. But in a real application you can choose which one you want to give more weight to. If you want to preserve the template, then you would give more weight to your previous template.

If you want it to adapt very fast, then you will give more weight to the best match; for example, you give seventy percent of the weight to the best match, so you are able to incorporate the changes in the template.

DSP Embedded Smart Surveillance Sensor with Robust SWAD-based Tracker

DSP and Hardware

Gaetano Di Caterina, Iain Hunter, John James Soraghan