Hello. So I'm going to present our work, which is about entropy-based supervised merging of visual dictionaries for concept detection.
Here is the outline of my talk. I will briefly remind you of the bag-of-words model, which is the basis for our work, then I will introduce the technique that we propose, which is entropy-based supervised merging, and finally I will present some experimental results.
The problem we are dealing with is visual concept detection. Basically, we want to build a system such that, when we feed it an image, the system is able to detect whether a concept appears or not in this image, for a given set of concepts.
The classical way to do this is the following: from the image we compute a feature vector, which is a representation of the visual content of the image, and this feature vector is passed to a classifier trained for the concept.
The feature we use, as many people do, is the bag of visual words. We first detect interest points on the image, points where the local signal is salient; then, at each of these points, we compute a local descriptor which describes the visual appearance around the point.
So with this process, the representation of an image is a set of local descriptors, and what we need is a way to turn this set into a fixed-size feature vector.
The way this is done is the following. We take images from the training set, and for each of these images we compute the local descriptors. All these descriptors live in the descriptor space, and we cluster them, typically with k-means; each cluster is represented by one visual word, and the set of visual words is the visual dictionary. Then, to represent an image, we compute its local descriptors, we assign each descriptor to the nearest visual word in the descriptor space, and we compute the histogram of the visual word occurrences: this histogram is the feature representation of the image.
So you see that the dimension of the feature vector is the number of visual words, that is, the dictionary size, and we can interpret this histogram representation as an approximation of the probability density of the image's descriptors in the descriptor space.
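To make this pipeline concrete, here is a minimal sketch in Python. It assumes the local descriptors are already extracted (for instance 128-dimensional SIFT vectors), and all names and parameters are illustrative, not the exact system of the talk.

import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(train_descriptors, n_words):
    """Cluster all training descriptors; the cluster centres are the visual words."""
    km = KMeans(n_clusters=n_words, n_init=4, random_state=0)
    km.fit(train_descriptors)           # train_descriptors: (N, d) array
    return km.cluster_centers_          # (n_words, d) visual dictionary

def bow_histogram(image_descriptors, words):
    """Assign each descriptor to its nearest visual word and count occurrences."""
    d2 = ((image_descriptors[:, None, :] - words[None, :, :]) ** 2).sum(-1)
    assignment = d2.argmin(axis=1)      # nearest word index per descriptor
    hist = np.bincount(assignment, minlength=len(words)).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalised histogram = feature vector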
This approximation is made by assuming that the probability density is constant over each cell of the partition of the descriptor space induced by the visual words. So of course the choice of the visual dictionary is going to be very important, because it determines how good this approximation is.
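Written as a formula (the notation here is mine, not from the slides), the histogram defines a piecewise-constant density estimate:

\hat{p}(x) = \frac{h_i}{|R_i|} \qquad \text{for } x \in R_i

where h_i is the fraction of the image's descriptors assigned to visual word w_i, R_i is the region of the descriptor space closest to w_i, and |R_i| is its volume; the estimate integrates to one since the h_i sum to one.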
There is then the question of the dictionary size, and it has an impact on the performance. Increasing the dictionary size gives a finer description of the image, but it also increases the complexity, both in the dictionary construction, where we have to cluster more data into more clusters, and in the detection, where the classifier receives a larger feature vector. So what we would like is a way to keep small dictionaries without losing detection performance.
Now, the important point is that this bag-of-words model is unsupervised: the dictionary construction does not use the label information at all. Since the feature vector is then used by a classifier to detect the concepts, we may suspect that some information is lacking, and that the visual dictionary construction could benefit from the labels. This is exactly what we propose in this presentation: to use the label information in the construction of the feature representation.
The intuition is the following. Consider two clusters of descriptors that are close in the descriptor space; if one of them comes mostly from images labelled with a given concept and the other one does not, it may be more interesting to keep them as two separate visual words. Conversely, if two visual words carry the same label information, merging them does not matter for the detection.
So how do we exploit this label information? The way we do it is that we start with a dictionary which is much larger than the desired dictionary size, actually several times larger, built with the usual clustering. Then we iteratively merge pairs of visual words: at each step we look at the candidate pairs and at how much label information would be lost by merging them, we select the best pair, we merge it, which decreases the size of the dictionary by one, and we repeat this process until we reach the desired size. A small code sketch of the whole procedure follows the formal description below.
Now, how do we measure the information about the labels? Each visual word groups a set of training descriptors, and each of these descriptors comes from an image which carries concept labels. So each visual word can be associated with a distribution over the concept labels, and that is what the merging criterion is based on.
In a more formal way: for each concept and each visual word, we estimate the probability of the concept label given the visual word, simply by counting, for each visual word, the proportion of its descriptors which come from images labelled with that concept. From these counts we can then compute the conditional entropy of the concept labels given the visual words, that is, the entropy of the concept label distribution over the whole visual dictionary.
Then, for every candidate pair of visual words, we compute what happens to this conditional entropy if we merge the pair, and we merge the pair which minimizes the increase of the conditional entropy. We repeat this process until we reach the desired size for the dictionary.
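Here is the sketch announced above, in Python. It is an illustration under my own assumptions, not the exact implementation: counts[w, c] holds the number of training descriptors of visual word w coming from images labelled with concept c, and the greedy loop merges the pair that least increases H(C | W). Since the total descriptor count N is unchanged by a merge, minimizing the increase of the sum of the per-word terms n(w) * H(C | W = w) is equivalent to minimizing the increase of the conditional entropy.

import numpy as np

def cond_entropy_term(row):
    """n(w) * H(C | W = w), computed from the raw label counts of one word."""
    n = row.sum()
    if n == 0:
        return 0.0
    p = row[row > 0] / n
    return -n * (p * np.log2(p)).sum()

def merge_dictionary(counts, target_size):
    """Greedily merge visual words until `target_size` words remain.
    Naive O(n^2) pair search per step, for clarity."""
    words = [np.asarray(row, dtype=float) for row in counts]
    while len(words) > target_size:
        best = None
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                # increase of N * H(C|W) if words i and j were merged
                delta = (cond_entropy_term(words[i] + words[j])
                         - cond_entropy_term(words[i])
                         - cond_entropy_term(words[j]))
                if best is None or delta < best[0]:
                    best = (delta, i, j)
        _, i, j = best
        words[i] = words[i] + words[j]   # fuse the two words' label counts
        del words[j]
    return words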
So far this was the joint scheme, where we consider all the concepts at the same time in a single entropy. We could also make the visual dictionary concept dependent: for each concept we build a separate dictionary, merging so as to preserve as much information as possible about that single concept only, possibly at the expense of the others.
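The change to the sketch above is small (again my own illustration): the merge cost only looks at one concept, treated as a binary present/absent label.

import numpy as np

def binary_entropy_term(pos, neg):
    """n * H(present/absent) for one visual word and a single concept,
    from the counts of descriptors coming from positive / negative images."""
    n = pos + neg
    if n == 0 or pos == 0 or neg == 0:
        return 0.0   # a pure word carries full information, zero entropy
    p = np.array([pos, neg], dtype=float) / n
    return -n * (p * np.log2(p)).sum()

Running the greedy loop once per concept with this cost in place of the joint one gives one dedicated dictionary per concept.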
We can also add an additional constraint to the merging: only merging visual words which are neighbours in the descriptor space, because otherwise the merging may produce visual words which are not connex, not connected, in the descriptor space.
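One possible way to encode this constraint (my assumption; the talk does not give the exact definition of connexity used) is to restrict the candidate pairs to words whose centres are nearest neighbours in the descriptor space:

import numpy as np

def allowed_pairs(centres, k=5):
    """Candidate pairs restricted to k-nearest-neighbour words."""
    d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]   # k nearest neighbours per word
    pairs = set()
    for i in range(len(centres)):
        for j in nn[i]:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return pairs   # the greedy search only evaluates these pairs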
Now for the experiments. We evaluated this approach on a video concept detection task: the images are the keyframes of the video shots, we compute local descriptors on them, and we train a support vector machine per concept on the bag-of-words histograms. For the supervised merging, we build an initial dictionary which is two, four or eight times the final dictionary size, and then merge down to the final size.
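For illustration, the detection stage could look like the following sketch, assuming scikit-learn; the kernel choice is my assumption, since the talk only says a support vector machine was used.

import numpy as np
from sklearn.svm import SVC

def train_detectors(histograms, labels):
    """histograms: (n_images, n_words); labels: (n_images, n_concepts) in {0,1}."""
    detectors = []
    for c in range(labels.shape[1]):
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(histograms, labels[:, c])   # one detector per concept
        detectors.append(clf)
    return detectors

def concept_scores(detectors, histograms):
    """Return an (n_images, n_concepts) matrix of detection scores."""
    return np.column_stack(
        [clf.predict_proba(histograms)[:, 1] for clf in detectors])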
The way we evaluate the performance of the system is with the mean average precision. We apply the classifiers to the test data, we get a score for each concept in each shot, we rank the shots by this score, compute the average precision of the ranked list for each concept, and take the mean over the concepts.
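In code, this evaluation amounts to the following (variable names are mine):

import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(true_labels, scores):
    """true_labels, scores: (n_shots, n_concepts); AP per concept, then mean."""
    aps = [average_precision_score(true_labels[:, c], scores[:, c])
           for c in range(true_labels.shape[1])]
    return float(np.mean(aps))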
The final dictionary size here is five hundred, and the initial dictionary sizes are one thousand, two thousand and four thousand, which are two, four and eight times the final size. The baseline, a dictionary of size five hundred built directly by clustering, gives a mean average precision of about 7.7 percent. If we do the supervised merging instead, we still end up with a final dictionary of size five hundred, but, depending on the initial size, we get a performance of 7.9 up to 8.1 percent.
That was with the joint scheme, where the entropy is computed over the whole concept label distribution; we also tried the concept-dependent scheme. Actually, the results with the concept-dependent dictionaries are not better; note that here we measure one average precision per concept. The reason is probably that a number of concepts are rare, so there are very few positive examples in the training data; the estimated label distribution is then not reliable and carries almost no information for the merging.
We also tried a larger final dictionary size, one thousand. If we start from an initial dictionary of twice that size, the baseline is again around seven percent, and if we merge down from four or eight times the final size we still see an improvement, but it is smaller, and the merged dictionary stays close to a dictionary built directly at the same size. Looking at the individual concepts, the concepts whose precision is already reasonable do improve, while a number of concepts which are very difficult, with a precision close to zero, pull the mean down. So there seems to be a saturation point.
Another thing we looked at is what happens with the connexity constraint, where we only merge dictionary words which are neighbours. This did not give such good results: actually, with an initial dictionary of size one thousand we increase the performance, but if we increase the size of the initial dictionary then the performance drops. So again we have a problem there; the constraint is possibly too strong, and we decided not to keep it.
We also checked where the gain comes from. If we run the same merging without using the labels, so the selection is not driven by the concept information, the resulting dictionary does not improve the detection; the gain really comes from the supervised selection. The way we set this up is that we fix the final dictionary size, vary the initial dictionary size, and look at how the initial size changes the performance; this matters in practice because the final dictionary size also determines the complexity of the classifier.
So the experiment here is to vary the initial dictionary size while keeping the final size fixed. What we can see is that there is an optimum: the best performance is obtained with an initial size between one and two thousand, and starting from a much larger initial dictionary does not help any more, the gain becomes very small. The other point to keep in mind is that reducing the size of the dictionary reduces the complexity of the classifier, which is an important practical benefit.
So, in conclusion, we proposed a technique for visual dictionary construction which uses the label information in the selection of the visual words. A dictionary obtained by supervised merging almost always reaches at least the same performance as a directly built dictionary of the same size, and often a better one, and the approach gives an efficient way to balance the complexity and the performance of the detection.
Thank you.
Well, about the complexity: at each step we only merge one pair, so the cost is in evaluating the candidate pairs, and that grows with the size of the initial dictionary.
Oh, the keypoints? We detected them with a standard interest point detector, nothing special, the basic detector.
Yes, I think the result would be very similar, because that only changes the number of visual words; we use the same descriptor space, and we would need about as many words to represent the same information, so I don't think the result would change much.
The criterion is a simple formula; you only need the counts from the assignment process. The values are continuous; it is not a distance, strictly speaking, but it is very simple to compute.
Yes, that would be possible; that is a good question, and we thought about it. The point is that there is an issue of balancing the distance in the descriptor space against the label information: here the descriptor-space distance is only taken into account through the initial dictionary, and the merging itself is then driven by the labels. That is also why the initial dictionary is important.