0:00:16 | So I am going to present our work on entropy-based supervised merging of visual words for concept detection. |
0:00:29 | Here is the outline of my talk. I will briefly remind you of the bag-of-visual-words model, which is the basis for our work; then I will introduce the technique that we propose, which is entropy-based supervised merging; and finally I will present some experimental results. |
0:00:49 | The problem we are dealing with is visual concept detection. Basically, we want to build a system such that, when we feed it an image, the system is able to detect whether a concept appears or not in this image, for a set of given concepts. |
0:01:10 | The classical way to do this is the following: from the image we compute a feature vector, which is a representation of the visual content of the image, and for each concept a classifier is trained on these feature vectors. |
0:01:36 | The classifier part is standard, so the part we focus on is the construction of the feature vector. |
0:01:47 | A very popular way to build this feature vector is the bag-of-visual-words model. We detect keypoints in the image and compute a local descriptor around each of these points. |
0:02:40 | So with this process, the representation of an image is a set of local descriptors, and what we need is a way to turn this set into a fixed-size feature vector. |
0:03:00 | To do this, we first build a visual dictionary. We take the local descriptors computed on the images of the training set and we cluster them in the descriptor space; each resulting cluster defines a visual word, represented by its centroid. |
0:03:44 | Then, in order to represent a new image, we compute its local descriptors and, for each descriptor, we find the nearest visual word in the descriptor space, so that each descriptor is coded by one of the visual words. |
0:04:08 | Finally, we count the occurrences of each visual word over all the descriptors of the image, and this histogram is the feature representation of the image: its bag of visual words. |
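The assignment-and-histogram step described above can be sketched as follows; this is a minimal illustration with made-up centroids and descriptors, not the speaker's actual code.

```python
import numpy as np

def bow_histogram(descriptors, centroids):
    """Assign each local descriptor to its nearest visual word (centroid)
    and return the normalized histogram of word occurrences."""
    # Pairwise squared distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest visual word for each descriptor
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()   # normalize to a distribution

# Toy example: 2 visual words in a 2-D descriptor space.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descs = np.array([[0.1, -0.2], [9.8, 10.1], [0.3, 0.0], [10.2, 9.9]])
print(bow_histogram(descs, centroids))  # → [0.5 0.5]
```

The histogram is normalized so that images with different numbers of keypoints remain comparable.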
0:04:28 | The key parameter of this representation is the dictionary size, that is, the number of visual words. In fact, the bag-of-words representation can be seen as an approximation of the distribution of the descriptors of the image in the descriptor space. |
0:04:51 | The quantization amounts to approximating this density, and the approximation is made by assuming that the probability density is constant within each cell of the partition of the descriptor space. |
0:05:26 | So of course the choice of the dictionary size is critical. With a small dictionary the approximation is very coarse; we can improve it by increasing the dictionary size, but then the description of the image grows with the dictionary size, which increases the dimension of the feature vector and the complexity of the classifier. So there is a trade-off between the quality of the representation and the complexity of the detection. |
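The trade-off can be seen on a toy 1-D example: the quantization error drops quickly as the number of "words" grows, while the representation dimension grows with it. All names and data below are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(10_000)  # 1-D "descriptors" in [0, 1]

def quantization_error(data, k):
    """Mean squared distance to the nearest of k evenly spaced centroids."""
    centroids = (np.arange(k) + 0.5) / k
    nearest = centroids[np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)]
    return ((data - nearest) ** 2).mean()

for k in (4, 16, 64):
    print(k, quantization_error(data, k))  # error shrinks roughly as 1/k^2
```

The error keeps decreasing with k, but each extra centroid adds one dimension to every image's feature vector, which is exactly the complexity cost discussed here.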
0:06:19 | There is also another issue: the usual dictionary construction is unsupervised. The clustering only uses the descriptors, and the label information comes in only later, when the feature vectors and the concept labels are fed to the classifier. So we may suspect that some label information is lost at the dictionary construction step. |
0:07:13 | This is exactly what we address in this presentation: our idea is to use the label information in the construction of the visual dictionary itself. |
0:07:33 | The intuition is the following: if we have two clusters of descriptors that are different in the descriptor space, but whose descriptors come from images labeled with the same concepts in the same proportions, then distinguishing these two clusters does not matter for the detection, and they could be merged into a single, more compact visual word. |
0:08:17 | So how do we use this label information? The way we do it is the following: we first build an initial dictionary which is much larger than the desired dictionary size, actually several times larger. |
0:08:47 | Then we iteratively merge visual words: at each step we consider all the candidate pairs of visual words, and for each pair we compare how much information about the labels would be lost by merging it. |
0:09:40 | We select the best pair, that is, the one losing the least information, we merge it, which decreases the size of the dictionary by one, and we repeat this process until we reach the desired size. |
0:09:59 | Now, what is the criterion for the merging? The idea is that a merge is harmless when the two visual words carry the same information about the labels: in that case the merged word has the same label distribution as each of the two original words, and no information about the labels is lost. What we would like in the end is a final dictionary that preserves as much label information as possible, and this is naturally measured with entropy. |
0:10:45 | In a more formal way, what we do is the following: for each concept c and each visual word w, we estimate the distribution of the concept label given the visual word, P(c | w), simply by counting, among the training descriptors assigned to w, the proportion coming from images labeled with c. |
0:11:10 | From these counts we can easily compute the conditional entropy of the concept labels given the visual words, over all the concepts, for a given visual dictionary. |
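The counting-based estimate of P(c | w) and the resulting conditional entropy might look like this in a minimal sketch; it assumes a single binary concept label per descriptor, and the data and function names are illustrative.

```python
import numpy as np

def conditional_entropy(word_of_desc, label_of_desc, n_words):
    """H(C | W): entropy of a binary concept label given the visual word,
    estimated by counting over the training descriptors."""
    h = 0.0
    n = len(word_of_desc)
    for w in range(n_words):
        mask = word_of_desc == w
        n_w = mask.sum()
        if n_w == 0:
            continue
        p_pos = label_of_desc[mask].mean()      # P(c = 1 | w): proportion of positives
        for p in (p_pos, 1.0 - p_pos):
            if p > 0:
                h -= (n_w / n) * p * np.log2(p)  # term weighted by P(w)
    return h

words = np.array([0, 0, 1, 1, 2, 2])
labels = np.array([1, 1, 0, 0, 1, 0])  # words 0 and 1 are pure; word 2 is uninformative
print(conditional_entropy(words, labels, 3))   # → ≈0.333 bits (only word 2 contributes)
```

A low conditional entropy means the visual words are informative about the labels; the merging scheme tries to keep it low while shrinking the dictionary.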
0:11:26 | Then, when we merge two visual words, this conditional entropy can only increase, which means that the mutual information between the visual words and the labels can only decrease; so at each step we choose the merge which minimizes this loss, and we repeat the process until we reach the desired dictionary size. |
0:11:58 | Note that in this basic scheme the entropy is computed over all the concepts at the same time, so we build a single, global dictionary. |
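A greedy sketch of the merging loop under these assumptions: one binary concept, each visual word summarized by its positive/negative descriptor counts; function names and data are mine, not the speaker's.

```python
import numpy as np

def plogp(p):
    return p * np.log2(p) if p > 0 else 0.0

def word_entropy_term(pos, neg):
    """Contribution n_w * H(C | W = w) of one word, from its positive and
    negative descriptor counts (up to the constant 1/N factor)."""
    n = pos + neg
    if n == 0:
        return 0.0
    return -n * (plogp(pos / n) + plogp(neg / n))

def merge_dictionary(pos, neg, target_size):
    """Greedily merge the pair of words minimizing the increase of H(C|W)."""
    pos, neg = list(pos), list(neg)
    while len(pos) > target_size:
        best = None
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                loss = (word_entropy_term(pos[i] + pos[j], neg[i] + neg[j])
                        - word_entropy_term(pos[i], neg[i])
                        - word_entropy_term(pos[j], neg[j]))
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        pos[i] += pos[j]; neg[i] += neg[j]  # merge word j into word i
        del pos[j]; del neg[j]
    return pos, neg

# Words 0 and 2 have identical label distributions: they merge first, losing nothing.
pos, neg = merge_dictionary([10, 1, 10, 9], [1, 10, 1, 9], 3)
print(pos, neg)  # → [20, 1, 9] [2, 10, 9]
```

The naive pair scan is quadratic in the dictionary size per step; it is only meant to make the greedy criterion concrete.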
0:12:11 | We can also make the construction concept-dependent: for each concept we build a separate dictionary, in which we only consider that single concept and merge the visual words so as to maximize the information about this single concept, ignoring the others. |
0:12:40 | We can also add an additional constraint to the entropy-based merging: we only allow merging visual words that are neighbors in the descriptor space. Without this constraint, a merged visual word may correspond to a region which is not connected in the descriptor space. |
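One way the connectivity constraint could be implemented is to restrict the candidate merges to nearest-neighbour centroids; this is an illustrative sketch, not necessarily the exact rule used in the talk.

```python
import numpy as np

def connected_pairs(centroids, k=2):
    """Candidate merge pairs restricted to k-nearest-neighbour centroids,
    so that merged visual words stay connected in descriptor space."""
    d = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)          # a word is not its own neighbour
    pairs = set()
    for i, row in enumerate(d):
        for j in np.argsort(row)[:k]:    # the k nearest centroids of word i
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)

# Four centroids on a line: only near neighbours are merge candidates.
c = np.array([[0.0], [1.0], [2.0], [10.0]])
print(connected_pairs(c, k=1))  # → [(0, 1), (1, 2), (2, 3)]
```

The merging loop would then only evaluate pairs from this candidate set instead of all pairs.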
0:13:08 | We have evaluated this approach on a standard video concept-detection benchmark. We compute local descriptors at detected keypoints in the images, and we train a support vector machine classifier for each concept on the bag-of-words feature vectors. |
0:13:48 | For the merging, we start from initial dictionaries whose size is two, four, or eight times the desired final size, and we merge them down to the final size with the proposed scheme. |
0:14:09 | The way we evaluate the performance of the system is the usual evaluation for this task, the mean average precision. We apply the classifiers to the test set, we get a score for each concept on each shot, we rank the shots by this score, and we compute the average precision over the ranked list, averaged over all the concepts. |
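Average precision as described here, the precision averaged at the ranks of the relevant shots, can be computed as in this small sketch with made-up scores and relevance labels.

```python
import numpy as np

def average_precision(scores, relevant):
    """AP for one concept: mean of precision@k at the ranks of relevant shots."""
    order = np.argsort(-np.asarray(scores))  # rank shots by decreasing score
    rel = np.asarray(relevant)[order]
    hits = np.cumsum(rel)                    # relevant shots seen so far
    ranks = np.arange(1, len(rel) + 1)
    return (rel * hits / ranks).sum() / rel.sum()

scores = [0.9, 0.8, 0.7, 0.6]
relevant = [1, 0, 1, 0]
print(average_precision(scores, relevant))   # → (1/1 + 2/3) / 2 ≈ 0.833
```

The mean of this quantity over all concepts gives the mean average precision reported in the results.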
0:14:45 | In the first experiment the final dictionary size is five hundred, and the initial dictionary sizes are one thousand, two thousand, and four thousand, that is, two, four, and eight times the final size. The baseline, a dictionary of size five hundred built directly, gives a mean average precision of about 7.7%. |
0:15:11 | If instead we obtain the dictionary of size five hundred by merging down from the larger initial dictionaries, we get a performance between 7.9% and 8.1%, which is a clear improvement over the baseline. |
0:15:39 | These results are obtained with the global scheme, where the entropy is computed over the whole label distribution; we also ran the concept-dependent scheme. |
0:15:49 | Perhaps surprisingly, the concept-dependent results are not better than the global ones, although we might expect that one dedicated dictionary per concept would help. |
0:16:13 | The reason is probably that for many concepts the number of positive training examples is very small, so the estimated label distributions are not reliable: the merging may fit the training data of such a concept but carries no usable information at test time. |
0:16:42 | We also tried the same experiment with a larger final dictionary size of one thousand. If we build a dictionary of size one thousand directly, the baseline is again around 7%. |
0:17:04 | If we merge down from initial dictionaries of two, four, or eight times this size, we again see a consistent improvement, and the merged dictionary outperforms a directly built dictionary of the same size. |
0:17:38 | Looking at the results concept by concept, we can see that the merging improves the performance for the concepts whose precision is already reasonable, while for a number of concepts which are very difficult, with a precision close to zero, the merging does not help: for those concepts there are simply too few positive examples to estimate the label distributions. |
0:18:11 | We also looked at what happens with the connectivity constraint, that is, when we restrict the merging to visual words which are neighbors in the descriptor space. With an initial dictionary of size one thousand, the constraint increases the performance, but as we increase the size of the initial dictionary, the performance with the constraint drops. |
0:18:48 | So here again, when the initial dictionary is large, the constraint makes the merging too restricted, and the unconstrained entropy criterion is preferable. |
0:19:04 | Finally, we looked at how the merging behaves with respect to unlabeled data: the merges are selected by entropy minimization on the labeled training data, and the resulting dictionary is then used to describe new, unlabeled images. |
0:19:26 | What we observe is that the supervised merging generalizes to the unlabeled data and does not hurt the detection: the improvement is preserved beyond the data used for the entropy estimation. |
0:20:00 | Another way to look at these results is to fix the final dictionary size and to study how the performance varies with the dictionary size, because the dictionary size directly determines the dimension of the feature vectors fed to the classifier. |
0:20:38 | What we can see is that, over initial sizes up to two thousand, the merged dictionaries maintain the detection performance while being much smaller, with only a very small loss of precision. |
0:21:13 | The point I want to stress is that reducing the size of the dictionary reduces the complexity of the classifier, which is an important practical benefit. |
0:21:24 | So, to conclude, we have presented a technique for visual dictionary construction which exploits the label information through an entropy-based selection of the merges. |
0:21:34 | A merged dictionary of a given size almost always outperforms a directly built dictionary of the same size; equivalently, it reaches the same performance with a smaller dictionary, which is an efficient way to balance the complexity and the quality of the detection. |
0:22:03 | Thank you. |
0:22:19 | [Question inaudible.] Well, the complexity of the merging is dominated by the evaluation of the candidate pairs, so it grows with the size of the initial dictionary, but the criterion itself is a simple computation on the counts. |
0:22:41 | And yes, the distances can be precomputed, so the prediction itself is not affected. |
0:22:58 | [Question about the keypoints.] The keypoints were detected using a standard detector; it is a basic salient-point detection, nothing unusual. |
0:23:25 | [Question inaudible.] Yes, I think the result would be very similar, because we would only change the number of visual words; we use the same descriptor space, and we would need about as many words to represent the same distribution, so we do not expect the result to change. |
0:24:23 | [Question about the merging criterion.] There is a simple closed formula for the loss: we just look at the counts before and after the merge. The quantities involved are continuous; note that it is not a distance, but it is quite simple to compute, it is a very simple criterion. |
0:25:02 | [Question inaudible.] Yes, that would be a possible extension; we have thought about that. |
0:25:33 | The point is that the issue is to balance the distance in the descriptor space against the information given by the labels: the distance in the descriptor space is what the initial clustering is based on, while the merging is driven by the labels only. |
0:26:00 | [Inaudible.] |