0:00:13 | Thank you very much for the introduction.
0:00:16 | I would like to present a rapid image retrieval approach for mobile location recognition.
0:00:20 | As you all know, the availability of GPS is limited to those scenarios where we have only very few obstacles,
0:00:27 | and we also have very low signal reception in urban canyons.
0:00:35 | This makes it hardly possible to obtain any position fix in areas like train stations or libraries, and so on.
0:00:40 | But these are actually the places where we have the most interesting location-based services.
0:00:45 | Hence, we do not want to rely on localization systems
0:00:51 | based on WiFi or RFID installations,
0:00:53 | which require an infrastructure.
0:00:57 | We would rather like to use images recorded by the mobile device
0:01:03 | and match them to a visual reference, like Street View,
0:01:07 | which would allow us to derive the pose in a very natural way.
0:01:11 | To do that, we apply content-based image retrieval
0:01:16 | to match the query images to the reference imagery.
0:01:20 | Among those content-based image retrieval approaches, the so-called feature-based approaches are considered state of the art.
0:01:27 | Applying them to the task of location recognition, several challenges arise.
0:01:31 | You see an example query image here, taken with a mobile device,
0:01:38 | which is supposed to be matched to the reference data,
0:01:42 | i.e., to the visually most similar reference image,
0:01:44 | which is the Street View panorama depicted on the right.
0:01:48 | You see there is a large baseline between the two images,
0:01:52 | caused by the fact that we only have sparse reference data:
0:01:58 | the distance between the panoramas is approximately 12.5 meters.
0:02:02 | Furthermore, we have very different lighting conditions,
0:02:07 | dynamic objects like cars and pedestrians, and also a very complex query domain.
0:02:12 | Most importantly, we require a very low retrieval time,
0:02:16 | which is caused by the constantly changing user attention
0:02:19 | and the rapidly changing field of view, so low latency is essential.
0:02:23 | We can achieve that by extracting the features on the mobile device
0:02:31 | at very low complexity.
0:02:32 | To this end, we use Rotation Invariant Fast Features (RIFF),
0:02:35 | which have been recently proposed
0:02:37 | and which require approximately 27 milliseconds
0:02:41 | for a regular frame on a Nexus One.
0:02:45 | Now that we have these features,
0:02:46 | we want to transmit them to the server,
0:02:49 | and we can reduce the amount of transmitted information
0:02:55 | by transmitting only the visual word indices.
0:02:59 | As you know, a visual word is a quantized representation of the features, so this is very efficient,
0:03:05 | and this approach performs about five times better than
0:03:09 | compressed descriptor histograms.
0:03:11 | But to do that, we require feature quantization
0:03:14 | into visual words on the mobile device
0:03:17 | at very low complexity.
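The payload saving described above can be sketched as follows. This is a toy illustration, not the coding scheme from the talk: the 2-D descriptors, the vocabulary size, and the assumption of plain (uncoded) fixed-width indices are all made up for the example.

```python
import numpy as np

def quantize_to_words(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word (brute force)."""
    # Squared L2 distance from every descriptor to every word centroid.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def payload_bits(n_features, vocab_size, bits_per_descriptor):
    """Uplink payload of raw descriptors vs. plain (uncoded) word indices."""
    raw = n_features * bits_per_descriptor
    indices = n_features * int(np.ceil(np.log2(vocab_size)))
    return raw, indices

# Toy 2-D vocabulary with three visual words.
vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
queries = np.array([[0.5, 0.2], [9.0, 1.0], [1.0, 9.0]])
words = quantize_to_words(queries, vocab)        # word indices to transmit
raw_bits, index_bits = payload_bits(1000, 2 ** 20, 128 * 8)
```

Even without entropy coding, a 20-bit index per feature is far smaller than a raw descriptor.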
0:03:20 | So this is what we are talking about in this presentation, and the outline is as follows:
0:03:25 | we first discuss
0:03:26 | the state of the art
0:03:28 | and related work, and then introduce the multiple hypothesis vocabulary tree.
0:03:32 | We provide some detail on its quantization structure,
0:03:34 | the adaptive clustering approach, and its visual word weighting.
0:03:38 | Then we compare it with the state of the art using experimental validation
0:03:42 | and conclude the presentation with a short summary.
0:03:46 | The robust and rapid quantization of features into visual words is of essential importance
0:03:52 | for the performance of these algorithms.
0:03:56 | Among the best-known algorithms
0:03:58 | is the so-called hierarchical k-means (HKM),
0:04:00 | which recursively quantizes the descriptor space by applying a k-means algorithm.
0:04:05 | This allows us to build a so-called vocabulary tree,
0:04:08 | whose leaf nodes are the so-called visual words
0:04:11 | and form the vocabulary.
0:04:13 | This approach can be efficiently improved using the so-called greedy search,
0:04:17 | which considers multiple branches of the vocabulary tree
0:04:21 | to find the closest visual words to a query descriptor.
0:04:26 | This can be considered as a kind of
0:04:28 | backtracking within the vocabulary tree.
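The greedy, best-first descent through a vocabulary tree can be sketched like this. It is a minimal illustration with a hand-built two-level tree; the node layout, the `max_visits` budget, and the tie-breaking counter are assumptions for the example, not the exact scheme of the cited work.

```python
import heapq
import numpy as np

class Node:
    """A vocabulary tree node: `centroid` for routing, `children` for
    inner nodes, and `word` (a leaf id) for visual words."""
    def __init__(self, centroid, children=(), word=None):
        self.centroid = np.asarray(centroid, dtype=float)
        self.children = list(children)
        self.word = word

def greedy_quantize(root, x, max_visits=4):
    """Best-first descent: instead of committing to one branch per level,
    keep a priority queue of unexplored branches ordered by centroid
    distance and expand the closest ones until the visit budget is spent."""
    x = np.asarray(x, dtype=float)
    heap, counter = [], 0
    for child in root.children:
        heapq.heappush(heap, (float(np.linalg.norm(x - child.centroid)), counter, child))
        counter += 1
    best_word, best_dist, visits = None, float("inf"), 0
    while heap and visits < max_visits:
        d, _, node = heapq.heappop(heap)
        visits += 1
        if node.word is not None:          # leaf: candidate visual word
            if d < best_dist:
                best_dist, best_word = d, node.word
        else:                              # inner node: open its branches
            for child in node.children:
                heapq.heappush(heap, (float(np.linalg.norm(x - child.centroid)), counter, child))
                counter += 1
    return best_word

# Hand-built two-level tree: coarse split near x=0 vs. x=10.
leaves = [Node([-1, 0], word=0), Node([1, 0], word=1),
          Node([9, 0], word=2), Node([11, 0], word=3)]
root = Node([5, 0], children=[Node([0, 0], children=leaves[:2]),
                              Node([10, 0], children=leaves[2:])])
```

Keeping several open branches in the queue is what gives the "backtracking" behavior: a query near a coarse-level boundary can still reach the correct leaf on the other branch.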
0:04:32 | The Hamming embedding, on the other hand,
0:04:33 | stores strongly quantized versions of the individual descriptors
0:04:39 | to allow for a finer differentiation
0:04:41 | among the descriptors within one visual word.
0:04:45 | Last but not least, the approximate k-means (AKM) generates a flat vocabulary
0:04:49 | by applying a single k-means algorithm
0:04:53 | to billions of features,
0:04:54 | and to cope with the computational complexity,
0:04:57 | it adopts an approximate nearest neighbor search
0:05:00 | using randomized k-d trees.
0:05:02 | Now, to evaluate these different algorithms,
0:05:06 | we apply them to a typical location retrieval task
0:05:09 | in an area of approximately four square kilometers,
0:05:12 | including about five thousand panoramas,
0:05:14 | each composed of twelve rectified images.
0:05:19 | The query images have a size of 640 by 480 pixels and are represented by
0:05:25 | about a thousand features each on average.
0:05:27 | As you can see in this illustration, we have
0:05:30 | small circles representing the panoramas;
0:05:33 | the distance between them is about 12.6 meters,
0:05:38 | and each query image is placed right between them,
0:05:41 | shifted to the left by 45 degrees,
0:05:44 | with an opening angle of about 60 degrees.
0:05:47 | Now we would like to compare the retrieval performance of the state-of-the-art approaches
0:05:52 | using precision-recall measures.
0:05:55 | To attain a recall of one,
0:05:57 | we require the algorithms to retrieve the two closest panoramas, i.e., those within ten meters.
0:06:05 | Correspondingly,
0:06:07 | a precision of one is achieved if these two panoramas are retrieved first.
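The precision/recall pairs described here can be computed per retrieval depth as follows. This is a generic sketch, not tied to the actual panorama dataset; the ids are invented.

```python
def precision_recall_curve(ranked_ids, relevant_ids):
    """Precision and recall after each retrieved item.

    ranked_ids:   database ids ordered by descending similarity.
    relevant_ids: ground-truth ids (here: the panoramas within 10 m).
    """
    relevant = set(relevant_ids)
    pairs, hits = [], 0
    for k, db_id in enumerate(ranked_ids, start=1):
        if db_id in relevant:
            hits += 1
        pairs.append((hits / k, hits / len(relevant)))  # (precision, recall)
    return pairs

# Two relevant panoramas; the second one only shows up at rank 3.
curve = precision_recall_curve([7, 2, 9], {7, 9})
```

Scanning deeper into the ranked list raises recall while usually lowering precision, which is exactly why the evaluation yields a curve rather than a single point.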
0:06:13 | If you take a look at this graph, you see that we do not have only a single
0:06:17 | precision-recall pair but multiple ones,
0:06:20 | since we consider up to five percent of the database.
0:06:23 | There, obviously, the probability that we also retrieve the relevant panoramas is higher
0:06:30 | than if we consider only a few samples;
0:06:33 | on the other hand, the precision is lower, as we also include
0:06:36 | irrelevant panoramas, of course.
0:06:38 | Now, if we compare the different approaches, we see that the hierarchical k-means is inferior to the other
0:06:43 | approaches,
0:06:44 | while it requires only sixty L2 distance computations, which makes it very fast.
0:06:50 | It can be efficiently improved
0:06:53 | by applying the greedy search
0:06:55 | to the HKM, at the cost of an increased query time:
0:06:59 | we require about five hundred and ten L2 distance computations
0:07:03 | to achieve this graph here.
0:07:05 | The Hamming embedding requires only one third of that computational complexity,
0:07:09 | but it has increased memory requirements, as
0:07:12 | the strongly quantized descriptors have to be stored on the mobile device, which is problematic.
0:07:17 | The AKM, last but not least, is
0:07:19 | set to perform 192 L2 distance computations with eight randomized k-d trees.
0:07:27 | As you can see, despite being outperformed in retrieval precision, the HKM
0:07:32 | is still the most suitable for our specific mobile location recognition task,
0:07:38 | and it was also used in the paper proposing the coding of features as visual word indices.
0:07:46 | It requires 25 milliseconds on a 2.4 GHz desktop CPU.
0:07:50 | Unfortunately, we do not have such CPUs on mobile devices;
0:07:54 | they are in the range of about one gigahertz.
0:07:57 | So we strive for even faster approaches, and to this end we use
0:08:01 | the multiple hypothesis vocabulary tree (MHVT).
0:08:04 | Now we will go a little bit into detail on its quantization structure.
0:08:09 | First of all, as you all know, with an increase of the branching factor
0:08:13 | we improve the retrieval performance.
0:08:16 | However, this ultimately leads
0:08:18 | to a linear search, and thus an enormous computational complexity, if we just search through all possible cells.
0:08:25 | As we want to minimize the query time to achieve mobile location recognition,
0:08:29 | we limit the MHVT to binary decisions.
0:08:32 | This means that we separate the descriptors
0:08:35 | along the direction of maximum variance,
0:08:38 | which is indicated by the vector here.
0:08:42 | Actually, we split along the separating hyperplane,
0:08:45 | which is placed at the mean of the descriptors that are within a particular node.
0:08:52 | There are, obviously,
0:08:53 | descriptors that are very close to this hyperplane,
0:08:56 | so the probability that a matching query descriptor
0:08:59 | is just on the other side of the separating hyperplane is high.
0:09:03 | To avoid these ambiguous decisions, we apply a so-called overlapping buffer
0:09:08 | at the separating hyperplane,
0:09:10 | whose width is actually defined by the variance
0:09:13 | of the data here.
0:09:15 | This is inspired by so-called spill trees: if
0:09:19 | a database feature is assigned to this overlapping buffer,
0:09:25 | then it will not be separated but will be assigned
0:09:28 | to both child nodes.
0:09:31 | This allows us to avoid
0:09:33 | ambiguous decisions if
0:09:35 | features are very close to the separating hyperplane.
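The spill-tree-style assignment can be sketched as follows. The `half_width` parameter and the `alpha` factor in `buffer_half_width` are stand-ins for the variance-derived buffer width mentioned in the talk, not the exact formula from the paper.

```python
import numpy as np

def assign_children(x, normal, offset, half_width):
    """Spill-tree-style routing of a database descriptor: inside the
    overlapping buffer around the hyperplane it is duplicated into BOTH
    child nodes; otherwise the sign of the distance picks one child."""
    s = float(np.asarray(x, dtype=float) @ normal - offset)
    if abs(s) <= half_width:
        return ("left", "right")   # ambiguous region: keep both hypotheses
    return ("left",) if s < 0 else ("right",)

def buffer_half_width(descriptors, normal, alpha=0.1):
    """The talk ties the buffer width to the variance of the data; here
    a hypothetical fraction `alpha` of the projected std-dev is used."""
    proj = np.asarray(descriptors, dtype=float) @ normal
    return alpha * float(proj.std())

normal = np.array([1.0, 0.0])      # vertical separating plane at x = 0
```

Duplicating borderline database descriptors trades a slightly larger index for quantization decisions that no longer flip when a matching query descriptor lands just across the plane.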
0:09:40 | So, altogether, the database descriptors now
0:09:45 | follow multiple hypothetical paths through the tree,
0:09:48 | i.e., all the paths that a query descriptor could follow,
0:09:51 | and this makes us particularly robust against variations,
0:09:55 | which could stem, for instance,
0:09:57 | from wide baselines.
0:10:02 | Now that we can recursively subdivide the descriptor space,
0:10:08 | we could continue until a certain maximum number
0:10:11 | of features per node is reached.
0:10:15 | However, as we consider large datasets,
0:10:17 | this can result
0:10:19 | in differently sized descriptor clusters,
0:10:23 | which could stem, for instance, from
0:10:25 | the different occurrence frequencies of certain textures,
0:10:28 | such as windows.
0:10:31 | Now, to avoid the overfitting of such descriptor clusters,
0:10:35 | we want to stop the separation once the descriptor cluster is close to a hypersphere,
0:10:40 | which means
0:10:41 | that the descriptor cluster is consistent in itself and no further separation is necessary or useful.
0:10:49 | An efficient approximation to detect that is to take a look at the fraction of features
0:10:53 | that are assigned to the overlapping buffer.
0:10:56 | Here you see a node that can be separated very well,
0:10:59 | as a strong variance can be observed in this direction.
0:11:02 | On the other hand, you have here a different node,
0:11:04 | where the variance is almost the same in all directions and a large fraction of the features
0:11:09 | is assigned to its overlapping buffer.
0:11:13 | This means that many more quantization steps would be required
0:11:16 | to separate this cluster.
0:11:18 | Hence, we stop the
0:11:20 | separation process once a certain fraction of the features
0:11:22 | is included in the overlapping buffer.
0:11:25 | This not only avoids the
0:11:27 | overfitting effects, and thus improves the retrieval performance, but also reduces the size of the tree
0:11:33 | and thus also
0:11:34 | the quantization time.
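The adaptive stopping rule sketched here checks the fraction of a node's descriptors that fall into the overlapping buffer; the `max_fraction` threshold and the toy point clouds are assumptions for illustration.

```python
import numpy as np

def should_stop(descriptors, normal, offset, half_width, max_fraction=0.3):
    """Adaptive clustering stop rule: if too large a fraction of the
    node's descriptors lands inside the overlapping buffer, the cluster
    is roughly isotropic (close to a hypersphere), further splitting
    would overfit, and the node is kept as a leaf."""
    X = np.asarray(descriptors, dtype=float)
    signed = X @ normal - offset
    fraction = float(np.mean(np.abs(signed) <= half_width))
    return fraction > max_fraction

normal = np.array([1.0, 0.0])
# Elongated cluster: almost no points near the plane -> keep splitting.
elongated = np.array([[-3.0, 0.0], [-2.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
# Isotropic cluster: most points inside the buffer -> stop.
isotropic = np.array([[0.1, 1.0], [-0.1, -1.0], [0.2, 0.5], [-0.2, -0.5]])
```

Besides reducing overfitting, pruning these isotropic nodes shrinks the tree, which directly lowers the quantization time per query descriptor.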
0:11:40 | Now that we have a tree quantization structure that allows us to cope with this continuous descriptor space,
0:11:47 | we also want to integrate the probability
0:11:49 | that a query descriptor and a database descriptor are assigned to the same visual
0:11:55 | word.
0:11:58 | As we know that matching descriptors follow a one-dimensional Laplacian distribution,
0:12:03 | we can say that the probability that a feature
0:12:06 | is assigned to the other side of this overlapping buffer, or separating hyperplane,
0:12:11 | corresponds to the integral over this area of the Laplacian distribution.
0:12:15 | So here we have a query feature,
0:12:18 | and the probability that the matching database descriptor
0:12:23 | would be assigned to a different
0:12:24 | node
0:12:26 | corresponds to this part of the Laplacian distribution.
0:12:30 | Now, we have multiple
0:12:32 | quantization steps, and of course every one has to be correct.
0:12:36 | That means that we have to find the probability
0:12:39 | that the query descriptor and the database descriptor are assigned to the same
0:12:44 | visual word.
0:12:45 | Every step has to be correct, and thus
0:12:48 | the overall probability is the product of these individual probabilities.
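The per-step confidence and its product along the path can be sketched as follows, assuming (as stated in the talk) a Laplacian model for the projection of a matching descriptor around the query's projection. The scale `b` is a hypothetical parameter here.

```python
import math

def laplace_cdf(t, mu, b):
    """CDF of a Laplacian distribution with location mu and scale b."""
    if t < mu:
        return 0.5 * math.exp((t - mu) / b)
    return 1.0 - 0.5 * math.exp(-(t - mu) / b)

def step_confidence(signed_dist, b):
    """Probability that a matching database descriptor falls on the SAME
    side of the hyperplane as the query, whose signed distance to the
    plane is `signed_dist`: one minus the Laplacian mass beyond the plane."""
    cross = laplace_cdf(0.0, abs(signed_dist), b)  # mass on the wrong side
    return 1.0 - cross

def path_confidence(signed_dists, b=1.0):
    """Every quantization step along the path must be correct, so the
    overall confidence is the product of the per-step probabilities."""
    p = 1.0
    for d in signed_dists:
        p *= step_confidence(d, b)
    return p
```

A query landing exactly on a plane gives a step confidence of 0.5 (a coin flip), while queries far from every plane approach a path confidence of one.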
0:12:53 | With these probabilities we can actually weight
0:12:56 | the distance calculation
0:12:58 | between the query bag-of-features vector and the reference bag-of-features vector.
0:13:01 | That means that a feature that has been more reliably quantized
0:13:05 | has a larger contribution to this distance calculation
0:13:10 | than a feature that is less reliably quantized.
0:13:14 | So features where we are very confident about the visual word assignment
0:13:19 | contribute more to the distance computation.
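A minimal sketch of such a confidence-weighted distance between bag-of-features vectors; the specific L2 form and the toy confidence values are assumptions, not the exact scoring function of the paper.

```python
import numpy as np

def weighted_bof_distance(query, reference, weights):
    """L2-style distance between query and reference bag-of-features
    vectors in which each visual-word dimension is scaled by its
    quantization confidence: reliably quantized words count more."""
    q, r, w = (np.asarray(v, dtype=float) for v in (query, reference, weights))
    return float(np.sqrt(np.sum(w * (q - r) ** 2)))

q = [1.0, 0.0, 2.0]
r = [0.0, 0.0, 2.0]
conf = [1.0, 1.0, 0.2]          # word 2 was quantized unreliably
```

With this weighting, a mismatch on an unreliably quantized word perturbs the ranking far less than a mismatch on a confidently assigned one.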
0:13:25 | Now we want to compare our approach with the
0:13:28 | HKM algorithm, which was used so far.
0:13:32 | We can sum up and say that with this lightweight scheme we hardly increase
0:13:37 | the query time, as
0:13:40 | the adaptive clustering and the visual word weighting add only little to the
0:13:46 | overall query time.
0:13:48 | As you can see here, we have applied the same experiment
0:13:51 | as we described before,
0:13:53 | where we have to find the two closest
0:13:55 | panoramas around every query location,
0:13:58 | and this is the same curve as we had before
0:14:00 | for the HKM
0:14:01 | at a range of ten meters.
0:14:03 | So the MHVT allows for a significant improvement with respect to the retrieval performance,
0:14:09 | and this is even more significant
0:14:11 | if we ask the algorithms to find the four closest panoramas,
0:14:16 | i.e., those within twenty meters.
0:14:20 | Most importantly, we managed to achieve an
0:14:24 | overall query time of 2.5 milliseconds
0:14:28 | for one thousand query descriptors
0:14:29 | on a 2.3 GHz desktop CPU,
0:14:32 | and this is a
0:14:35 | ten-fold speed-up with respect to the HKM.
0:14:39 | To conclude the presentation, we can say that we have been facing the problem
0:14:42 | of feature quantization on the mobile device
0:14:46 | to facilitate mobile location recognition, and we did that
0:14:49 | by generating a multiple hypothesis vocabulary tree,
0:14:52 | which allows us to cope with ambiguous quantization steps.
0:14:56 | We use an adaptive clustering to reduce the overfitting effects
0:15:00 | and integrate the probability of correct feature quantization into the distance calculation.
0:15:05 | Altogether, this allows us to achieve a ten-fold speed-up with respect to the state of the art,
0:15:09 | which results in twelve milliseconds for one thousand descriptors on a Nexus One.
0:15:15 | The combination of the MHVT with Rotation Invariant Fast Features and tree histogram coding
0:15:19 | allows for mobile real-time location recognition
0:15:23 | at thirty frames per second.
0:15:25 | I would like to thank you very much for your attention, and I am happy to answer any questions.
0:15:45 | [Answer] The bitrate is pretty much the same; the server side was not changed, so we still send
0:15:50 | the same number of
0:15:53 | word indices.
0:15:57 | It depends very much on how many features you send;
0:16:03 | I do not want to say something wrong, but I think it is about five times less than what raw descriptors would require
0:16:07 | to send the same data.
0:16:13 | [Answer] We did not invent a new compression scheme here;
0:16:17 | it is still this coding of visual word indices.
0:16:21 | I think it is compatible, but we use no prior knowledge of the location of the mobile
0:16:26 | device.
0:16:27 | Since we have only one tree, which already resides on the mobile device, you can use this also for
0:16:32 | mobile product recognition or anything else;
0:16:35 | there is no prior knowledge included.