0:00:13 | you |
---|
0:00:13 | introduction election |
---|
0:00:14 | so is |
---|
0:00:15 | don't work was the formant that now i'm we use as you have then |
---|
0:00:19 | but that is |
---|
0:00:20 | from yeah a on the |
---|
0:00:22 | on the risk of ones i'm said i |
---|
0:00:26 | so this talk is interested in large |
---|
0:00:29 | databases |
---|
0:00:30 | of media |
---|
0:00:31 | uh the nuns |
---|
0:00:32 | uh new image sell to large that the base |
---|
0:00:36 | day |
---|
0:00:36 | it means that when you |
---|
0:00:39 | use a state-of-the-art |
---|
0:00:41 | scheme |
---|
0:00:42 | an image |
---|
0:00:43 | is represented by |
---|
0:00:45 | about one thousand or two thousand descriptors |
---|
0:00:47 | so it means that we have to handle |
---|
0:00:50 | two billion |
---|
0:00:51 | descriptors |
---|
0:00:52 | and the database descriptors |
---|
0:00:54 | are SIFT descriptors |
---|
0:00:55 | or |
---|
0:00:56 | uh of them mentioned one like |
---|
0:00:59 | uh if we look at a do so much |
---|
0:01:02 | uh we would like to |
---|
0:01:03 | right |
---|
0:01:04 | and |
---|
0:01:05 | a thousand i well so |
---|
0:01:06 | video |
---|
0:01:07 | for instance in the TRECVID evaluation task |
---|
0:01:10 | there is |
---|
0:01:11 | about two hundred hours |
---|
0:01:13 | and this video is also represented by billions of descriptors |
---|
0:01:19 | and and some in music retrieval or uh was really is the uh |
---|
0:01:23 | the column down you some that |
---|
0:01:24 | the base |
---|
0:01:25 | uh on the gains all the it is one billion |
---|
0:01:31 | or you can take a more concrete example of our |
---|
0:01:35 | participation in TRECVID |
---|
0:01:37 | evaluation in the copy detection task |
---|
0:01:40 | uh uh we have extra from do about sweet that's five billion image is |
---|
0:01:45 | thus |
---|
0:01:45 | two represents a a a a a of the database |
---|
0:01:49 | on on a D |
---|
0:01:50 | uh about uh |
---|
0:01:51 | one don't quite fourteen million uh we do it |
---|
0:01:55 | and look do is |
---|
0:01:57 | we want to sell some query |
---|
0:01:59 | uh |
---|
0:02:01 | zero on look whereas as or of the base |
---|
0:02:03 | based keep those |
---|
0:02:05 | which means |
---|
0:02:06 | is that |
---|
0:02:06 | for each |
---|
0:02:07 | descriptor |
---|
0:02:08 | we look for its |
---|
0:02:09 | nearest neighbor |
---|
0:02:10 | with respect to |
---|
0:02:11 | the Euclidean distance |
---|
0:02:13 | a for the so |
---|
0:02:15 | now if we look at exhaustive search |
---|
0:02:18 | discarding and talk |
---|
0:02:20 | uh |
---|
0:02:21 | if we have one |
---|
0:02:22 | frame described |
---|
0:02:24 | by one thousand descriptors |
---|
0:02:26 | we have to make trillions of |
---|
0:02:28 | high-dimensional |
---|
0:02:29 | vector comparisons |
---|
0:02:31 | that is on the order of ten hours for |
---|
0:02:33 | the remote through |
---|
0:02:35 | so we cannot do it |
---|
0:02:36 | for a single |
---|
0:02:37 | frame |
---|
0:02:38 | that is why we need a powerful |
---|
0:02:40 | approximate nearest neighbor search |
---|
0:02:42 | which are situation |
---|
0:02:44 | she's quite to used to avoid program |
---|
0:02:46 | but is also memory efficient |
---|
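To make the cost argument concrete, here is a minimal sketch of exhaustive nearest-neighbor search together with the comparison count for one frame. The function names are hypothetical, and the database size of one billion is an assumption chosen to match the order of magnitude quoted in the talk.

```python
import math


def squared_l2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def exhaustive_nn(query, database):
    """Scan every database vector and return (index, squared distance) of the closest one."""
    best_idx, best_dist = -1, math.inf
    for idx, vec in enumerate(database):
        d = squared_l2(query, vec)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist


# Comparison count for a single frame.
descriptors_per_frame = 1_000      # figure quoted in the talk
database_size = 10 ** 9            # assumption: one billion database descriptors
comparisons = descriptors_per_frame * database_size  # one trillion vector comparisons
```

Every query descriptor is compared against every database descriptor, which is why the count grows as the product of the two sizes.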
0:02:48 | okay i i wouldn't miss is or |
---|
0:02:52 | so uh uh such a reason as to to nice |
---|
0:02:56 | i to to re uh uh from speech yeah |
---|
0:02:58 | a a a pretty C |
---|
0:03:00 | it's great quite to use that to retrieve though should be actual |
---|
0:03:04 | nearest neighbours |
---|
0:03:06 | just be the |
---|
0:03:07 | that is why we make approximate search |
---|
0:03:10 | but also the a we use that |
---|
0:03:12 | and actually most or and many optimize to first protect |
---|
0:03:16 | for instance look at LSH |
---|
0:03:18 | the Euclidean version |
---|
0:03:19 | which is very popular because it has some good |
---|
0:03:21 | theoretical properties |
---|
0:03:23 | but it is quite memory |
---|
0:03:24 | consuming |
---|
0:03:25 | as if you use only ten hash tables |
---|
0:03:29 | you need at least forty bytes per vector |
---|
0:03:32 | or the exact |
---|
0:03:34 | to minimum |
---|
0:03:35 | and using that you have that some very good |
---|
0:03:36 | performance because i is no that that that at the station |
---|
0:03:40 | in the let's see |
---|
0:03:41 | uh |
---|
0:03:42 | a but does choice be two |
---|
0:03:44 | select a finite where is an which is a very good was and the terms of the actors |
---|
0:03:49 | right |
---|
0:03:50 | a state of so |
---|
0:03:51 | yeah is yeah agrees um |
---|
0:03:53 | on this is now a great |
---|
0:03:54 | so you you you agree |
---|
0:03:56 | but again this algorithm |
---|
0:03:58 | requires a lot of memory |
---|
0:04:00 | so we made an experiment |
---|
0:04:02 | on on need when we and vectors |
---|
0:04:04 | on choose about one hundred |
---|
0:04:06 | in me given rights |
---|
0:04:07 | index |
---|
0:04:08 | only when you don't like |
---|
0:04:11 | so typically approximate search is done |
---|
0:04:14 | in two stages |
---|
0:04:17 | the first stage is the approximate search itself |
---|
0:04:21 | where you have some kind of space partitioning |
---|
0:04:24 | so this is LSH like, a partition of the space, and |
---|
0:04:29 | you can |
---|
0:04:30 | for a given vector |
---|
0:04:32 | find the set |
---|
0:04:33 | where this vector lies |
---|
0:04:36 | so you have to do this for all the database descriptors |
---|
0:04:39 | offline |
---|
0:04:40 | and when you have a query |
---|
0:04:42 | you compute the |
---|
0:04:43 | set in which it lies |
---|
0:04:45 | and you retrieve |
---|
0:04:48 | the vectors of the same set as potential |
---|
0:04:51 | nearest neighbors |
---|
0:04:52 | so this is for one hash function and in LSH |
---|
0:04:55 | you have several partitions |
---|
0:04:57 | to improve the probability |
---|
0:04:59 | to get the |
---|
0:04:59 | true |
---|
0:05:00 | nearest neighbor |
---|
0:05:02 | and then |
---|
0:05:03 | you have some potential nearest neighbors but many of them are not very good |
---|
0:05:08 | they are very far from the query vector actually |
---|
0:05:11 | so that is why you typically |
---|
0:05:13 | make |
---|
0:05:14 | a verification |
---|
0:05:15 | based on an exact |
---|
0:05:17 | Euclidean |
---|
0:05:17 | distance calculation |
---|
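The two stages described here can be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation: the space partition is a nearest-centroid assignment (a stand-in for whichever hash function or partition is actually used), only one partition is built, and all function names are hypothetical.

```python
from collections import defaultdict


def squared_l2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def assign_cell(vec, centroids):
    """Stage-one partition (assumed): index of the closest centroid."""
    return min(range(len(centroids)), key=lambda i: squared_l2(vec, centroids[i]))


def build_index(database, centroids):
    """Offline step: store each database vector id in the cell where it lies."""
    cells = defaultdict(list)
    for idx, vec in enumerate(database):
        cells[assign_cell(vec, centroids)].append(idx)
    return cells


def search(query, database, centroids, cells, k=1):
    """Stage one: fetch the query's cell. Stage two: verify with exact distances."""
    candidates = cells.get(assign_cell(query, centroids), [])
    ranked = sorted(candidates, key=lambda i: squared_l2(query, database[i]))
    return ranked[:k]
```

Note that the verification step reads `database[i]`, the full descriptor, which is exactly the memory dependency the talk goes on to criticize.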
0:05:19 | know the are two uh a a a a is |
---|
0:05:22 | approximate net middle |
---|
0:05:24 | a good thing is that if the true nearest neighbor |
---|
0:05:27 | is returned |
---|
0:05:29 | by the first stage |
---|
0:05:30 | you are sure that it will be ranked in a good position by this exact |
---|
0:05:35 | distance calculation |
---|
0:05:37 | the problem is that for this re-ranking stage |
---|
0:05:40 | we need the descriptors |
---|
0:05:42 | and |
---|
0:05:43 | it means |
---|
0:05:44 | that we still have to use a huge amount of memory |
---|
0:05:49 | so for instance for one billion vectors |
---|
0:05:51 | it means more than one hundred gigabytes |
---|
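The memory figure can be checked with a line of arithmetic. The sketch below assumes a 128-dimensional SIFT descriptor stored with one byte per component; the 8-byte code is a hypothetical compressed size used only for comparison.

```python
def index_size_gb(num_vectors, bytes_per_vector):
    """Memory needed for one fixed-size record per vector, in decimal gigabytes."""
    return num_vectors * bytes_per_vector / 1e9


ONE_BILLION = 10 ** 9

# Assumption: 128 components, one byte each, per raw descriptor.
raw_descriptors_gb = index_size_gb(ONE_BILLION, 128)

# Hypothetical compressed code of 8 bytes per vector, for comparison.
compressed_codes_gb = index_size_gb(ONE_BILLION, 8)
```

Under these assumptions the raw descriptors occupy 128 GB, consistent with "more than one hundred gigabytes", while 8-byte codes would fit in 8 GB.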
0:05:54 | so you have to put them on disk |
---|
0:05:56 | and this |
---|
0:05:56 | in this case impacts |
---|
0:05:58 | the efficiency of this scheme because |
---|
0:06:00 | in practice |
---|
0:06:01 | you have to check too many |
---|
0:06:03 | neighbors |
---|
0:06:07 | oh you cannot do it and the but it up |
---|
0:06:09 | to to not more than one |
---|
0:06:11 | vectors |
---|
0:06:12 | was |
---|
0:06:13 | what we propose in this paper is to add a re-ranking scheme that is |
---|
0:06:17 | based on source coding |
---|
0:06:22 | before that I have to introduce a previous work |
---|
0:06:25 | we made |
---|
0:06:26 | on approximate search |
---|
0:06:28 | where we use |
---|
0:06:29 | a |
---|
0:06:30 | compression-based |
---|
0:06:32 | approach |
---|
0:06:33 | that is we are going to represent |
---|
0:06:35 | each database vector |
---|
0:06:37 | by a compressed |
---|
0:06:38 | representation |
---|
0:06:39 | that is |
---|
0:06:40 | a concise representation |
---|
0:06:43 | and |
---|
0:06:44 | this is |
---|
0:06:45 | done using a product quantizer to have many |
---|
0:06:47 | reproduction values |
---|
0:06:49 | available |
---|
0:06:50 | he a bus |
---|
0:06:51 | for |
---|
0:06:52 | a production value |
---|
0:06:54 | i is really |
---|
0:06:56 | and then the search is seen as a distance |
---|
0:06:58 | approximation problem |
---|
0:07:00 | that is |
---|
0:07:01 | instead of |
---|
0:07:03 | computing the distance between x and y |
---|
0:07:06 | you are going to approximate it |
---|
0:07:08 | by the distance using x |
---|
0:07:10 | and q |
---|
0:07:11 | of y |
---|
0:07:12 | on that you have a bias can use to make so but E |
---|
0:07:15 | we about the by is there |
---|
0:07:17 | just the most |
---|
0:07:17 | but |
---|
0:07:19 | and |
---|
0:07:20 | and an |
---|
0:07:21 | interesting point is |
---|
0:07:23 | that we can make this distance |
---|
0:07:24 | estimation directly in the compressed domain |
---|
0:07:27 | we do not have to decompress the data to make |
---|
0:07:30 | it to so |
---|
0:07:31 | uh that |
---|
0:07:31 | power |
---|
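The idea of estimating the distance between x and q(y) directly from the code can be sketched as below. This is an illustrative toy version of a product quantizer with table-based distance estimation; the function names and the tiny hand-written codebooks in the usage note are assumptions, and real codebooks would be learned from training data.

```python
def squared_l2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def split(vec, m):
    """Cut a vector into m contiguous subvectors of equal length."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def pq_encode(vec, codebooks):
    """Quantize each subvector separately; the code is one centroid index per subvector."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda j: squared_l2(sub, cb[j]))
            for sub, cb in zip(subs, codebooks)]


def pq_decode(code, codebooks):
    """Explicit reconstruction q(y): concatenate the selected centroids."""
    out = []
    for j, cb in zip(code, codebooks):
        out.extend(cb[j])
    return out


def distance_table(query, codebooks):
    """Per-query table: squared distance from each query subvector to each centroid."""
    subs = split(query, len(codebooks))
    return [[squared_l2(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]


def estimated_distance(table, code):
    """Squared distance between x and q(y), read from the table without decoding y."""
    return sum(table[i][j] for i, j in enumerate(code))
```

With two hand-written codebooks such as `[[[0, 0], [1, 1]], [[0, 0], [2, 2]]]`, the table lookup gives the same value as decoding the code and computing the distance explicitly, but needs only one addition per subvector at search time.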
0:07:33 | on uh you may be from yeah with |
---|
0:07:36 | uh you we |
---|
0:07:37 | uh on building this that where a a a a couldn't vectors |
---|
0:07:40 | on that |
---|
0:07:41 | to you a space |
---|
0:07:43 | to replace the Euclidean distance by the Hamming distance |
---|
0:07:46 | on |
---|
0:07:47 | using this scheme |
---|
0:07:48 | you obtain |
---|
0:07:50 | almost the same efficiency |
---|
0:07:51 | but the performance is |
---|
0:07:53 | much better i can show you some |
---|
0:07:56 | or so we have some proved average above bounds |
---|
0:07:58 | sounds |
---|
0:07:59 | estimation people |
---|
0:08:01 | so now |
---|
0:08:03 | I am looking at the second stage |
---|
0:08:05 | the re-ranking stage knowing that the first stage was a compression-based |
---|
0:08:09 | approach |
---|
0:08:11 | a good property of the first stage is that we had a compression-based indexing which means that |
---|
0:08:16 | for each database vector we have an explicit reconstruction |
---|
0:08:21 | of this vector |
---|
0:08:23 | so |
---|
0:08:24 | instead of using the full descriptor |
---|
0:08:25 | for the second stage |
---|
0:08:27 | we are going to refine |
---|
0:08:30 | the first reconstruction |
---|
0:08:32 | of the database vector obtained by the first |
---|
0:08:34 | stage |
---|
0:08:36 | this is done by first computing the residual vector |
---|
0:08:39 | so |
---|
0:08:41 | it means |
---|
0:08:41 | this this |
---|
0:08:42 | a small vector |
---|
0:08:43 | we two |
---|
0:08:44 | oh this one |
---|
0:08:47 | and then this vector, because we do not want to store it because it is high-dimensional |
---|
0:08:51 | uh uh uh for C what be don't we |
---|
0:08:54 | eight |
---|
0:08:55 | is quantized using a quantizer that is adapted to |
---|
0:08:59 | the residual |
---|
0:09:03 | so now it means that |
---|
0:09:04 | we can approximate y as the first approximation obtained by the first stage |
---|
0:09:09 | plus as a gone |
---|
0:09:11 | uh |
---|
0:09:11 | of if five don't |
---|
0:09:13 | the could yeah |
---|
0:09:14 | so that is the encoding of this residual vector |
---|
0:09:16 | which improves the initial |
---|
0:09:18 | estimate |
---|
0:09:19 | both for the reconstruction but also for the distance calculation |
---|
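The refinement described in this passage can be written compactly. The symbols are assumed for illustration: $q_c$ for the first-stage quantizer, $q_r$ for the quantizer applied to the residual, and $\hat{y}$ for the refined estimate of a database vector $y$; $x$ is the query.

```latex
\begin{align*}
r(y)       &= y - q_c(y)                          && \text{residual of the first-stage reconstruction} \\
\hat{y}    &= q_c(y) + q_r\bigl(r(y)\bigr)        && \text{refined reconstruction} \\
d(x, y)^2  &\approx \lVert x - \hat{y} \rVert^2   && \text{distance used for re-ranking}
\end{align*}
```

Only the two codes are stored per database vector; the full vector $y$ is needed at indexing time but not at query time.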
0:09:24 | and we can uh |
---|
0:09:26 | re |
---|
0:09:26 | we can |
---|
0:09:28 | trade off |
---|
0:09:28 | between precision and memory |
---|
0:09:30 | by |
---|
0:09:31 | the amount of bytes allocated to this |
---|
0:09:35 | quantizer |
---|
0:09:37 | so this parameter can be eight bytes |
---|
0:09:40 | that |
---|
0:09:41 | which is just a small number of bytes |
---|
0:09:43 | to "'cause" on |
---|
0:09:44 | so a which in not vector |
---|
0:09:47 | let's look at the algorithm which is |
---|
0:09:48 | quite simple |
---|
0:09:50 | so |
---|
0:09:51 | we have a database vector |
---|
0:09:54 | y |
---|
0:09:55 | the first approximation made by the first stage |
---|
0:09:58 | E |
---|
0:09:59 | consists in representing it by a code |
---|
0:10:03 | which can be seen in the original Euclidean space as q of y |
---|
0:10:08 | is the first stage well bring to compress the curvy |
---|
0:10:10 | we suppose of possible to Y |
---|
0:10:13 | on sale a shot is |
---|
0:10:14 | oh vectors |
---|
0:10:15 | so once we want to find |
---|
0:10:18 | if y is selected as a potential nearest neighbour |
---|
0:10:21 | you are going to explicitly reconstruct |
---|
0:10:24 | this |
---|
0:10:25 | improved estimate |
---|
0:10:26 | that is we are going to refine |
---|
0:10:29 | y by using the residual like this |
---|
0:10:33 | so we have y hat |
---|
0:10:34 | and then the new distance to y will be |
---|
0:10:37 | are there in instead of this C |
---|
0:10:39 | which is a better approximation |
---|
0:10:41 | uh is than |
---|
0:10:42 | the distance between back uh |
---|
0:10:44 | so this is a better a use this |
---|
0:10:46 | for this |
---|
0:10:47 | to this |
---|
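The re-ranking procedure walked through here can be sketched as below. This is a simplified illustration, not the speaker's code: both quantizers are plain nearest-codeword lookups over tiny hypothetical codebooks (the talk uses product quantizers), and every name is an assumption.

```python
def squared_l2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda j: squared_l2(vec, codebook[j]))


def encode_refined(y, coarse_cb, refine_cb):
    """Indexing time: first-stage code for y, then a second code for its residual."""
    c = nearest(y, coarse_cb)
    residual = [a - b for a, b in zip(y, coarse_cb[c])]
    r = nearest(residual, refine_cb)
    return c, r


def reconstruct(code, coarse_cb, refine_cb):
    """Improved estimate: first-stage reconstruction plus quantized residual."""
    c, r = code
    return [a + b for a, b in zip(coarse_cb[c], refine_cb[r])]


def rerank(query, shortlist, codes, coarse_cb, refine_cb, k):
    """Re-order the first-stage shortlist by distance to the refined reconstructions."""
    def refined_distance(i):
        return squared_l2(query, reconstruct(codes[i], coarse_cb, refine_cb))
    return sorted(shortlist, key=refined_distance)[:k]
```

In this sketch two database vectors that share the same first-stage code are indistinguishable after stage one; the residual code is what lets the re-ranking separate them, without reading the original descriptors.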
0:10:51 | so let's see some search results |
---|
0:10:53 | on one billion vectors |
---|
0:10:56 | the |
---|
0:10:57 | in this case we used for the first stage |
---|
0:10:59 | eight bytes per vector |
---|
0:11:01 | the first one |
---|
0:11:02 | that wasn't a of the the the cost |
---|
0:11:06 | on just use this performance is performance |
---|
0:11:09 | yeah i i shows the wrong of |
---|
0:11:11 | so one else need or when i don't get the right |
---|
0:11:14 | over a large number of queries |
---|
0:11:17 | uh |
---|
0:11:17 | the probability that the nearest neighbor is ranked in the first position |
---|
0:11:21 | in the ten first positions, in the hundred first positions |
---|
0:11:25 | you have to keep in mind that the rank can be up to one billion |
---|
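The measure described here, the fraction of queries whose true nearest neighbor appears among the first R returned positions, can be computed as in this minimal sketch. The function name and the toy data in the usage note are hypothetical.

```python
def recall_at_r(results, ground_truth, r):
    """Fraction of queries whose true nearest neighbor is among the first r results.

    results[i] is the ranked list of ids returned for query i;
    ground_truth[i] is the id of the exact nearest neighbor of query i.
    """
    hits = sum(1 for ranked, true_nn in zip(results, ground_truth)
               if true_nn in ranked[:r])
    return hits / len(results)
```

For example, with `results = [[3, 1, 2], [5, 4, 0], [7, 8, 9]]` and `ground_truth = [3, 4, 6]`, one query out of three has its neighbor in first position and two out of three have it within the first two positions.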
0:11:28 | to or that you can see that the first approach |
---|
0:11:31 | is a better |
---|
0:11:32 | i could be to wrong |
---|
0:11:34 | so the neighbor in the first thousand position but |
---|
0:11:37 | that's not |
---|
0:11:39 | so if you make the re-ranking using only eight bytes |
---|
0:11:42 | for of it you as usual uh could could cause a and want to |
---|
0:11:46 | we set |
---|
0:11:46 | you get a very good improvement |
---|
0:11:48 | and then you can try |
---|
0:11:49 | with sixteen bytes |
---|
0:11:51 | but |
---|
0:11:52 | and we converge |
---|
0:11:53 | using one hundred twenty-eight bytes |
---|
0:11:56 | not perform ones there |
---|
0:11:57 | which means that |
---|
0:11:58 | the first |
---|
0:11:59 | nearest neighbor is always ranked in first position |
---|
0:12:05 | okay on |
---|
0:12:06 | I have to say that the re-ranking stage has a limited cost |
---|
0:12:09 | which is almost negligible compared to the first |
---|
0:12:13 | on as a clone second so |
---|
0:12:15 | a means just mention |
---|
0:12:17 | oh kind of ms so that we can increase in new stable |
---|
0:12:20 | uh uh we see less and one minutes again |
---|
0:12:22 | oh a a to two hundred means going if you want to be sure that you we gets a nearest |
---|
0:12:26 | stable |
---|
0:12:27 | one billion vector |
---|
0:12:28 | a very five |
---|
0:12:30 | uh so |
---|
0:12:31 | you to of the old king is that we you can see |
---|
0:12:34 | uh i'm not being that was a it |
---|
0:12:36 | time |
---|
0:12:37 | but is better to use a less i as the first stage |
---|
0:12:40 | on more of the circle because |
---|
0:12:42 | you will have a i efficiency |
---|
0:12:45 | we have that separation for the first stage |
---|
0:12:47 | uh but |
---|
0:12:48 | comparable precision or in fact |
---|
0:12:51 | even an improved precision using the re-ranking |
---|
0:12:54 | stage based on |
---|
0:12:55 | source coding refinement |
---|
0:12:58 | and I would like to mention before concluding my talk |
---|
0:13:01 | that we have put online a vector dataset of one billion |
---|
0:13:06 | big |
---|
0:13:07 | and the reason we have done this |
---|
0:13:09 | is because |
---|
0:13:10 | as many papers thus an approximate cell |
---|
0:13:12 | is that makes an evaluation on one billion vectors |
---|
0:13:15 | and one in fact uh i think i have shows is the beginning is that's actual sites get a petition |
---|
0:13:20 | we need to on that one big in fact on that one you |
---|
0:13:23 | so if you makes make three months |
---|
0:13:25 | these thirty days and on the here uh |
---|
0:13:27 | where are but the set |
---|
0:13:28 | and this is one for which |
---|
0:13:30 | uh we have uh |
---|
0:13:32 | we compute that |
---|
0:13:34 | so we extracted ten thousand queries |
---|
0:13:37 | we this and because for long |
---|
0:13:39 | and we have computed the exact |
---|
0:13:41 | nearest neighbors |
---|
0:13:42 | that is for each query |
---|
0:13:43 | we have computed all the distances |
---|
0:13:45 | to the vectors |
---|
0:13:47 | and we give |
---|
0:13:48 | zero on and i wrong of uh as a |
---|
0:13:51 | a true |
---|
0:13:52 | thousand nearest neighbors and the corresponding distances |
---|
0:13:55 | C |
---|
0:13:55 | case you want |
---|
0:13:56 | we |
---|
0:13:57 | but make runs |
---|
0:13:58 | how |
---|
0:14:00 | to conclude my talk |
---|
0:14:01 | we have proposed a |
---|
0:14:03 | source-coding-based re-ranking approach |
---|
0:14:05 | that avoids using the full descriptors so you can |
---|
0:14:08 | it |
---|
0:14:08 | a to a the memory for a comedy yeah the server |
---|
0:14:12 | and which improves the |
---|
0:14:14 | uh |
---|
0:14:14 | trade-off between efficiency and precision |
---|
0:14:17 | for a fixed more is that |
---|
0:14:19 | you have a is uh the to at a point of the vector for evaluation |
---|
0:14:23 | of approximate search |
---|
0:14:25 | a not so i have to munch that we have but the method a cage uh a a nine |
---|
0:14:30 | a for compression based me sets that |
---|
0:14:32 | produce |
---|
0:14:33 | as a result of uh |
---|
0:14:35 | the and i where is an and so |
---|
0:14:37 | source but things that i have a mention |
---|
0:14:40 | a for you |
---|
0:14:47 | a a question |
---|
0:14:57 | i |
---|
0:14:57 | I have a very short question: how dependent is this technique on using SIFT per se |
---|
0:15:04 | so |
---|
0:15:05 | a a do we have to use it with this technique are uh no so we we have this this |
---|
0:15:10 | is this so don't the us to a the loss and we do this skip those on the back of |
---|
0:15:15 | all descriptors |
---|
0:15:17 | but and you and i think that they can be you can |
---|
0:15:20 | done this |
---|
0:15:25 | and i have a question |
---|
0:15:29 | I have one more question myself actually |
---|
0:15:32 | and |
---|
0:15:33 | you do this iteration with one |
---|
0:15:36 | and what prevents you from iterating again and refining it further |
---|
0:15:42 | given T conversations that were that one if you make the quantisation of zero very small yeah i can go |
---|
0:15:47 | to zero uh for so if were of course on it is a good question because of uh we |
---|
0:15:51 | out to optimize |
---|
0:15:52 | the first stage the second save on i'd use that stage can think of right |
---|
0:15:56 | stores and that some kind of words are sufficient to asian actually so we have that the there's two up |
---|
0:16:02 | and to this stage |
---|
0:16:03 | so that is |
---|
0:16:04 | would be a good a |
---|
0:16:06 | it's nice |
---|
0:16:07 | or several let us on try out many neighbours how many back to give to and yeah yeah |
---|
0:16:13 | but you you you get you have that |
---|
0:16:16 | I remember my first course on quantization and the assumption you make is that the quantization is you |
---|
0:16:21 | know uniform |
---|
0:16:22 | within a bin |
---|
0:16:23 | sorry I remember my first course on quantization and you know the assumption you make |
---|
0:16:27 | is that the quantization error is uniform within a bin |
---|
0:16:32 | um |
---|
0:16:34 | so that there's |
---|
0:16:35 | you know uniform error |
---|
0:16:37 | and so they can be evenly distributed you know points in the bin |
---|
0:16:40 | and so I'm wondering if you do one or more levels of quantization |
---|
0:16:45 | wouldn't that make |
---|
0:16:46 | it very hard to quantize because there is very little structure left to you know |
---|
0:16:50 | take advantage of |
---|
0:16:51 | yes |
---|
0:16:52 | uh |
---|
0:16:54 | I think that's true but at some point when you use a very fine quantizer |
---|
0:16:58 | just as the the first |
---|
0:17:00 | the yeah |
---|
0:17:01 | side is correlated for the eigenvalue value |
---|
0:17:04 | like compression |
---|
0:17:05 | yes some structure |
---|
0:17:06 | for |
---|
0:17:07 | a big intensity |
---|
0:17:08 | when you we find as the and we when you a noise anyway |
---|
0:17:11 | we have the same |
---|
0:17:12 | uh |
---|
0:17:13 | a party |
---|
0:17:14 | i think that uh as in compression that is |
---|
0:17:16 | as the on which you are we code |
---|
0:17:18 | uh |
---|
0:17:19 | a most one them uh |
---|
0:17:21 | so you from side or |
---|
0:17:22 | but the program from yeah it's |
---|
0:17:25 | but it is a problem when you have a high-dimensional data set |
---|
0:17:28 | because all points weekly descent |
---|
0:17:30 | one one uniform made you yeah you you mean you a this so a to compared to |
---|
0:17:35 | and you could do |
---|
0:17:36 | but the |
---|
0:17:37 | if by this |
---|
0:17:39 | improve |
---|
0:17:43 | i think |
---|
0:17:45 | i |
---|