0:00:15 | Good afternoon, everybody. I will try to be as brief as possible. |
---|
0:00:19 | This is the title of the presentation, my name, and the names of my collaborators, |
---|
0:00:24 | who contributed some parts of this presentation. So let's start with the outline. Basically, I will |
---|
0:00:31 | be talking about certain problems of image retrieval using, in general, |
---|
0:00:38 | the concept of keypoints. Then I will highlight some disadvantages of typical keypoint-based |
---|
0:00:45 | approaches, introduce new features based on keypoints, and then give a brief overview of how we |
---|
0:00:52 | gradually build several algorithms combining those new features with traditional SIFT descriptors. |
---|
0:01:00 | Of course, everything will be supported by some preliminary experimental verification. So basically, the problem |
---|
0:01:08 | of detecting near-duplicate image fragments, or image patches, is quite straightforward. If we |
---|
0:01:14 | have two images, we may not be interested in determining whether they are similar |
---|
0:01:21 | or nearly identical as whole images, but we might be interested in whether those images contain an unspecified |
---|
0:01:28 | number of similar fragments, similar patches. These are examples of images which |
---|
0:01:34 | are generally different but which contain some similarly looking fragments. These similarities are usually |
---|
0:01:41 | represented by an affine transformation, which is the generally accepted model. Another example: |
---|
0:01:51 | completely different images which nevertheless contain some similar fragments or patches; often, but |
---|
0:01:57 | not always, these are just similar or identical objects placed on different backgrounds. |
---|
0:02:05 | In general, we expect results something like this: we have those |
---|
0:02:10 | objects on all kinds of backgrounds, and we expect the method to somehow |
---|
0:02:16 | show that these are nearly identical fragments, nearly identical patches, even though the |
---|
0:02:21 | images are different. Another example: these two fragments are similar, |
---|
0:02:26 | either similar accidentally or because they represent the same object. Another example of the |
---|
0:02:32 | same idea, and one more over here: these images are different, but they share some nearly identical patches. |
---|
0:02:38 | Of course, the concept of keypoint-based matching |
---|
0:02:45 | is very well known, very well established, and a very powerful tool, but |
---|
0:02:51 | in the context of the detection of near-duplicate patches, near-duplicate fragments, it has |
---|
0:02:58 | basically two major disadvantages. The first one is that individual keypoint matches, even if |
---|
0:03:05 | they correctly represent local similarities, are very seldom correct in a |
---|
0:03:11 | wider context, any local or global context. Here is a very simple example: |
---|
0:03:17 | two images which, though different, contain a small similar fragment, |
---|
0:03:22 | and these are the keypoint matches if we use the most restrictive technique, one-to-one keypoint correspondences |
---|
0:03:29 | based on the mutual-nearest-neighbour rule over SIFT descriptions of Harris-Affine keypoints. |
---|
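To make the baseline concrete, here is a minimal sketch of the one-to-one, mutual-nearest-neighbour matching described above. It uses plain OpenCV SIFT keypoints as a stand-in for the elliptical Harris-Affine keypoints mentioned in the talk (Harris-Affine is not available in stock OpenCV); it is illustrative only, not the speaker's implementation.

```python
import cv2
import numpy as np

def mutual_nn_matches(img1, img2):
    """One-to-one correspondences: keep (i, j) only if i and j are each other's
    nearest neighbours in SIFT-descriptor space."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Brute-force pairwise Euclidean distances between the two descriptor sets
    # (fine for a sketch; memory-hungry for very large keypoint counts).
    d = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)
    nn12 = d.argmin(axis=1)   # nearest neighbour in image 2 for each keypoint of image 1
    nn21 = d.argmin(axis=0)   # nearest neighbour in image 1 for each keypoint of image 2
    pairs = [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]
    return pairs, kp1, kp2
```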
0:03:36 | As you can see, it is very difficult to identify which fragments are actually nearly similar or |
---|
0:03:41 | nearly identical. Another example is even more confusing, because now we are using |
---|
0:03:48 | many-to-many correspondences based on the same SIFT words, and now the number |
---|
0:03:54 | of correspondences is huge, and only some of them actually correspond to the similar fragments |
---|
0:03:59 | existing in these two images. Here are some statistical examples from the small dataset |
---|
0:04:06 | we are using for the experiments. If we have images which contain just |
---|
0:04:12 | fragments which are similar, the percentage of correct matches among all the matches between individual |
---|
0:04:18 | keypoints is very small. Actually, it renders all these methods useless in terms |
---|
0:04:25 | of the direct use of keypoint correspondences as a tool for detecting |
---|
0:04:30 | near-duplicate fragments, near-duplicate patches. So what we generally do is support this |
---|
0:04:37 | technique with some second-step algorithms, which are jointly called |
---|
0:04:44 | configuration-constraint verification. We can use different techniques, like RANSAC (which |
---|
0:04:50 | doesn't work if there are too few correct correspondences), some variants of RANSAC like PROSAC, |
---|
0:04:56 | geometric hashing, the Hough transform, or very large vocabularies where identical words are used as seeds |
---|
0:05:03 | for further matching, and so on and so on. But all these |
---|
0:05:07 | techniques require, I'd say, post-processing when we are matching a pair of images, |
---|
0:05:12 | which means there are certain limits on the size of the database for which |
---|
0:05:17 | such techniques can be used. |
---|
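As one representative of this family of second-step verifications, the sketch below fits a single affine model to tentative correspondences with RANSAC and keeps only the inliers, using OpenCV's `estimateAffine2D`. It illustrates the general idea of configuration-constraint verification, not the specific algorithms used by the authors; `matches` is assumed to be a list of (i, j) index pairs such as those returned by the earlier sketch.

```python
import cv2
import numpy as np

def verify_affine(kp1, kp2, matches, thresh=3.0):
    """Keep only correspondences consistent with one affine transformation."""
    if len(matches) < 3:
        return []                     # RANSAC cannot help with too few correspondences
    src = np.float32([kp1[i].pt for i, _ in matches])
    dst = np.float32([kp2[j].pt for _, j in matches])
    model, inlier_mask = cv2.estimateAffine2D(src, dst,
                                              method=cv2.RANSAC,
                                              ransacReprojThreshold=thresh)
    if model is None:
        return []
    return [m for m, ok in zip(matches, inlier_mask.ravel()) if ok]
```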
0:05:21 | That is why we have proposed new features which are still based |
---|
0:05:28 | on keypoints, but using these features we can achieve much higher |
---|
0:05:35 | reliability with techniques that are based on straightforward feature matching, with no |
---|
0:05:41 | further verification of the geometry of point configurations and so on. |
---|
0:05:46 | So basically, these features are named TERM features; the term comes |
---|
0:05:53 | from Tuples of Ellipses Represented by Moments. It will be explained later why we use |
---|
0:06:00 | this name. There have been |
---|
0:06:02 | some similar ideas in the past; these are, in my opinion, the |
---|
0:06:06 | best two examples of such ideas, which nevertheless were not fully developed in |
---|
0:06:11 | this direction. |
---|
0:06:15 | The features we propose are based on |
---|
0:06:19 | triplets or, supplementarily, pairs of keypoints, and since we propose descriptors |
---|
0:06:27 | for those features, we can convert the descriptors into vocabularies |
---|
0:06:32 | of visual words, so that later we can match those features not |
---|
0:06:36 | by matching descriptors but by matching visual words. Of course, because we assume |
---|
0:06:42 | that the images might be distorted by affine transformations, we are using affine-invariant detectors which |
---|
0:06:48 | return elliptical keypoints as our starting point. Actually, we are using two categories of |
---|
0:06:56 | detectors: we are using Harris-Affine and we are using MSER, both very |
---|
0:07:01 | popular. |
---|
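A rough sketch of this starting point, i.e. obtaining elliptical keypoints: stock OpenCV has no Harris-Affine detector, so only the MSER branch is illustrated, with an ellipse fitted to each detected region. The limit of roughly three hundred keypoints mentioned later in the talk is approximated here by a simple truncation rather than by prominence-based selection.

```python
import cv2

def mser_ellipses(gray, max_keypoints=300):
    """Detect MSER regions and represent each as an ellipse ((cx, cy), (d1, d2), angle)."""
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    ellipses = []
    for pts in regions:
        if len(pts) >= 5:                      # cv2.fitEllipse needs at least 5 points
            ellipses.append(cv2.fitEllipse(pts))
        if len(ellipses) >= max_keypoints:     # crude stand-in for "the ~300 most prominent keypoints"
            break
    return ellipses
```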
0:07:08 | So, starting from the TERM3 features: if we have an ellipse, which could be a keypoint, and an external point P1, we can |
---|
0:07:12 | find two points, A01 and B01, based on |
---|
0:07:18 | the directions of the tangent lines from P1 and their intersection points with the ellipse. |
---|
0:07:25 | So these |
---|
0:07:26 | two points, |
---|
0:07:31 | A01 and B01, are uniquely defined by the ellipse and by this |
---|
0:07:36 | point P1. If we have another external point, we have another pair of such points, |
---|
0:07:42 | uniquely defined by this external point and by the ellipse. So |
---|
0:07:48 | we actually get four such points if two external points |
---|
0:07:54 | are available for a given ellipse. Then, by using two of these four points |
---|
0:07:59 | and the points where the corresponding lines intersect, we can |
---|
0:08:05 | uniquely create a |
---|
0:08:07 | trapezoid which is defined by the ellipse itself and by these two external |
---|
0:08:12 | points. In other words, the geometry of this trapezoid, defined by the ellipse |
---|
0:08:18 | in the context of the two external points, is uniquely determined. |
---|
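The covariant building block of this construction is the pair of points where the two lines drawn from an external point touch the ellipse. A minimal sketch of that computation is given below: the ellipse is mapped to a unit circle, where the tangency points have a closed form, and mapped back (tangency is preserved by affine maps). How the four points obtained from two external points are assembled into the trapezoid follows the construction on the slides and is not reproduced here.

```python
import numpy as np

def tangency_points(centre, a, b, theta, P):
    """Points where the two tangent lines from external point P touch the ellipse
    with the given centre, semi-axes a, b and orientation theta (radians)."""
    c = np.asarray(centre, float)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    A = R @ np.diag([a, b])                          # maps the unit circle onto the ellipse
    u = np.linalg.solve(A, np.asarray(P, float) - c)  # P expressed in the unit-circle frame
    d2 = float(u @ u)
    if d2 <= 1.0:
        raise ValueError("P lies inside the ellipse; no tangent lines exist")
    s = np.sqrt(d2 - 1.0)
    t1 = np.array([u[0] - u[1] * s, u[1] + u[0] * s]) / d2   # tangency points on the unit circle
    t2 = np.array([u[0] + u[1] * s, u[1] - u[0] * s]) / d2
    return c + A @ t1, c + A @ t2                    # map back to the ellipse
```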
0:08:25 | Of course, the proof of this is quite straightforward. So then, if we |
---|
0:08:30 | have three ellipses, for each of them, if we consider the |
---|
0:08:34 | centres of the other two as the external points, we can build such a |
---|
0:08:39 | trapezoid, and these three trapezoids together are considered a TERM3 feature. So this is the feature |
---|
0:08:44 | which represents these three ellipses, each in the context of the other two ellipses. Of |
---|
0:08:50 | course, these features change: the geometry of these |
---|
0:08:57 | features changes correspondingly with the geometry of the ellipses. So if we have |
---|
0:09:02 | a transformation, linear or affine, of the area containing these |
---|
0:09:08 | three ellipses, the shapes of these trapezoids change correspondingly. In other words, we |
---|
0:09:16 | can relatively easily identify three ellipses which are transformed |
---|
0:09:23 | by an affine mapping, just by using shape |
---|
0:09:28 | descriptors which are invariant under such transformations, and the shapes are, of course, |
---|
0:09:33 | these trapezoids. So basically, |
---|
0:09:38 | this is the TERM3 feature. For the TERM2 feature, which is based only |
---|
0:09:43 | on two ellipses, we again use three quadrilaterals which are uniquely defined by these |
---|
0:09:50 | two ellipses, and again they change covariantly with the ellipses if the |
---|
0:09:56 | images are mapped by an affine transformation. But, as I said, these features are |
---|
0:10:02 | used only in a supplementary role in what follows. Okay, then |
---|
0:10:09 | we define the descriptors for these features, and because the shapes are very simple, we |
---|
0:10:13 | are also using a very simple descriptor. The descriptor we chose is the simplest affine |
---|
0:10:19 | moment invariant, which is based on the central moments |
---|
0:10:25 | of the second order divided by the moment of |
---|
0:10:30 | zero order, i.e. by the area. Then, as the descriptor of the whole |
---|
0:10:36 | feature, we use seven values of this invariant, which are calculated for the |
---|
0:10:42 | first of these shapes, for the second one, for the third one, for two of |
---|
0:10:47 | them, for another pair of them, for the third pair, and finally for all of them together. |
---|
0:10:53 | It is a very simple seven-dimensional descriptor. Similarly, for the three-dimensional descriptor of |
---|
0:11:01 | TERM2 features we are also using just this invariant, calculated |
---|
0:11:06 | for the three contributing quadrilaterals. |
---|
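A sketch of this descriptor follows. It assumes the "simplest affine moment invariant" is the standard Flusser–Suk invariant I1 = (mu20*mu02 − mu11²)/mu00⁴, which matches the description above (second-order central moments normalised by the area), though the exact normalisation used by the authors may differ. The seven values come from the three trapezoids taken singly, in pairs, and all together.

```python
import numpy as np
import cv2

MOMENT_KEYS = ("m00", "m10", "m01", "m20", "m11", "m02")

def raw_moments(polygon):
    """Raw geometric moments of a polygon (vertices assumed consistently, e.g. counter-clockwise, ordered)."""
    m = cv2.moments(np.asarray(polygon, np.float32))   # polygon moments via Green's theorem
    return {k: m[k] for k in MOMENT_KEYS}

def affine_invariant(polygons):
    """I1 of the union of disjoint polygons; raw moments of a union are sums of raw moments."""
    t = {k: sum(raw_moments(p)[k] for p in polygons) for k in MOMENT_KEYS}
    xb, yb = t["m10"] / t["m00"], t["m01"] / t["m00"]
    mu20 = t["m20"] - xb * t["m10"]
    mu02 = t["m02"] - yb * t["m01"]
    mu11 = t["m11"] - xb * t["m01"]
    return (mu20 * mu02 - mu11 ** 2) / t["m00"] ** 4

def term3_descriptor(t1, t2, t3):
    """Seven invariants: each trapezoid alone, each pair, and all three together."""
    return np.array([affine_invariant([t1]), affine_invariant([t2]), affine_invariant([t3]),
                     affine_invariant([t1, t2]), affine_invariant([t1, t3]), affine_invariant([t2, t3]),
                     affine_invariant([t1, t2, t3])])
```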
0:11:14 | Then it turns out that these |
---|
0:11:16 | features are obviously very numerous in a typical image: if we just take |
---|
0:11:21 | any triplet or any pair of keypoints detected in an image, we might get |
---|
0:11:26 | millions of features within a single image, which is of course unacceptable. So we deliberately |
---|
0:11:32 | limit the number of these features in two steps. The first step is, of |
---|
0:11:36 | course, to reduce the number of keypoints themselves. We decided |
---|
0:11:42 | to limit the number of keypoints to around three hundred, somehow taking the best, |
---|
0:11:47 | most prominent three hundred keypoints. Then we are building |
---|
0:11:53 | those features using only a certain neighbourhood of those (at most three |
---|
0:11:58 | hundred) keypoints, and the neighbourhoods we use are |
---|
0:12:04 | not just traditional geometric neighbourhoods; the neighbourhoods are determined by the ellipses |
---|
0:12:08 | themselves. In other words, we assume that if two ellipses are very close and |
---|
0:12:14 | their sizes are very large compared to the distance between them, they most probably represent |
---|
0:12:20 | more or less the same visual content, so we reject something like that. If the |
---|
0:12:24 | ellipses are small compared to the distance between them, most probably their visual contents are |
---|
0:12:29 | not so related, or rather unrelated, so we also don't consider such keypoints. In |
---|
0:12:34 | other words, we use all the ellipses, or keypoints, which are within a certain distance, |
---|
0:12:39 | and this distance is determined by the sizes of the minor and major axes |
---|
0:12:44 | of both ellipses: it should be not less than half of the longer |
---|
0:12:49 | minor axis and not more than about twice the shorter of the |
---|
0:12:55 | major axes. These thresholds are somewhat arbitrary and could be defined slightly differently, but this is the |
---|
0:13:01 | idea behind these bounds. Additionally, we exclude triangles of keypoints which are |
---|
0:13:06 | just too narrow, i.e. if one of the angles is less than fifteen degrees. Under |
---|
0:13:11 | such constraints we get reasonable numbers of keypoint-based features. |
---|
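A sketch of these two pruning rules is given below. The exact thresholds are only partly audible in the recording, so the constants (half of the longer minor axis as the lower bound, roughly twice the shorter major axis as the upper bound, fifteen degrees for the narrowest allowed angle) should be read as placeholders.

```python
import numpy as np

def valid_pair(e1, e2, lo=0.5, hi=2.0):
    """Keep a pair of elliptical keypoints e = ((cx, cy), (minor_axis, major_axis), angle)
    only if their centre distance is neither too small (ellipses cover the same content)
    nor too large (unrelated content)."""
    (c1, (min1, maj1), _), (c2, (min2, maj2), _) = e1, e2
    d = np.linalg.norm(np.subtract(c1, c2))
    return lo * max(min1, min2) <= d <= hi * min(maj1, maj2)

def valid_triangle(c1, c2, c3, min_angle_deg=15.0):
    """Reject triplets whose triangle of centres is too narrow (any angle < 15 degrees)."""
    def angle(p, q, r):                          # interior angle at vertex p of triangle pqr
        u, v = np.subtract(q, p), np.subtract(r, p)
        cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return min(angle(c1, c2, c3), angle(c2, c3, c1), angle(c3, c1, c2)) >= min_angle_deg
```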
0:13:17 | Again, these are just statistical results from our database of images: with |
---|
0:13:24 | the number of keypoints limited to more or less three hundred, we actually get |
---|
0:13:30 | tolerably large numbers of TERM2 and TERM3 features, up to |
---|
0:13:36 | sixteen thousand, which is not that bad. |
---|
0:13:40 | An important point in using these features is the statistical independence of |
---|
0:13:47 | the individual dimensions. We used over thirty million TERM features from over one thousand images, |
---|
0:13:55 | and we found that the correlation between individual dimensions is very low, practically negligible, |
---|
0:14:02 | which means that we can assume those dimensions are just independent. Then we |
---|
0:14:08 | build the visual vocabulary just by |
---|
0:14:13 | dividing each dimension into intervals of equal probability (two, three, four, five intervals), and then |
---|
0:14:20 | the word for each feature corresponds to the Cartesian product of such |
---|
0:14:25 | a decomposition, which means that we have two to the power of seven, three |
---|
0:14:29 | to the power of seven, four to the power of seven, and so on, |
---|
0:14:33 | words for the TERM3 descriptors, or two to the power of |
---|
0:14:38 | three, three to the power of three, four to the power of three, and so on, for the TERM2 descriptors. |
---|
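A minimal sketch of this vocabulary construction, treating the dimensions as independent exactly as described: each dimension is cut at its empirical quantiles into k equiprobable intervals, and the visual word is the mixed-radix index over the per-dimension bins (k⁷ possible words for TERM3 descriptors, k³ for TERM2).

```python
import numpy as np

def fit_bin_edges(train_descriptors, k):
    """Interior quantile cut points per dimension; train_descriptors has shape (N, D)."""
    qs = np.linspace(0.0, 1.0, k + 1)[1:-1]
    return np.quantile(train_descriptors, qs, axis=0).T       # shape (D, k - 1)

def word_index(desc, edges, k):
    """Visual word of one descriptor: Cartesian-product (mixed-radix) index of its bins."""
    bins = np.array([np.searchsorted(edges[d], desc[d]) for d in range(len(desc))])
    return int(np.ravel_multi_index(bins, [k] * len(desc)))    # value in [0, k ** D)
```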
0:14:44 | So that is basically the mathematical background; now the algorithms and the evaluation. We have used a |
---|
0:14:52 | relatively small database, which is available at the address shown over there; the beauty of the database is |
---|
0:14:58 | that it contains near-duplicate images and all the near-duplicate fragments are manually outlined, which |
---|
0:15:06 | means that we can automatically determine whether two features or two keypoints are properly |
---|
0:15:13 | matched. Of course there are some mistakes, because it might be that they are matching |
---|
0:15:18 | not the same fragment of the same object, or they match similarly |
---|
0:15:23 | looking fragments of different objects, but altogether this is a |
---|
0:15:28 | reliable verification, a reliable ground truth. |
---|
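One plausible way to use such manually outlined fragments as ground truth is sketched below: a correspondence is counted as correct when both keypoint centres fall inside the outlines of the same annotated patch pair. The data layout (a list of outline polygon pairs) is an assumption for illustration, not the database's actual format.

```python
import numpy as np
import cv2

def is_correct_match(pt1, pt2, outline_pairs):
    """outline_pairs: list of (polygon_in_image1, polygon_in_image2), each an Nx2 float array."""
    for poly1, poly2 in outline_pairs:
        in1 = cv2.pointPolygonTest(np.float32(poly1), tuple(map(float, pt1)), False) >= 0
        in2 = cv2.pointPolygonTest(np.float32(poly2), tuple(map(float, pt2)), False) >= 0
        if in1 and in2:
            return True
    return False
```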
0:15:33 | Then we have been trying to |
---|
0:15:38 | measure the performance using both traditional keypoints and the TERM2 and TERM3 features. |
---|
0:15:46 | SIFT was used as the representation |
---|
0:15:52 | of the individual keypoints, with a small vocabulary of five hundred words; maybe too |
---|
0:15:58 | small, but it turns out that it's not really that bad. |
---|
0:16:05 | And these are the results. First, we just matched individual keypoints using one-to-one correspondences, |
---|
0:16:11 | no TERM2, no TERM3; then again individual keypoints using many-to-many correspondences based on |
---|
0:16:19 | the same word; then methods where just individual TERM2 and TERM3 features are |
---|
0:16:25 | matched, using either one-to-one or many-to-many matching. |
---|
0:16:33 | It turns out that, for all these variants, |
---|
0:16:39 | the number of correct matches, whether we use individual keypoints or individual TERM2 |
---|
0:16:44 | or TERM3 features, is basically at the level of statistical error; there is |
---|
0:16:49 | no use for such methods. However, if we combine TERM2 or TERM3 with |
---|
0:16:57 | the keypoint descriptions, which means that we match TERM2 or TERM3 features geometrically |
---|
0:17:04 | only if their contributing keypoints have the same SIFT words, the results are much better, |
---|
0:17:09 | and |
---|
0:17:11 | these are exemplary results for this database; the precision is very good. |
---|
0:17:16 | The problem is that the number of matches is very small compared to the |
---|
0:17:21 | number of matching images, because there are over |
---|
0:17:26 | five hundred images which contain similar fragments, but altogether we have only around one thousand five |
---|
0:17:32 | hundred to two thousand keypoint, or rather feature, matches found in the |
---|
0:17:38 | whole database, and it turns out that all of them are concentrated in just fifteen |
---|
0:17:43 | to twenty images. So the method is quite good, but its coverage is poor. |
---|
0:17:49 | In the next variant, we use just the words for both TERM |
---|
0:17:55 | features, TERM3 in particular, and the words for the contributing keypoints; then the |
---|
0:18:00 | results are better in terms of the number of found correspondences, but the percentage of |
---|
0:18:07 | correct matches is not so good. It is much better than previously, about twenty |
---|
0:18:11 | percent, but it's still |
---|
0:18:14 | not enough. Eventually, the ultimate method combines |
---|
0:18:19 | the words for both TERM2 and TERM3, which means |
---|
0:18:24 | we take triplets of keypoints and we match the words for TERM3; then |
---|
0:18:29 | we match the words for TERM2 for each pair of keypoints within the triplet; and |
---|
0:18:38 | finally we also match the individual SIFT words of the corresponding |
---|
0:18:46 | contributing keypoints. |
---|
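A sketch of this final matching rule: each TERM3 feature is reduced to a short "sentence" of visual words -- its TERM3 word, the TERM2 words of the three contributing pairs, and the SIFT words of the three contributing keypoints -- and two features are matched only when the whole sentence agrees. The field names and the fixed ordering of the contributing keypoints are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Term3Feature:
    term3_word: int
    term2_words: Tuple[int, int, int]   # one per pair of contributing keypoints, canonical order
    sift_words: Tuple[int, int, int]    # one per contributing keypoint, same canonical order

def sentence(f: Term3Feature):
    return (f.term3_word, f.term2_words, f.sift_words)

def match_images(feats1, feats2):
    """Straightforward word matching, with no geometric verification afterwards."""
    index = {}
    for g in feats2:
        index.setdefault(sentence(g), []).append(g)
    return [(f, g) for f in feats1 for g in index.get(sentence(f), [])]
```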
0:18:52 | Then, especially if we use Harris-Affine, the results are quite good. For MSER we still have not too many matches, but for |
---|
0:18:56 | Harris-Affine there are quite many of them, and the level of correctness |
---|
0:19:00 | is very high: ninety-three percent of the retrieved features are matched |
---|
0:19:06 | properly, which means they actually belong to the same patch, or a similarly looking patch, of the |
---|
0:19:13 | same object. The recall results are also not that bad: |
---|
0:19:19 | the effective recall is about eighty percent, which is also not that bad, especially |
---|
0:19:24 | given that some of these objects are really small, so there are not many keypoints |
---|
0:19:29 | there. |
---|
0:19:31 | That's it. These are just exemplary results returned by this method. |
---|
0:19:37 | Of course, we show the outlines of the ellipses because the individual keypoints are not |
---|
0:19:42 | so well seen, but as you see the results are really quite good, |
---|
0:19:46 | and all of them are achieved by a simple match of individual features: no verification |
---|
0:19:52 | of configuration constraints, nothing like that, just matching individual features. |
---|
0:19:59 | Some of these results are from the database I mentioned, and some |
---|
0:20:05 | are just from other publicly available databases that have |
---|
0:20:11 | been used. There are also some semantically incorrect results, but if you look at |
---|
0:20:17 | them, you can see why they were matched, |
---|
0:20:20 | as over here, in |
---|
0:20:22 | this example. So that's basically it. And so, in summary, these are the advantages |
---|
0:20:31 | and disadvantages. We still |
---|
0:20:35 | have to reduce the number of keypoints, and we do not do that perfectly yet; |
---|
0:20:40 | that was one of the conclusions. And eventually, as an ultimate solution, |
---|
0:20:47 | we want to incorporate very similar data into the descriptions of individual keypoints, which means |
---|
0:20:53 | that we could identify similarly looking fragments of images by matching |
---|
0:20:59 | pre-calculated descriptions of individual keypoints rather than by comparing features. So, with that said, |
---|
0:21:07 | thank you very much, and sorry for the one-minute overrun. |
---|
0:21:12 | okay |
---|
0:21:16 | Well, I will have to be fast. |
---|
0:21:39 | yes |
---|
0:21:42 | no not that you of the two older |
---|
0:21:45 | yes oh yeah |
---|
0:21:49 | No, because basically there are only the three of them, so we cannot incorporate |
---|
0:21:53 | more, and the idea is that each of them is represented by individual |
---|
0:22:00 | words, so we can just match sentences of a few words; |
---|
0:22:05 | that's why this is the number, why we combine the three of them. |
---|
0:22:11 | yeah |
---|
0:22:13 | yes |
---|
0:22:18 | yes |
---|
0:22:21 | i |
---|
0:22:26 | Yeah, well, basically I am not saying that I am identifying the outline of the patch; I |
---|
0:22:31 | am identifying, detecting, the presence of the patch. So even if it's a single |
---|
0:22:35 | keypoint or a single feature, it shows that there is a patch there. |
---|
0:22:41 | Generating the outline of the patch is a different story, but |
---|
0:22:47 | now we can at least know that there is a patch there with |
---|
0:22:53 | a high level of confidence. |
---|
0:23:05 | uh_huh |
---|
0:23:12 | that |
---|
0:23:27 | yeah |
---|
0:23:29 | I fully agree with you. The problem is that if you are just |
---|
0:23:33 | looking in an unknown, unpredictable database for some preliminary similarities, you have to start |
---|
0:23:39 | from a method which doesn't assume anything about the content of the images, |
---|
0:23:43 | and this is basically the most general approach. |
---|
0:23:53 | Sorry, if the images contain repeated patterns? |
---|
0:23:56 | Yeah, that's not a problem; then we just have a lot of matches. That was one |
---|
0:24:00 | example: two bottles of beer over here and two cans of |
---|
0:24:04 | beer over there. So basically we find matches for |
---|
0:24:09 | pairs of patterns, this one with this one, this one with that one, which means that in that case |
---|
0:24:15 | we might have a lot of pairs of patches, but eventually most of them |
---|
0:24:20 | are just the repetitions of |
---|
0:24:24 | the pattern. Of course, with too many repetitions it doesn't make much sense, but with two |
---|
0:24:27 | or three or four of them it works with no problem. |
---|
0:24:39 | Okay, well, I didn't have time to mention that; it |
---|
0:24:43 | basically depends on the application. With a very naive MATLAB implementation, we |
---|
0:24:50 | are able to match hundreds of images within a second. |
---|
0:24:55 | Yeah, of course, for random images. We have not completed that; we are starting |
---|
0:25:02 | on a C++ implementation and on more sophisticated techniques of |
---|
0:25:10 | organising the database, but I think it can be really fast. |
---|
0:25:14 | i would be and the meeting |
---|
0:25:19 | this |
---|