good afternoon everyone, i will try to be as brief as possible. this is the title of the presentation, my name, and the names of my collaborators, who contributed some parts of this presentation. so let's start from the outline. basically, i'll be talking about certain problems of image retrieval using keypoints in general: the concept of keypoints, then i'll highlight some disadvantages of typical keypoint-based approaches, then new features based on keypoints will be introduced, and after a brief overview i'll gradually build several algorithms combining those new features with traditional SIFT descriptors. of course, everything will be supported by some preliminary experimental verification.

so basically, the problem of detecting near-duplicate image fragments, or image patches, is quite straightforward. if we have two images, we may not be interested in determining whether they are similar or nearly identical images as a whole, but we might be interested in whether those images contain an unspecified number of similar fragments, similar patches. these are examples of images which are generally different but which contain some similarly looking fragments. these similarities are usually represented by an affine transformation, which is the generally accepted model. here is another example of completely different images which nevertheless contain some similar fragments. those patches also tend to be, though not always, similar or identical objects placed on different backgrounds.
in general we expect results something like this: we have those outlines, with all kinds of errors, and we expect the method to somehow show that these are identical, or nearly identical, patches even though the images are different. here is another example where two fragments are similar: they are either similar accidentally or they represent more of the same object. another example of the same idea, and one more over here: these images are different, but they share some nearly identical, similar patches.

of course, the concept of keypoint-based matching
is very well known, very well established, and a very powerful tool, but in the context of the detection of near-duplicate patches, near-duplicate fragments, it has basically two major disadvantages. the first one is that individual keypoint matches, even if they correctly represent local similarities, are very seldom correct in a wider context, either local or global. here is a very simple example, shown on two images which are different though they contain a small common fragment: these are the keypoint matches if we use the most restrictive technique, one-to-one keypoint correspondences based on mutual nearest neighbors of the descriptors, over Harris-Affine keypoints. as you see, it is very difficult to identify which fragments are actually nearly similar or nearly identical. another example is even more confusing, because now we are using many-to-many correspondences based on the same SIFT words; the number of correspondences is far larger, and only some of them actually correspond to the similar fragments
existing in these two images. here are some statistical examples from the small dataset we are using for experiments. if we have images which contain just fragments which are similar, the percentage of correct matches among all the matches between individual keypoints is very small. actually, it renders all these methods useless in terms of the direct use of keypoint correspondences as a tool for detecting near-duplicate fragments, near-duplicate patches. so what we generally do is support this technique with some second-step algorithms, which are jointly called configuration constraints verification. we can use different techniques: RANSAC, which doesn't work if there are too few correct correspondences; some variants of RANSAC, like PROSAC; geometric hashing; the Hough transform; large vocabularies, where identical words are used as seeds for further matching; and so on and so on. but all these techniques require, i'd say, post-processing when we match a pair of images, which means there are certain limits on the size of the database for which such techniques can be used.
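as an illustration of such a second-step verification, here is a minimal RANSAC sketch in Python; it assumes point correspondences are already given as arrays, fits a 2D affine model from minimal samples, and counts inliers. this is a generic illustration of the idea, not the implementation used in the talk.

```python
import numpy as np

def ransac_affine(src, dst, iters=500, tol=3.0, seed=0):
    """Fit a 2D affine map dst ~ [src 1] @ M from noisy correspondences
    by random minimal sampling; return the model with the most inliers."""
    rng = np.random.default_rng(seed)
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])           # homogeneous source coords
    best_model, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)  # minimal affine sample
        try:
            M = np.linalg.solve(X[idx], dst[idx])   # (3, 2) affine matrix
        except np.linalg.LinAlgError:
            continue                                # collinear sample, skip
        err = np.linalg.norm(X @ M - dst, axis=1)   # residual per point
        inliers = int((err < tol).sum())
        if inliers > best_inliers:
            best_model, best_inliers = M, inliers
    return best_model, best_inliers
```

note that when only a handful of the correspondences are correct, the chance of drawing an all-inlier sample collapses, which is exactly the failure mode mentioned above.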
that's why we have proposed new features which are still based on keypoints, but using these features we can achieve much higher reliability with techniques based on straightforward feature matching, with no additional verification of the geometry, of constraints, and so on. basically, these features are named TERM features; the term comes from Tuples of Ellipses Represented by Moments, and i will later explain why we use this name. there were some similar ideas in the past; these are, in my opinion, the two best examples of such ideas, which nevertheless were not fully developed in this direction.
the features we propose are based on triplets or, supplementarily, pairs of keypoints, and since we propose descriptors for those features, we can convert the descriptors into vocabularies of visual words. so eventually we can match those features not by matching the descriptors but by matching visual words. and of course, because we assume that the content might be distorted by affine transformations, we are using affine-invariant detectors which return elliptical keypoints as our starting point. actually, we are using two categories of detectors: Harris-Affine and MSER, both very popular.

so, starting from the TERM-3 features: if we have an ellipse, which could be a keypoint, and an external point P1, we can actually find two points, A01 and B01, based on the directions of the tangent lines and their intersection points.
so we have these two points, A01 and B01, which are uniquely defined by the ellipse and by this external point P1. if we have another external point, we get another pair of such points, uniquely defined by that external point and by the ellipse. so we can actually get four such points if two external points are available for a given ellipse, and then, by using two of these four points and the points where the corresponding lines intersect the ellipse, we can uniquely create a trapezoid which is defined by the ellipse itself and by these two external points. in other words, the geometry of this trapezoid, defined by the ellipse in the context of the two external points, is uniquely determined; the proof of this is quite straightforward.

so then, if we have three ellipses, for each of them we can consider the centers of the other two as its external points and build such a trapezoid, and these three trapezoids are considered a TERM-3 feature. this is the feature which represents these three ellipses, each in the context of the other two. of course, the geometry of these features changes covariantly with the geometry of the underlying ellipses: if we have a transformation, linear or affine, of the area containing these three ellipses, the shapes of these trapezoids change correspondingly. in other words, we can relatively easily identify three ellipses which are mapped by an affine transformation, just by using shape descriptors of these trapezoids which are invariant under such transformations.
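to make the construction concrete, here is a small Python sketch of the contact points of the two tangent lines drawn from an external point to an ellipse (the points called A01 and B01 above). for simplicity it assumes an axis-aligned ellipse centered at the origin; arbitrarily oriented keypoint ellipses would need an extra rotation and translation.

```python
import numpy as np

def tangent_points(a, b, p):
    """Contact points of the two tangent lines from external point p
    to the axis-aligned ellipse x^2/a^2 + y^2/b^2 = 1."""
    px, py = p
    u, v = px / a, py / b            # map the problem to the unit circle
    r = np.hypot(u, v)
    assert r > 1, "the point must lie outside the ellipse"
    phi = np.arctan2(v, u)
    dth = np.arccos(1.0 / r)         # angular offset of the contact points
    th1, th2 = phi - dth, phi + dth
    return ((a * np.cos(th1), b * np.sin(th1)),
            (a * np.cos(th2), b * np.sin(th2)))
```

each returned point (x0, y0) satisfies the tangency condition x0*px/a^2 + y0*py/b^2 = 1, i.e. the external point lies on the tangent line at that point.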
so that is the TERM-3 feature. for the TERM-2 feature, which is based on only two ellipses, we again use three quadrilaterals, this time uniquely defined by these two ellipses, and again they change covariantly with the ellipses; the edges are mapped by the same affine transformation. but, as i said, these features are used only in a supplementary role.
then we define the descriptors for these features, and because the shapes are very simple, we are also using a very simple descriptor: the well-known affine moment invariant, which is based on the central moments of the second order divided by the fourth power of the zero-order moment, that is, the area. as the descriptor of the whole feature we use seven values of this invariant, which are gradually calculated: for the first of these shapes, for the second one, for the third one, for one pair of them, for another pair, for the third pair, and finally for all of them. it is a very simple seven-dimensional descriptor. actually, for the three-dimensional descriptor of TERM-2 features we are also using just this invariant, calculated for the three contributing quadrilaterals.
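the invariant in question appears to be the simplest affine moment invariant, I1 = (mu20*mu02 - mu11^2) / mu00^4. below is a Python sketch computing it from binary masks, together with one possible assembly of the seven-dimensional TERM-3 descriptor; the talk does not specify how the pairs and the triple of trapezoids are combined, so taking the union of the masks is my assumption.

```python
import numpy as np

def ami1(mask):
    """First affine moment invariant I1 = (mu20*mu02 - mu11^2) / mu00^4,
    computed from a binary mask of the shape (pixel-level approximation)."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))                   # zero-order moment: the area
    xc, yc = xs.mean(), ys.mean()
    mu20 = ((xs - xc) ** 2).sum()
    mu02 = ((ys - yc) ** 2).sum()
    mu11 = ((xs - xc) * (ys - yc)).sum()
    return (mu20 * mu02 - mu11 ** 2) / m00 ** 4

def term3_descriptor(t1, t2, t3):
    """Seven values of I1 for a TERM-3 feature given the masks of its three
    trapezoids: each alone, each pair, and all three (pairs and the triple
    are modelled here, as an assumption, by the union of the masks)."""
    u12, u13, u23 = (np.logical_or(a, b)
                     for a, b in ((t1, t2), (t1, t3), (t2, t3)))
    return np.array([ami1(t1), ami1(t2), ami1(t3),
                     ami1(u12), ami1(u13), ami1(u23),
                     ami1(np.logical_or(u12, t3))])
```

for any square, I1 is close to the continuous value 1/144 regardless of size or position, which is a quick sanity check of the affine invariance.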
it turns out that these features are obviously very numerous in a typical image: if we just take any triplet or any pair of keypoints detected in an image, we might get millions of features within a single image, which is of course unacceptable. so we deliberately limit the number of these features in two steps. the first step is, of course, to reduce the number of keypoints themselves: we decided to limit the number of keypoints to around three hundred, using the best, most prominent three hundred keypoints. then we build those features using only a certain neighborhood of those (no more than three hundred) keypoints. and the neighborhoods we use are not just traditional geometric neighborhoods; the neighborhoods are determined by the ellipses themselves. in other words, we assume that if two ellipses are very close and their sizes are very large compared to the distance between them, they most probably represent more or less the same visual content, so we reject something like that. and if the ellipses are small compared to the distance between them, most probably their visual contents are not so related, or rather unrelated, so we also don't consider such keypoints. in other words, we use only pairs of ellipses, of keypoints, which are within a certain distance of each other, and this distance is determined by the sizes of the minor and major axes of both ellipses: roughly, no less than half of the longer minor axis and no more than twice the shorter major axis. the thresholds are defined somewhat arbitrarily and differ slightly in the details, but this is the idea. additionally, we exclude triangles of keypoints which are just too narrow, where one of the angles is less than fifteen degrees. under such constraints we get reasonable numbers of these features, and again, these are just statistical results from our database: with the number of keypoints limited to more or less three hundred per image, we actually have tolerably large numbers of TERM-2 and TERM-3 features, up to sixteen thousand, which is not that bad.
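a sketch of this pruning, with the thresholds written as parameters because the exact values in the talk are approximate: pairs of elliptical keypoints are kept only if their center distance is neither too small nor too large relative to the ellipse axes, and triplets with an interior angle below fifteen degrees are dropped. keypoints are assumed here to be (center, (major, minor)) tuples of semi-axes.

```python
import numpy as np
from itertools import combinations

def admissible_pair(c1, ax1, c2, ax2, lo=0.5, hi=2.0):
    """Accept a pair of elliptical keypoints only if the distance between
    their centers is neither too small nor too large relative to the
    ellipse axes (the lo/hi factors are assumptions, see the talk)."""
    d = np.linalg.norm(np.subtract(c1, c2))
    longer_minor = max(ax1[1], ax2[1])    # longer of the two minor semi-axes
    shorter_major = min(ax1[0], ax2[0])   # shorter of the two major semi-axes
    return lo * longer_minor <= d <= hi * shorter_major

def min_angle_deg(p1, p2, p3):
    """Smallest interior angle of the triangle p1-p2-p3, in degrees."""
    pts = [np.asarray(p, float) for p in (p1, p2, p3)]
    angles = []
    for i in range(3):
        u = pts[(i + 1) % 3] - pts[i]
        v = pts[(i + 2) % 3] - pts[i]
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
    return min(angles)

def candidate_triplets(keypoints, min_angle=15.0):
    """keypoints: list of (center, (major, minor)) tuples."""
    trips = []
    for (i, a), (j, b), (k, c) in combinations(enumerate(keypoints), 3):
        if all(admissible_pair(p[0], p[1], q[0], q[1])
               for p, q in ((a, b), (a, c), (b, c))):
            if min_angle_deg(a[0], b[0], c[0]) >= min_angle:
                trips.append((i, j, k))
    return trips
```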
an important point in using these features is the statistical independence of the individual dimensions. we used over thirty million TERM features from over one thousand images, and we found that the correlation between individual dimensions is very low, practically negligible. this means we can assume that those dimensions are just independent, and then we build the visual vocabulary simply by dividing each dimension into intervals of the same probability (two, three, four, or five intervals), and the number of words then just corresponds to the Cartesian product of such a decomposition. this means that we have two to the power of seven, three to the power of seven, four to the power of seven, and so on, words for the descriptors of TERM-3 features, or two to the power of three, three to the power of three, and so on, for TERM-2 descriptors.
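this vocabulary construction can be sketched as follows: each descriptor dimension is split into equal-probability intervals using empirical quantiles, and the visual word is the mixed-radix combination of the per-dimension bin indices, giving bins^d words for a d-dimensional descriptor.

```python
import numpy as np

def build_edges(train_desc, bins):
    """Per-dimension bin edges giving (approximately) equal-probability
    intervals, estimated from a training set of descriptors, shape (n, d)."""
    qs = np.linspace(0.0, 1.0, bins + 1)[1:-1]   # interior quantile levels
    return np.quantile(train_desc, qs, axis=0)   # shape (bins - 1, d)

def to_word(desc, edges, bins):
    """Map a d-dimensional descriptor to a single visual word in
    [0, bins**d): quantize each dimension, then mixed-radix combine."""
    word = 0
    for dim in range(len(desc)):
        word = word * bins + int(np.searchsorted(edges[:, dim], desc[dim]))
    return word
```

with seven-dimensional TERM-3 descriptors and three bins per dimension this yields a 3^7 = 2187-word vocabulary, matching the counts quoted above.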
so that is the mathematical background; now the algorithms and the evaluation. we have used a relatively small database, which is available over there, but the beauty of the database is that it contains near-duplicate images and all the images are manually outlined, which means that we can automatically determine whether two features or two keypoints are properly matched. of course there are some mistakes, because they might match not the same fragment of the same object, or they might match similarly looking fragments of different objects, but altogether this is a pretty reliable verification, a reliable ground truth. we have been measuring the performance using both traditional keypoints and our TERM-2 and TERM-3 features, with SIFT used as the representation of individual keypoints, over a small vocabulary of five hundred words; maybe too small, but it turns out that it's not really that bad.
and these are the results. first, we just matched individual keypoints using one-to-one correspondences, with no TERM-2, no TERM-3; then again individual keypoints using just many-to-many correspondences over the same word; then methods where just individual TERM-2 and TERM-3 features are matched, using either one-to-one or many-to-many correspondences. and it turns out that for all these variants the number of correct matches, whether we use individual keypoints or individual TERM-2 and TERM-3 features, is basically at the level of statistical error; there is no use for such approaches. however, when we combine TERM-2 or TERM-3 with the keypoint descriptions, that is, we match features geometrically (TERM-2 and TERM-3) only if their contributing keypoints have the same words, the results are much better.

these are exemplary results for this database, and they are very good. the problem is that the number of matches is very small compared to the number of matching images: there are over five hundred images which contain similar fragments, but altogether we have only around one thousand five hundred to two thousand features found in the whole database, and it turns out that all of them are concentrated in just fifteen to twenty images. this means that the method is quite precise, but on its own it is not enough. if, instead, we use just words both for the TERM features (TERM-3 in particular) and for the contributing keypoints, then the results are better in terms of the number of found correspondences, but the percentage of correct matches is not so good; it's much better than previously, about twenty percent, but it's still not enough.

eventually, the ultimate method combines both: we take triplets of keypoints, we match the words of TERM-3, then we match the words of TERM-2 for each pair of keypoints within the triplet, and finally we also match the individual SIFT words of the contributing keypoints. then, especially if we use Harris-Affine, the results are quite good. for MSER we still do not have too many matches, but for Harris-Affine there are quite many of them, and the level of correctness is very high: ninety-three percent of the retrieved features are matched properly, which means they actually belong to the same patch, or to similarly looking patches of the same object. the recall results are also not that bad; the effective recall is about eighty percent, which is also not bad, especially given that some of these objects are really small, so there are not many keypoints there.
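the ultimate scheme can be sketched as an exact lookup on a joint key: a TERM-3 feature from one image matches one from another only if the TERM-3 word, the three TERM-2 words of its keypoint pairs, and the SIFT words of the contributing keypoints all agree. the field names below are illustrative, not from the talk, and sorting the word tuples is an assumption to make the key independent of keypoint ordering. because matching is a dictionary lookup rather than pairwise geometric verification, it can scale to whole databases.

```python
from collections import defaultdict

def feature_key(f):
    """Joint word of a TERM-3 feature: its TERM-3 word, the TERM-2 words of
    the three contributing pairs, and the SIFT words of the three keypoints.
    Sorting makes the key independent of keypoint ordering (an assumption)."""
    return (f["term3_word"],
            tuple(sorted(f["term2_words"])),
            tuple(sorted(f["sift_words"])))

def match_features(feats_a, feats_b):
    """Return id pairs of features from two images whose joint words agree."""
    index = defaultdict(list)
    for f in feats_b:
        index[feature_key(f)].append(f["id"])
    return [(f["id"], bid)
            for f in feats_a
            for bid in index.get(feature_key(f), [])]
```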
that's it. these are just exemplary results returned by this method; we show the outlining ellipses because the individual keypoints are not so well seen. as you see, the results are really quite good, and all of them are achieved by a simple matching of individual features, with no verification of configuration constraints, nothing like that, just matching individual features. some of these are from our database, and some are from other publicly available databases that have been used. there are also some semantically incorrect results, but if you look at them, the visual similarity is actually there.

so, in summary, these are the advantages and disadvantages. we still have to reduce the number of keypoints, which is not a perfect solution; that was one of the conclusions. eventually, as an ultimate solution, we want to incorporate similar data into the descriptions of individual keypoints, which means that we could identify similarly looking fragments of images just by matching the descriptions of individual keypoints rather than by comparing features. so with that said, thank you very much, and sorry for overrunning by one minute.
okay
well, i'll have to be fast
yes
no, because basically there are only three of them, so we cannot incorporate more. the idea is that each of them is represented by individual words, which means that we can just match sentences of a few words; that's why three is the number of items we combine
yeah
yes
yes
well, basically, i'm not saying that i'm identifying the outline of the patch; i am identifying, detecting, the presence of the patch. so even if it's a single keypoint or a single feature, it shows that there is a patch there. generating the outline of the patch is a different story, but now we can at least know that there is a patch there with a high level of confidence
i fully agree with you. the problem is that if you are just looking in an unknown, unpredictable database for some preliminary similarities, you have to start from a method which doesn't assume anything about the content of the images, and this is basically the most general approach
sorry, if the images contain repeated patterns? that's not a problem. we had that one example with two bottles of beer over here and two bottles of beer over there, and basically we found all four pairs of patches, this one with that one, and so on. which means that in such cases we might have a lot of pairs of patches; eventually, with many repetitions of the pattern it doesn't make too much sense, but with two, three, or four of them there is no problem
okay, well, i didn't have time to mention that; it basically depends on the application. with a very naive MATLAB implementation we are able to match hundreds of images within a second. of course, that is on random images; we have not completed it yet, but we are starting on a C++ implementation and on more sophisticated techniques of organizing the database, and then it can be really fast