good afternoon everyone, i will try to be as brief as possible. this is the title of the presentation, my name, and the names of my collaborators, who contributed some parts of this presentation. so let's start from the outline. basically, i'll be talking about certain problems of image retrieval using keypoints in general: the concept of keypoints, then i'll highlight some disadvantages of typical keypoint-based approaches, then new features based on keypoints will be introduced, and after a brief overview i'll gradually build several algorithms combining those new features with traditional SIFT descriptors. of course, everything will be supported by some preliminary experimental verification.

so basically, the problem of detecting near-duplicate image fragments, or image patches, is quite straightforward. if we have two images, we may not be interested in determining whether they are similar or nearly identical images as a whole, but we might be interested in whether those images contain an unspecified number of similar fragments, similar patches. these are examples of images which are generally different but which contain some similarly looking fragments. these similarities are usually represented by an affine transformation, which is the generally accepted model. here is another example of completely different images which nevertheless contain some similar fragments. those patches also tend to be, though not always, similar or identical objects placed on different backgrounds.
in general we expect results something like this: we have those outlines, with all kinds of errors, and we expect the method to somehow show that these are identical, or nearly identical, patches even though the images are different. here is another example where two fragments are similar: they are either similar accidentally or they represent more of the same object. another example of the same idea, and one more over here: these images are different, but they share some nearly identical, similar patches.

of course, the concept of keypoint-based matching
is very well known, very well established, and a very powerful tool, but in the context of the detection of near-duplicate patches, near-duplicate fragments, it has basically two major disadvantages. the first one is that individual keypoint matches, even if they correctly represent local similarities, are very seldom correct in a wider context, either local or global. here is a very simple example, shown on two images which are different though they contain a small common fragment: these are the keypoint matches if we use the most restrictive technique, one-to-one keypoint correspondences based on mutual nearest neighbors of the descriptors, over Harris-Affine keypoints. as you see, it is very difficult to identify which fragments are actually nearly similar or nearly identical. another example is even more confusing, because now we are using many-to-many correspondences based on the same SIFT words; the number of correspondences is far larger, and only some of them actually correspond to the similar fragments
existing in these two images. here are some statistical examples from the small dataset we are using for experiments. if we have images which contain just fragments which are similar, the percentage of correct matches among all the matches between individual keypoints is very small. actually, it renders all these methods useless in terms of the direct use of keypoint correspondences as a tool for detecting near-duplicate fragments, near-duplicate patches. so what we generally do is support this technique with some second-step algorithms, which are jointly called configuration constraints verification. we can use different techniques: RANSAC, which doesn't work if there are too few correct correspondences; some variants of RANSAC, like PROSAC; geometric hashing; the Hough transform; large vocabularies, where identical words are used as seeds for further matching; and so on and so on. but all these techniques require, i'd say, post-processing when we match a pair of images, which means there are certain limits on the size of the database for which such techniques can be used.
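as an illustration of such a second-step verification, here is a minimal RANSAC sketch in Python; it assumes point correspondences are already given as arrays, fits a 2D affine model from minimal samples, and counts inliers. this is a generic illustration of the idea, not the implementation used in the talk.

```python
import numpy as np

def ransac_affine(src, dst, iters=500, tol=3.0, seed=0):
    """Fit a 2D affine map dst ~ [src 1] @ M from noisy correspondences
    by random minimal sampling; return the model with the most inliers."""
    rng = np.random.default_rng(seed)
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])           # homogeneous source coords
    best_model, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)  # minimal affine sample
        try:
            M = np.linalg.solve(X[idx], dst[idx])   # (3, 2) affine matrix
        except np.linalg.LinAlgError:
            continue                                # collinear sample, skip
        err = np.linalg.norm(X @ M - dst, axis=1)   # residual per point
        inliers = int((err < tol).sum())
        if inliers > best_inliers:
            best_model, best_inliers = M, inliers
    return best_model, best_inliers
```

note that when only a handful of the correspondences are correct, the chance of drawing an all-inlier sample collapses, which is exactly the failure mode mentioned above.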
that's why we have proposed new features which are still based on keypoints, but using these features we can achieve much higher reliability with techniques based on straightforward feature matching, with no additional verification of the geometry, of constraints, and so on. basically, these features are named TERM features; the term comes from Tuples of Ellipses Represented by Moments, and i will later explain why we use this name. there were some similar ideas in the past; these are, in my opinion, the two best examples of such ideas, which nevertheless were not fully developed in this direction.
the features we propose are based on triplets or, supplementarily, pairs of keypoints, and since we propose descriptors for those features, we can convert the descriptors into vocabularies of visual words. so eventually we can match those features not by matching the descriptors but by matching visual words. and of course, because we assume that the content might be distorted by affine transformations, we are using affine-invariant detectors which return elliptical keypoints as our starting point. actually, we are using two categories of detectors: Harris-Affine and MSER, both very popular.

so, starting from the TERM-3 features: if we have an ellipse, which could be a keypoint, and an external point P1, we can actually find two points, A01 and B01, based on the directions of the tangent lines and their intersection points.
so we have these two points, A01 and B01, which are uniquely defined by the ellipse and by this external point P1. if we have another external point, we get another pair of such points, uniquely defined by that external point and by the ellipse. so we can actually get four such points if two external points are available for a given ellipse, and then, by using two of these four points and the points where the corresponding lines intersect the ellipse, we can uniquely create a trapezoid which is defined by the ellipse itself and by these two external points. in other words, the geometry of this trapezoid, defined by the ellipse in the context of the two external points, is uniquely determined; the proof of this is quite straightforward.

so then, if we have three ellipses, for each of them we can consider the centers of the other two as its external points and build such a trapezoid, and these three trapezoids are considered a TERM-3 feature. this is the feature which represents these three ellipses, each in the context of the other two. of course, the geometry of these features changes covariantly with the geometry of the underlying ellipses: if we have a transformation, linear or affine, of the area containing these three ellipses, the shapes of these trapezoids change correspondingly. in other words, we can relatively easily identify three ellipses which are mapped by an affine transformation, just by using shape descriptors of these trapezoids which are invariant under such transformations.
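to make the construction concrete, here is a small Python sketch of the contact points of the two tangent lines drawn from an external point to an ellipse (the points called A01 and B01 above). for simplicity it assumes an axis-aligned ellipse centered at the origin; arbitrarily oriented keypoint ellipses would need an extra rotation and translation.

```python
import numpy as np

def tangent_points(a, b, p):
    """Contact points of the two tangent lines from external point p
    to the axis-aligned ellipse x^2/a^2 + y^2/b^2 = 1."""
    px, py = p
    u, v = px / a, py / b            # map the problem to the unit circle
    r = np.hypot(u, v)
    assert r > 1, "the point must lie outside the ellipse"
    phi = np.arctan2(v, u)
    dth = np.arccos(1.0 / r)         # angular offset of the contact points
    th1, th2 = phi - dth, phi + dth
    return ((a * np.cos(th1), b * np.sin(th1)),
            (a * np.cos(th2), b * np.sin(th2)))
```

each returned point (x0, y0) satisfies the tangency condition x0*px/a^2 + y0*py/b^2 = 1, i.e. the external point lies on the tangent line at that point.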
so that is the TERM-3 feature. for the TERM-2 feature, which is based on only two ellipses, we again use three quadrilaterals, this time uniquely defined by these two ellipses, and again they change covariantly with the ellipses; the edges are mapped by the same affine transformation. but, as i said, these features are used only in a supplementary role.
then we define the descriptors for these features, and because the shapes are very simple, we are also using a very simple descriptor: the well-known affine moment invariant, which is based on the central moments of the second order divided by the fourth power of the zero-order moment, that is, the area. as the descriptor of the whole feature we use seven values of this invariant, which are gradually calculated: for the first of these shapes, for the second one, for the third one, for one pair of them, for another pair, for the third pair, and finally for all of them. it is a very simple seven-dimensional descriptor. actually, for the three-dimensional descriptor of TERM-2 features we are also using just this invariant, calculated for the three contributing quadrilaterals.
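the invariant in question appears to be the simplest affine moment invariant, I1 = (mu20*mu02 - mu11^2) / mu00^4. below is a Python sketch computing it from binary masks, together with one possible assembly of the seven-dimensional TERM-3 descriptor; the talk does not specify how the pairs and the triple of trapezoids are combined, so taking the union of the masks is my assumption.

```python
import numpy as np

def ami1(mask):
    """First affine moment invariant I1 = (mu20*mu02 - mu11^2) / mu00^4,
    computed from a binary mask of the shape (pixel-level approximation)."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))                   # zero-order moment: the area
    xc, yc = xs.mean(), ys.mean()
    mu20 = ((xs - xc) ** 2).sum()
    mu02 = ((ys - yc) ** 2).sum()
    mu11 = ((xs - xc) * (ys - yc)).sum()
    return (mu20 * mu02 - mu11 ** 2) / m00 ** 4

def term3_descriptor(t1, t2, t3):
    """Seven values of I1 for a TERM-3 feature given the masks of its three
    trapezoids: each alone, each pair, and all three (pairs and the triple
    are modelled here, as an assumption, by the union of the masks)."""
    u12, u13, u23 = (np.logical_or(a, b)
                     for a, b in ((t1, t2), (t1, t3), (t2, t3)))
    return np.array([ami1(t1), ami1(t2), ami1(t3),
                     ami1(u12), ami1(u13), ami1(u23),
                     ami1(np.logical_or(u12, t3))])
```

for any square, I1 is close to the continuous value 1/144 regardless of size or position, which is a quick sanity check of the affine invariance.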
it turns out that these features are obviously very numerous in a typical image: if we just take any triplet or any pair of keypoints detected in an image, we might get millions of features within a single image, which is of course unacceptable. so we deliberately limit the number of these features in two steps. the first step is, of course, to reduce the number of keypoints themselves: we decided to limit the number of keypoints to around three hundred, using the best, most prominent three hundred keypoints. then we build those features using only a certain neighborhood of those (no more than three hundred) keypoints. and the neighborhoods we use are not just traditional geometric neighborhoods; the neighborhoods are determined by the ellipses themselves. in other words, we assume that if two ellipses are very close and their sizes are very large compared to the distance between them, they most probably represent more or less the same visual content, so we reject something like that. and if the ellipses are small compared to the distance between them, most probably their visual contents are not so related, or rather unrelated, so we also don't consider such keypoints. in other words, we use only pairs of ellipses, of keypoints, which are within a certain distance of each other, and this distance is determined by the sizes of the minor and major axes of both ellipses: roughly, no less than half of the longer minor axis and no more than twice the shorter major axis. the thresholds are defined somewhat arbitrarily and differ slightly in the details, but this is the idea. additionally, we exclude triangles of keypoints which are just too narrow, where one of the angles is less than fifteen degrees. under such constraints we get reasonable numbers of these features, and again, these are just statistical results from our database: with the number of keypoints limited to more or less three hundred per image, we actually have tolerably large numbers of TERM-2 and TERM-3 features, up to sixteen thousand, which is not that bad.
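a sketch of this pruning, with the thresholds written as parameters because the exact values in the talk are approximate: pairs of elliptical keypoints are kept only if their center distance is neither too small nor too large relative to the ellipse axes, and triplets with an interior angle below fifteen degrees are dropped. keypoints are assumed here to be (center, (major, minor)) tuples of semi-axes.

```python
import numpy as np
from itertools import combinations

def admissible_pair(c1, ax1, c2, ax2, lo=0.5, hi=2.0):
    """Accept a pair of elliptical keypoints only if the distance between
    their centers is neither too small nor too large relative to the
    ellipse axes (the lo/hi factors are assumptions, see the talk)."""
    d = np.linalg.norm(np.subtract(c1, c2))
    longer_minor = max(ax1[1], ax2[1])    # longer of the two minor semi-axes
    shorter_major = min(ax1[0], ax2[0])   # shorter of the two major semi-axes
    return lo * longer_minor <= d <= hi * shorter_major

def min_angle_deg(p1, p2, p3):
    """Smallest interior angle of the triangle p1-p2-p3, in degrees."""
    pts = [np.asarray(p, float) for p in (p1, p2, p3)]
    angles = []
    for i in range(3):
        u = pts[(i + 1) % 3] - pts[i]
        v = pts[(i + 2) % 3] - pts[i]
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
    return min(angles)

def candidate_triplets(keypoints, min_angle=15.0):
    """keypoints: list of (center, (major, minor)) tuples."""
    trips = []
    for (i, a), (j, b), (k, c) in combinations(enumerate(keypoints), 3):
        if all(admissible_pair(p[0], p[1], q[0], q[1])
               for p, q in ((a, b), (a, c), (b, c))):
            if min_angle_deg(a[0], b[0], c[0]) >= min_angle:
                trips.append((i, j, k))
    return trips
```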
an important point in using these features is the statistical independence of the individual dimensions. we used over thirty million TERM features from over one thousand images, and we found that the correlation between individual dimensions is very low, practically negligible. this means we can assume that those dimensions are just independent, and then we build the visual vocabulary simply by dividing each dimension into intervals of the same probability (two, three, four, or five intervals), and the number of words then just corresponds to the Cartesian product of such a decomposition. this means that we have two to the power of seven, three to the power of seven, four to the power of seven, and so on, words for the descriptors of TERM-3 features, or two to the power of three, three to the power of three, and so on, for TERM-2 descriptors.
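this vocabulary construction can be sketched as follows: each descriptor dimension is split into equal-probability intervals using empirical quantiles, and the visual word is the mixed-radix combination of the per-dimension bin indices, giving bins^d words for a d-dimensional descriptor.

```python
import numpy as np

def build_edges(train_desc, bins):
    """Per-dimension bin edges giving (approximately) equal-probability
    intervals, estimated from a training set of descriptors, shape (n, d)."""
    qs = np.linspace(0.0, 1.0, bins + 1)[1:-1]   # interior quantile levels
    return np.quantile(train_desc, qs, axis=0)   # shape (bins - 1, d)

def to_word(desc, edges, bins):
    """Map a d-dimensional descriptor to a single visual word in
    [0, bins**d): quantize each dimension, then mixed-radix combine."""
    word = 0
    for dim in range(len(desc)):
        word = word * bins + int(np.searchsorted(edges[:, dim], desc[dim]))
    return word
```

with seven-dimensional TERM-3 descriptors and three bins per dimension this yields a 3^7 = 2187-word vocabulary, matching the counts quoted above.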
so that is the mathematical background; now the algorithms and the evaluation. we have used a relatively small database, which is available over there, but the beauty of the database is that it contains near-duplicate images and all the images are manually outlined, which means that we can automatically determine whether two features or two keypoints are properly matched. of course there are some mistakes, because they might match not the same fragment of the same object, or they might match similarly looking fragments of different objects, but altogether this is a pretty reliable verification, a reliable ground truth. we have been measuring the performance using both traditional keypoints and our TERM-2 and TERM-3 features, with SIFT used as the representation of individual keypoints, over a small vocabulary of five hundred words; maybe too small, but it turns out that it's not really that bad.
and these are the results. first, we just matched individual keypoints using one-to-one correspondences, with no TERM-2, no TERM-3; then again individual keypoints using just many-to-many correspondences over the same word; then methods where just individual TERM-2 and TERM-3 features are matched, using either one-to-one or many-to-many correspondences. and it turns out that for all these variants the number of correct matches, whether we use individual keypoints or individual TERM-2 and TERM-3 features, is basically at the level of statistical error; there is no use for such approaches. however, when we combine TERM-2 or TERM-3 with the keypoint descriptions, that is, we match features geometrically (TERM-2 and TERM-3) only if their contributing keypoints have the same words, the results are much better.

these are exemplary results for this database, and they are very good. the problem is that the number of matches is very small compared to the number of matching images: there are over five hundred images which contain similar fragments, but altogether we have only around one thousand five hundred to two thousand features found in the whole database, and it turns out that all of them are concentrated in just fifteen to twenty images. this means that the method is quite precise, but on its own it is not enough. if, instead, we use just words both for the TERM features (TERM-3 in particular) and for the contributing keypoints, then the results are better in terms of the number of found correspondences, but the percentage of correct matches is not so good; it's much better than previously, about twenty percent, but it's still not enough.

eventually, the ultimate method combines both: we take triplets of keypoints, we match the words of TERM-3, then we match the words of TERM-2 for each pair of keypoints within the triplet, and finally we also match the individual SIFT words of the contributing keypoints. then, especially if we use Harris-Affine, the results are quite good. for MSER we still do not have too many matches, but for Harris-Affine there are quite many of them, and the level of correctness is very high: ninety-three percent of the retrieved features are matched properly, which means they actually belong to the same patch, or to similarly looking patches of the same object. the recall results are also not that bad; the effective recall is about eighty percent, which is also not bad, especially given that some of these objects are really small, so there are not many keypoints there.
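the ultimate scheme can be sketched as an exact lookup on a joint key: a TERM-3 feature from one image matches one from another only if the TERM-3 word, the three TERM-2 words of its keypoint pairs, and the SIFT words of the contributing keypoints all agree. the field names below are illustrative, not from the talk, and sorting the word tuples is an assumption to make the key independent of keypoint ordering. because matching is a dictionary lookup rather than pairwise geometric verification, it can scale to whole databases.

```python
from collections import defaultdict

def feature_key(f):
    """Joint word of a TERM-3 feature: its TERM-3 word, the TERM-2 words of
    the three contributing pairs, and the SIFT words of the three keypoints.
    Sorting makes the key independent of keypoint ordering (an assumption)."""
    return (f["term3_word"],
            tuple(sorted(f["term2_words"])),
            tuple(sorted(f["sift_words"])))

def match_features(feats_a, feats_b):
    """Return id pairs of features from two images whose joint words agree."""
    index = defaultdict(list)
    for f in feats_b:
        index[feature_key(f)].append(f["id"])
    return [(f["id"], bid)
            for f in feats_a
            for bid in index.get(feature_key(f), [])]
```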
that's it. these are just exemplary results returned by this method; we show the outlining ellipses because the individual keypoints are not so well seen. as you see, the results are really quite good, and all of them are achieved by a simple matching of individual features, with no verification of configuration constraints, nothing like that, just matching individual features. some of these are from our database, and some are from other publicly available databases that have been used. there are also some semantically incorrect results, but if you look at them, the visual similarity is actually there.

so, in summary, these are the advantages and disadvantages. we still have to reduce the number of keypoints, which is not a perfect solution; that was one of the conclusions. eventually, as an ultimate solution, we want to incorporate similar data into the descriptions of individual keypoints, which means that we could identify similarly looking fragments of images just by matching the descriptions of individual keypoints rather than by comparing features. so with that said, thank you very much, and sorry for overrunning by one minute.
okay
well, i'll have to be fast
yes
no, because basically there are only three of them, so we cannot incorporate more. the idea is that each of them is represented by individual words, which means that we can just match sentences of a few words; that's why three is the number of items we combine
yeah
yes
yes
well, basically, i'm not saying that i'm identifying the outline of the patch; i am identifying, detecting, the presence of the patch. so even if it's a single keypoint or a single feature, it shows that there is a patch there. generating the outline of the patch is a different story, but now we can at least know that there is a patch there with a high level of confidence
i fully agree with you. the problem is that if you are just looking in an unknown, unpredictable database for some preliminary similarities, you have to start from a method which doesn't assume anything about the content of the images, and this is basically the most general approach
sorry, if the images contain repeated patterns? that's not a problem. we had that one example with two bottles of beer over here and two bottles of beer over there, and basically we found all four pairs of patches, this one with that one, and so on. which means that in such cases we might have a lot of pairs of patches; eventually, with many repetitions of the pattern it doesn't make too much sense, but with two, three, or four of them there is no problem
okay, well, i didn't have time to mention that; it basically depends on the application. with a very naive MATLAB implementation we are able to match hundreds of images within a second. of course, that is on random images; we have not completed it yet, but we are starting on a C++ implementation and on more sophisticated techniques of organizing the database, and then it can be really fast