Speech transcript - A GENERAL FRAMEWORK FOR ROBUST HOSVD-BASED INDEXING AND RETRIEVAL WITH HIGH-ORDER TENSOR DATA

0:00:13	Okay,
0:00:14	this is going to be
0:00:15	a short presentation,
0:00:16	I promise,
0:00:18	and the title of the talk is a general framework for HOSVD-based indexing and retrieval with
0:00:24	high-order data.
0:00:26	This was done with my
0:00:27	student Qun Li and former student
0:00:30	Xiangqiong Shi, who is now at Nokia Research.
0:00:37	And the
0:00:38	goal is to do
0:00:41	indexing and retrieval
0:00:44	based on motion trajectory data.
0:00:46	This is an old problem; people first began to look at this
0:00:50	issue in the late nineties
0:00:53	and
0:00:54	have done quite a bit of work.
0:00:56	Most of the work centers around
0:01:00	doing it using different modalities to reduce the dimensionality.
0:01:03	We introduced in 2003 a method of doing it using PCA,
0:01:08	but then we wanted to extend it to working with
0:01:11	multiple trajectories simultaneously,
0:01:14	and so we had to work in tensor space.
0:01:17	So we tried to do something like PCA in tensor space,
0:01:20	and we tried a number of different techniques
0:01:24	for
0:01:24	tensor decomposition,
0:01:26	based on
0:01:28	using higher-order SVD
0:01:31	and PARAFAC-type models, and another technique which we developed ourselves,
0:01:35	and there were various trade-offs.
0:01:37	What we're going to focus on today is the issue of how to deal with
0:01:41	this
0:01:41	problem when you're dealing with tensors,
0:01:44	but now the query dimensionality
0:01:47	does not match the tensor dimensionality,
0:01:50	meaning
0:01:51	you may have a different number of objects,
0:01:54	a different length,
0:01:56	or,
0:01:57	in particular, the case that we have here, a different number of cameras.
0:02:01	So we actually are dealing with a different number of objects and a different number of cameras,
0:02:05	and the query, for example, may have a single camera
0:02:08	while the database has multiple cameras.
0:02:11	So the question is how can you do this without having to recompute
0:02:15	a separate index for each scenario.
0:02:19	So I'm going to talk a little bit about
0:02:21	the invariance properties of the HOSVD,
0:02:25	how to apply them to the indexing and retrieval problem, and present some experimental results.
0:02:31	So the basic scenario of using a higher-order SVD for indexing and retrieval
0:02:37	consists of
0:02:38	looking at multiple motion trajectories
0:02:42	of multiple targets simultaneously,
0:02:45	and then
0:02:47	having a compact representation in the form of a tensor,
0:02:51	and finally reducing the dimensionality. In this particular case we're going to focus on the higher-order SVD,
0:02:58	though more properly people refer to it as the Tucker decomposition; that would be more accurate.
0:03:04	In signal processing the term
0:03:06	HOSVD is becoming ingrained,
0:03:08	even though it's not the usual terminology.
0:03:13	The origin of this is the following. If you look at a single
0:03:17	trajectory, we can model it as a vector, usually the x and y coordinates
0:03:22	of the trajectory over time.
0:03:24	If we have two trajectories,
0:03:26	we model them as a matrix.
0:03:28	And if we then look at the space of all of these
0:03:31	pairs of trajectories,
0:03:33	we get a tensor,
0:03:34	a three-dimensional array,
0:03:37	and this is from a single camera.
0:03:39	If we now want to extend that to looking at multiple cameras, in particular in this case two cameras,
0:03:45	we have two three-dimensional arrays, or a four-dimensional array,
0:03:49	and so it forms a higher-order tensor.
0:03:52	You can continue this using multiple modalities; you could use this
0:03:55	same trick for doing indexing and retrieval
0:03:58	with different modalities,
0:04:01	and you can go to higher and higher dimensions.
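
The construction described here (one trajectory as a vector, two trajectories as a matrix, all pairs from one camera as a third-order array, two cameras as a fourth-order array) can be sketched with NumPy as follows. This is only an illustration: the trajectory length, the number of pairs, the random-walk data, and the helper names `make_trajectory`, `make_pair`, and `make_camera` are assumptions, not taken from the talk, and in real data both cameras would observe the same scene rather than independent random walks.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50         # samples per trajectory (assumed value)
N_PAIRS = 10   # trajectory pairs in the collection (assumed value)
N_CAMERAS = 2  # two cameras, as in the talk


def make_trajectory(num_samples):
    """Hypothetical trajectory: a random walk stored as [x_1..x_T, y_1..y_T]."""
    xy = np.cumsum(rng.normal(size=(num_samples, 2)), axis=0)
    return xy.T.reshape(-1)


def make_pair(num_samples):
    """Two trajectories side by side form a matrix of shape (2T, 2)."""
    return np.stack([make_trajectory(num_samples) for _ in range(2)], axis=1)


def make_camera(num_samples, num_pairs):
    """All pairs seen by one camera form a third-order array (2T, 2, pairs)."""
    return np.stack([make_pair(num_samples) for _ in range(num_pairs)], axis=2)


single = make_trajectory(T)
pair = make_pair(T)
one_camera = make_camera(T, N_PAIRS)
# Stacking the per-camera arrays along a new axis gives the fourth-order tensor.
two_cameras = np.stack(
    [make_camera(T, N_PAIRS) for _ in range(N_CAMERAS)], axis=3
)

for name, array in [("single", single), ("pair", pair),
                    ("one_camera", one_camera), ("two_cameras", two_cameras)]:
    print(name, array.shape)
```

Adding a further modality would simply mean one more `np.stack` along a new axis, which is the sense in which the order of the tensor keeps growing.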
0:04:04	Now, the reason we want to work with the HOSVD is because of the following theorem. What I've
0:04:09	done here is I've actually just
0:04:11	loosely paraphrased it
0:04:13	in words.
0:04:14	The precise mathematical description of the theorem
0:04:17	is in the paper,
0:04:18	and about
0:04:19	a page and a half of the paper is
0:04:21	devoted to the proof of the theorem.
0:04:23	But basically what the theorem says is something which is quite intuitive.
0:04:27	We are all familiar with the Fourier transform.
0:04:31	If you have a multidimensional Fourier transform, say you take the three-dimensional
0:04:36	Fourier transform of something,
0:04:37	and you now want the two-dimensional Fourier transform only,
0:04:40	it's sufficient to simply look at the corresponding two dimensions:
0:04:45	you can just take the inverse with respect to the third one and you will have the right
0:04:49	two-dimensional Fourier transform.
0:04:51	The reason for that is the orthogonality property
0:04:54	of the Fourier basis.
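
The Fourier remark can be checked numerically. The sketch below, using NumPy's FFT routines on an arbitrary random array (the array shape is an assumption chosen only for illustration), takes the full three-dimensional transform, inverts it along the third axis only, and compares the result with the two-dimensional transform computed directly over the first two axes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 6))  # arbitrary 3-D signal (assumed shape)

# Full three-dimensional Fourier transform over all axes.
full_3d = np.fft.fftn(x)

# Undo the transform along the third axis only.
partial = np.fft.ifft(full_3d, axis=2)

# Two-dimensional transform computed directly over the first two axes.
direct_2d = np.fft.fft2(x, axes=(0, 1))

match = np.allclose(partial, direct_2d)
print("2-D transform recovered from 3-D transform:", match)
```

The two results agree because the transform acts on each axis separately and each per-axis transform is invertible, so nothing has to be recomputed for the axes that are kept.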
0:04:56	The same thing is true here.
0:04:58	That is, if I take an HOSVD and I decompose, it's decomposed into a tensor
0:05:04	and
0:05:05	N
0:05:06	unitary matrices.
0:05:07	So because of the orthogonality, or the unitary property, of those matrices,
0:05:12	if I now take a subtensor,
0:05:16	that is, a portion of the original tensor,
0:05:18	and
0:05:20	apply the HOSVD to it,
0:05:22	I will get the same corresponding unitary matrices
0:05:26	for the dimensions which I have chosen for the subtensor,
0:05:29	and I do not need to calculate them again from scratch,
0:05:32	which means the corresponding indexing of the subtensor
0:05:35	would be identical,
0:05:36	with the same modes,
0:05:39	or the same unitary matrices.
0:05:41	If you want the precise mathematical description of what I just said and what's written here,
0:05:46	it's in the paper, and a proof of it is in the paper.
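
As a rough illustration of the decomposition being discussed, the following NumPy sketch computes an HOSVD (a tensor plus one orthogonal matrix per mode) and checks the direct analogue of the Fourier remark: a decomposition restricted to a subset of modes reuses the same mode matrices, and its core can be recovered from the full core by undoing the remaining mode. It is not the paper's theorem or the authors' code; the helper names `unfold`, `mode_multiply`, and `hosvd`, and the tensor shape, are assumptions made for this example.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: axis `mode` becomes the rows of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_multiply(tensor, matrix, mode):
    """Mode-n product: multiply `matrix` into axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, modes=None):
    """Return (core, {mode: U}) decomposing only along `modes` (default: all)."""
    modes = range(tensor.ndim) if modes is None else modes
    factors = {}
    core = tensor
    for m in modes:
        # Left singular vectors of the mode-m unfolding give the mode matrix.
        u, _, _ = np.linalg.svd(unfold(tensor, m), full_matrices=False)
        factors[m] = u
        core = mode_multiply(core, u.T, m)
    return core, factors


rng = np.random.default_rng(0)
a = rng.normal(size=(4, 5, 6))  # arbitrary third-order tensor (assumed shape)

core_full, f_full = hosvd(a)                 # decompose along all three modes
core_01, f_01 = hosvd(a, modes=[0, 1])       # decompose along modes 0 and 1 only

# Rebuild the tensor from the full decomposition.
rebuilt = core_full
for m, u in f_full.items():
    rebuilt = mode_multiply(rebuilt, u, m)

# Undo mode 2 of the full core; this should equal the two-mode core.
core_from_full = mode_multiply(core_full, f_full[2], 2)

print("reconstruction ok:", np.allclose(rebuilt, a))
print("partial core recovered:", np.allclose(core_from_full, core_01))
```

Because each mode matrix is computed from its own unfolding and is orthogonal, the matrices for modes 0 and 1 are the same in both decompositions, so nothing needs to be recomputed when a mode is dropped.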
0:05:48	I should say one more thing: this is a result that was first
0:05:52	shown for three-dimensional tensors,
0:05:55	third-order tensors,
0:05:57	by [inaudible] as part of his PhD at the University of London.
0:06:02	What we have done in this paper is
0:06:03	extend it to arbitrary dimension.
0:06:06	The result
0:06:06	is
0:06:07	always true, no matter what the dimension.
0:06:10	But it is a critically important thing for us, because if we were to work with a different type of
0:06:14	decomposition,
0:06:15	like PARAFAC,
0:06:16	or parallel factor analysis,
0:06:18	or canonical decomposition, or any of the other ones,
0:06:21	the property fails,
0:06:23	and we would be unable to do anything that we're doing in this paper,
0:06:26	because you would have
0:06:27	to recompute everything from scratch for each subtensor.
0:06:32	So now that we have this property, we can proceed along the lines of the original work that we
0:06:38	did for tensor decomposition,
0:06:39	except this time we do it along subtensors only.
0:06:43	So the indexing part
0:06:45	proceeds along the very same lines: we have an HOSVD which we compute for the tensor,
0:06:50	in this case the four-dimensional tensor,
0:06:53	and
0:06:54	we take the modes
0:06:56	of the query
0:06:58	and do a
0:06:59	similar decomposition, but this time we do it only along
0:07:03	the M
0:07:04	modes which we choose,
0:07:06	and then we
0:07:09	slice
0:07:10	and obtain
0:07:11	index tensors,
0:07:14	and the number of index tensors is computed
0:07:16	by the following
0:07:18	formula.
0:07:21	For the retrieval procedure we simply
0:07:24	compare the query index
0:07:28	to the index tensors that we have obtained before,
0:07:32	and then just simply compute a Frobenius norm between the two.
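
The retrieval step described here, comparing a query index against stored index tensors by Frobenius norm, might look like the sketch below. The function name `retrieve`, the `top_k` parameter, and the random tensors standing in for real index tensors are all assumptions for illustration; how the index tensors themselves are sliced out of the decomposition is specified in the paper and is not reproduced here.

```python
import numpy as np


def retrieve(query_index, database_indices, top_k=3):
    """Rank database entries by Frobenius distance to the query index.

    Returns a list of (position, distance) pairs, closest first.
    """
    # With default arguments, np.linalg.norm on an N-d array returns the
    # square root of the sum of squared entries (the Frobenius norm).
    distances = [np.linalg.norm(query_index - entry) for entry in database_indices]
    order = np.argsort(distances)[:top_k]
    return [(int(i), float(distances[i])) for i in order]


rng = np.random.default_rng(0)

# Stand-ins for precomputed index tensors (assumed shape and count).
database = [rng.normal(size=(3, 3, 2)) for _ in range(5)]

# A query that is a slightly perturbed copy of entry 2.
query = database[2] + 0.01 * rng.normal(size=(3, 3, 2))

ranking = retrieve(query, database)
print(ranking)
```

A linear scan like this costs one norm computation per stored entry, which is consistent with the later remark in the talk that retrieval time, rather than indexing time, is where the cost is paid.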
0:07:36	So the algorithm to be computed
0:07:38	is essentially the same
0:07:40	as we presented several years back
0:07:43	on
0:07:43	tensor-based
0:07:45	comparison for indexing and retrieval of motion trajectories.
0:07:49	The main difference between this work and our previous work
0:07:53	is that our previous work was generic: it didn't care what
0:07:56	tensor decomposition was chosen,
0:07:59	and it was applied with the same
0:08:01	dimensionality of tensor for the query
0:08:03	and for the data.
0:08:04	That's a
0:08:05	strong assumption,
0:08:07	because we have no control over the query size,
0:08:10	and this is especially true when you're dealing with multiple cameras
0:08:13	and multiple-camera tensor
0:08:15	queries,
0:08:16	because
0:08:17	not all cameras have access to the same trajectories simultaneously.
0:08:21	So the main difference here is that we are only looking at the subtensor
0:08:25	for which data are available,
0:08:27	and then
0:08:29	obtaining the corresponding
0:08:32	query representation from our original index,
0:08:36	which is indexed over all possible modalities.
0:08:40	So here are the experimental results for this work.
0:08:44	These are a collection of
0:08:47	tensors from the CAVIAR dataset from [inaudible],
0:08:52	and
0:08:53	these are from two-camera sets.
0:08:56	This is the precision-recall curve
0:08:58	corresponding to that, and this is for complete queries.
0:09:02	And
0:09:06	these are the resulting index matrix sizes,
0:09:10	and here are the indexing time and retrieval time.
0:09:13	I should say that the
0:09:16	indexing times
0:09:17	for HOSVD are traditionally very good,
0:09:20	and where it suffers is the retrieval time.
0:09:22	We do not
0:09:23	remedy this,
0:09:25	and we
0:09:27	[inaudible] the retrieval times here.
0:09:30	What we have to say is that we have to pay this price
0:09:33	if we want to have the flexibility
0:09:35	of dealing with different-size subtensors
0:09:38	in the query and the database.
0:09:43	And
0:09:43	here we do the same thing, but for partial queries,
0:09:48	so the query and the data are not the same size,
0:09:52	and these are the corresponding precision-recall curves.
0:10:03	So,
0:10:04	in short,
0:10:05	our main
0:10:07	message is that
0:10:08	the HOSVD, or Tucker-type decomposition,
0:10:11	because of its orthogonality,
0:10:13	is particularly useful in applications where
0:10:16	you
0:10:17	do not know in advance
0:10:19	what the dimensionalities are and you need to
0:10:21	mix and match them at query time.
0:10:24	So we have applied this general principle in our case to motion trajectories,
0:10:28	but it can be applied to any higher-order data
0:10:31	analysis, whether for retrieval or not,
0:10:34	and we show that it actually
0:10:36	does
0:10:37	very well.
0:10:40	Thank you very much.

A GENERAL FRAMEWORK FOR ROBUST HOSVD-BASED INDEXING AND RETRIEVAL WITH HIGH-ORDER TENSOR DATA

Image and Video Indexing and Retrieval

Presenter: Dan Schonfeld, Authors: Qun Li, Xiangqiong Shi, Dan Schonfeld, University of Illinois Chicago, United States