Speech Transcript - INTRA-FRAME PREDICTION WITH LAPPED TRANSFORMS FOR IMAGE CODING

0:00:13	Good morning.
0:00:14	I am going to present
0:00:17	the paper "Intra-frame prediction
0:00:19	with lapped
0:00:20	transforms
0:00:20	for image coding".
0:00:21	The presentation will be divided
0:00:24	as follows:
0:00:25	first we will give the motivation, then a very brief
0:00:29	review of lapped
0:00:30	transforms,
0:00:32	then we are going to review the intra-frame prediction that was proposed for H.264,
0:00:37	the
0:00:38	difficulties
0:00:39	of using it with lapped transforms,
0:00:41	the proposed solution and the modification of the
0:00:45	mode selection
0:00:46	algorithm,
0:00:48	then
0:00:49	some experimental results and then the conclusion.
0:00:54	The first question must be: why use lapped transforms
0:00:58	when most of the
0:01:01	image and, especially, video coding standards
0:01:04	five
0:01:05	that's a one
0:01:06	use block transforms?
0:01:10	Using lapped transforms, since the support that is used is larger than
0:01:15	the block itself,
0:01:17	leads to a
0:01:18	good reduction of
0:01:20	the blocking artifacts.
0:01:22	Besides that,
0:01:24	for the same reason, we can exploit
0:01:26	the correlation among neighbour blocks
0:01:29	that was ignored before.
0:01:31	Both of those
0:01:32	lead to a superior performance in coding,
0:01:36	in objective and subjective
0:01:39	measures,
0:01:41	but that comes at the expense of higher complexity.
0:01:45	At this point,
0:01:46	the only
0:01:48	image coding standard that uses
0:01:50	lapped transforms is JPEG XR,
0:01:53	which was proposed by Microsoft Corporation; it was
0:01:57	formerly known
0:01:58	as HD Photo.
0:02:00	and
0:02:01	you from go it
0:02:02	uh have proposed a a new interest frame prediction it not then
0:02:06	and here we propose an intra-frame prediction in the pixel domain,
0:02:11	much the same as what is done in H.264.
0:02:16	Now we'll go to a very brief review.
0:02:19	okay
0:02:21	In lapped transforms, as in block transforms, the signal is divided in blocks
0:02:27	that will be processed independently,
0:02:30	but, different from
0:02:31	block transforms,
0:02:33	the
0:02:34	support will be bigger:
0:02:37	we have this new block that will have N times more samples than
0:02:42	the original one, which we call
0:02:45	Z
0:02:45	to differentiate it from the traditional X
0:02:49	block.
0:02:50	As I said,
0:02:56	it will have N times more
0:03:00	samples,
0:03:00	and this N will be called the overlapping factor.
0:03:03	Here we have the example for an overlapping factor of two.
0:03:08	We can see that half of the samples
0:03:11	of
0:03:11	the neighbouring blocks,
0:03:13	the previous and the
0:03:14	next,
0:03:15	will be used
0:03:16	besides the traditional samples that were already used in the block transform.
0:03:23	P and Q,
0:03:24	the matrices containing the direct and inverse transforms, will be used
0:03:31	in the same way as in block transforms,
0:03:33	with the difference
0:03:35	that
0:03:36	their size
0:03:39	will be different from the original one.
0:03:43	That is because the traditional criterion for perfect reconstruction is no longer
0:03:50	valid.
0:03:50	If we want to have perfect reconstruction, we have to follow this new criterion that is
0:03:57	shown
0:03:59	below here.
0:04:03	Okay.
0:04:05	The point is that the blocks cannot be reconstructed
0:04:08	independently:
0:04:09	to reconstruct the signal
0:04:12	we are going to have to accumulate
0:04:16	the contributions of several neighbour blocks.
0:04:20	Another difference from block transforms is that at the borders, since we
0:04:25	don't have all the
0:04:27	neighbour blocks
0:04:28	needed,
0:04:29	we have to take care
0:04:31	of their treatment.
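
As a rough illustration of the overlap-and-accumulate reconstruction described above, here is a minimal 1-D sketch for an overlapping factor of two. It does not use the transform from the talk: the modulated lapped transform (MLT) with a sine window is swapped in as a well-known orthogonal lapped transform, and all function names are made up for the example.

```python
import numpy as np

def mlt_matrix(M):
    """M x 2M analysis matrix of the MLT (stand-in lapped transform)."""
    n = np.arange(2 * M)                  # sample index inside the extended block
    k = np.arange(M)[:, None]             # coefficient index
    window = np.sin(np.pi * (n + 0.5) / (2 * M))
    return window * np.sqrt(2.0 / M) * np.cos(
        np.pi / M * (n + (M + 1) / 2.0) * (k + 0.5))

def lapped_forward(x, P):
    """Transform overlapping extended blocks of 2M samples, stepping by M."""
    M = P.shape[0]
    num_blocks = len(x) // M - 1
    return np.array([P @ x[i * M:i * M + 2 * M] for i in range(num_blocks)])

def lapped_inverse(coeffs, P, length):
    """Accumulate the contribution of every block (overlap-add)."""
    M = P.shape[0]
    y = np.zeros(length)
    for i, c in enumerate(coeffs):
        y[i * M:i * M + 2 * M] += P.T @ c
    return y

M = 8
P = mlt_matrix(M)
rng = np.random.default_rng(0)
x = rng.standard_normal(8 * M)
y = lapped_inverse(lapped_forward(x, P), P, len(x))

# Interior samples receive two contributions; the first and last M get only one.
interior_error = np.max(np.abs(x[M:-M] - y[M:-M]))
border_error = np.max(np.abs(x[:M] - y[:M]))
print(interior_error, border_error)
```

In this sketch the interior samples are recovered only after both overlapping blocks have contributed, while the first and last M samples are covered by a single block, which is why the borders need separate treatment.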
0:04:35	Now,
0:04:37	let's
0:04:37	see again
0:04:39	the intra-frame prediction.
0:04:41	I believe most of you are already
0:04:45	familiar with this scheme,
0:04:47	but I wanted to recall that,
0:04:50	even though H.264 was proposed to be a video coding standard,
0:04:56	when used to encode
0:04:59	still images
0:05:00	it is a very efficient image coder.
0:05:04	One of the key features
0:05:06	why this
0:05:07	happens is the
0:05:09	implementation of
0:05:10	intra prediction,
0:05:12	which consists of using the immediate neighbours,
0:05:16	the neighbours
0:05:17	that the encoder has already encoded,
0:05:20	to predict the block we are encoding at this moment.
0:05:24	In this way you just have to send the residual.
0:05:29	Here you can see the nine prediction modes
0:05:31	for four by four and eight by eight blocks,
0:05:35	and we have an additional
0:05:37	four prediction modes for blocks of sixteen by sixteen.
0:05:43	Okay.
0:05:44	What is the difficulty of using this scheme with lapped transforms?
0:05:50	and uh the for the of
0:05:52	because to a good is
0:05:54	C we it's than in it was at local
0:05:57	In other words, we would have to have the reconstruction of the pixels
0:06:00	to use
0:06:01	it in the prediction,
0:06:04	and, as we saw,
0:06:06	we don't have the reconstruction before having all the neighbours,
0:06:10	so we cannot perform the
0:06:13	prediction of
0:06:14	the block we want to encode.
0:06:18	You can see
0:06:19	by the colours,
0:06:21	the central one here
0:06:23	being the block we are
0:06:25	going to encode,
0:06:27	that using a lapped transform we would need the
0:06:30	bigger
0:06:31	square,
0:06:32	but we have only this region here
0:06:35	that is available
0:06:37	as a reconstruction.
0:06:40	We propose that, instead of predicting the block itself, we predict this extended block.
0:06:46	In this way,
0:06:48	every single pixel
0:06:50	far away from the borders will be predicted four times
0:06:54	for an overlapping factor of two.
0:06:57	However, if we want to do this,
0:07:00	we would be missing
0:07:03	half of the pixels in the last one,
0:07:06	because
0:07:07	the blocks here have not yet been encoded.
0:07:13	But
0:07:16	if we go to the way that
0:07:18	H.264 does this,
0:07:19	in H.264 the blocks
0:07:23	are
0:07:25	grouped in a larger block:
0:07:27	we have blocks of eight by eight that are put in a
0:07:30	macroblock of sixteen by sixteen,
0:07:32	and the blocks are ordered in this way.
0:07:36	We can see that for the first block,
0:07:39	since the four blocks of the previous macroblocks had already been encoded,
0:07:44	in this case we would have all the pixels available for the prediction.
0:07:49	However, when we go to the second block,
0:07:52	we would go back to the case we had before, where
0:07:57	the last
0:07:58	neighbours are not available.
0:08:01	In the third
0:08:03	it becomes worse, because we have lost
0:08:07	half of the pixels,
0:08:10	and in the fourth we would only have
0:08:14	this small corner
0:08:21	to predict
0:08:23	the block.
0:08:25	To improve this a little, one of the things we propose is changing the order of encoding of
0:08:31	these blocks,
0:08:32	changing the had of those set and then the block
0:08:36	In this way, in the first we have the same conditions we had before.
0:08:45	In the second we go to
0:08:48	the same case we had in the fourth block,
0:08:51	which means that
0:08:54	we have just one corner; in this case it got worse.
0:08:58	However, where we didn't have the left corner before, we have it again. So
0:09:05	in this case we are going to ensure that half of the blocks will have all the pixels
0:09:09	available
0:09:10	for the prediction,
0:09:12	and in the other half
0:09:13	we have just this corner.
0:09:15	In this case we propose to use just a DC
0:09:19	prediction, which consists of predicting all of the pixels
0:09:24	as the average of the available pixels.
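
The DC prediction just described can be sketched as follows. This is only an illustration under assumptions: the availability mask, the function name `dc_predict` and the fallback value of 128 used when nothing is available are not from the talk.

```python
import numpy as np

def dc_predict(reconstructed, available, top, left, size, default=128.0):
    """Predict every pixel of an extended block as the mean of the
    already reconstructed pixels that fall inside it."""
    region = reconstructed[top:top + size, left:left + size]
    mask = available[top:top + size, left:left + size]
    # Assumed fallback: mid-grey when no pixel of the region is available.
    dc = region[mask].mean() if mask.any() else default
    return np.full((size, size), dc)

# Toy picture where only the top-left 12x12 pixels are reconstructed.
image = np.zeros((32, 32))
available = np.zeros((32, 32), dtype=bool)
image[:12, :12] = 100.0
available[:12, :12] = True

# The 16x16 extended block at (8, 8) sees only a 4x4 corner of known pixels.
prediction = dc_predict(image, available, top=8, left=8, size=16)
empty = dc_predict(image, available, top=16, left=16, size=16)
```

Here only one small corner of the extended block is known, which is the situation described for half of the blocks, so the whole block is filled with the average of that corner.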
0:09:29	The process will be
0:09:30	much the same as is done in the H.264
0:09:34	coder:
0:09:35	the residual is computed as the difference
0:09:39	between the original pixels and the prediction.
0:09:44	After the
0:09:45	transform,
0:09:46	quantisation and inverse transform,
0:09:48	we have to remember that the residual, even if we did not have the
0:09:53	quantisation,
0:09:54	is different from
0:09:56	the original one.
0:09:58	And since
0:09:59	we have to add the prediction, we have to mimic
0:10:03	this process
0:10:05	in the prediction:
0:10:06	we simply process
0:10:09	the prediction before we add it to the residual, to obtain what we would have
0:10:15	if we didn't have the prediction.
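
A small 1-D sketch of why the prediction has to go through the same transform pair before it is added back. As an assumption, the MLT stands in for the transform used in the talk, the uniform quantiser and its step are arbitrary, and the helper names are hypothetical.

```python
import numpy as np

def mlt_matrix(M):
    """M x 2M analysis matrix of the MLT (stand-in lapped transform)."""
    n = np.arange(2 * M)
    k = np.arange(M)[:, None]
    window = np.sin(np.pi * (n + 0.5) / (2 * M))
    return window * np.sqrt(2.0 / M) * np.cos(
        np.pi / M * (n + (M + 1) / 2.0) * (k + 0.5))

M = 8
P = mlt_matrix(M)
step = 16.0                               # assumed quantiser step
rng = np.random.default_rng(1)
z = rng.uniform(0, 255, 2 * M)            # original extended block
pred = np.full(2 * M, z.mean())           # some prediction of that block

def code_and_decode(v):
    """Forward transform, uniform quantisation, inverse transform."""
    coeffs = P @ v
    return P.T @ (step * np.round(coeffs / step))

residual_out = code_and_decode(z - pred)  # decoded residual contribution
processed_pred = P.T @ (P @ pred)         # prediction through the same pair
decoded = residual_out + processed_pred   # what gets accumulated

# Contribution the block would have made with no prediction and no quantiser.
target = P.T @ (P @ z)
no_quant = P.T @ (P @ (z - pred)) + processed_pred
print(np.max(np.abs(no_quant - target)), np.linalg.norm(decoded - target))
```

By linearity, the transformed residual plus the transformed prediction equals the transformed original, so without quantisation the accumulated contribution is the same as if no prediction had been used; adding the raw prediction instead would not give this.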
0:10:19	We also have to
0:10:21	change the way that
0:10:23	the mode is selected.
0:10:25	We have mainly two
0:10:28	ways of
0:10:30	selecting it. One is the
0:10:33	minimisation of the prediction error,
0:10:36	which is normally measured by the sum of absolute differences,
0:10:40	which doesn't seem to be appropriate in this case because
0:10:45	the reconstruction of every
0:10:49	pixel will be different depending on its position.
0:10:53	So we propose to weight this difference
0:10:58	according to the importance
0:11:01	of each sample.
0:11:04	The proposed weighting that we use in this work is
0:11:09	given by this: it would be the reconstructed
0:11:15	block that we would have if we had a matrix filled just with ones.
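
The weighted criterion can be sketched like this. It is a sketch under assumptions: the MLT again stands in for the transform of the talk, the weights are obtained by passing a block of ones through the forward and inverse transform as described, and taking their absolute value is an added assumption to keep them non-negative for this stand-in. All names are hypothetical.

```python
import numpy as np

def mlt_matrix(M):
    """M x 2M analysis matrix of the MLT (stand-in lapped transform)."""
    n = np.arange(2 * M)
    k = np.arange(M)[:, None]
    window = np.sin(np.pi * (n + 0.5) / (2 * M))
    return window * np.sqrt(2.0 / M) * np.cos(
        np.pi / M * (n + (M + 1) / 2.0) * (k + 0.5))

def importance_weights(P):
    """Reconstruction of an all-ones extended block (separable 2-D case)."""
    A = P.T @ P                           # forward then inverse on one block
    ones = np.ones((A.shape[0], A.shape[0]))
    return np.abs(A @ ones @ A.T)         # abs() is an assumption, see above

def weighted_sad(block, prediction, weights):
    """Sum of absolute differences, weighted per sample position."""
    return float(np.sum(weights * np.abs(block - prediction)))

def select_mode(block, candidates, weights):
    """Return the name of the candidate with the smallest weighted SAD."""
    costs = {name: weighted_sad(block, pred, weights)
             for name, pred in candidates.items()}
    return min(costs, key=costs.get)

M = 8
weights = importance_weights(mlt_matrix(M))
rng = np.random.default_rng(2)
block = rng.uniform(0, 255, (2 * M, 2 * M))
candidates = {
    "dc": np.full_like(block, block.mean()),
    "exact": block.copy(),                # artificial zero-error candidate
}
best = select_mode(block, candidates, weights)
```

The two candidates are artificial and only show how the weighted cost is compared across modes; real candidates would come from the directional and DC predictors.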
0:11:21	The rate-distortion optimisation can also be performed, but we have to remember that, since we cannot reconstruct
0:11:27	the block,
0:11:29	the distortion has to be measured in the transform domain and, for
0:11:35	non-orthogonal transforms,
0:11:37	we have to take into account the energy of the basis functions.
0:11:43	This is one example of the weighting matrix,
0:11:46	for the generalized lapped
0:11:48	biorthogonal transform that was
0:11:50	obtained
0:11:51	by maximising the coding gain for an autoregressive model
0:11:58	with a
0:12:00	correlation factor of 0.95, which is
0:12:03	known to be a good model for images.
0:12:09	And now we go to
0:12:10	the experimental results.
0:12:15	For the implementation, the idea at
0:12:18	this point was just to prove the concept that
0:12:22	lapped transforms can be used together with
0:12:27	intra-frame prediction.
0:12:31	So, for simplicity reasons,
0:12:33	the than a dish was scaled denoted by we'll
0:12:37	and because of this court the data
0:12:40	coding block size was fixed to eight by eight.
0:12:44	Also, we used the generalized lapped biorthogonal transform
0:12:49	for several reasons:
0:12:51	the overlapping factor of two
0:12:54	maximises the number of blocks that we can use to predict,
0:12:59	and the GLBT has a good performance for this overlapping factor.
0:13:05	Since the overlapping factor was two,
0:13:10	we had a prediction block of sixteen by sixteen.
0:13:15	Even though we can implement any number of prediction modes,
0:13:19	at this point only the four modes available in H.264
0:13:24	were implemented
0:13:26	when we are going to use
0:13:27	lapped transforms.
0:13:31	For comparison reasons,
0:13:34	the traditional intra-frame prediction was also implemented, using the DCT;
0:13:41	in this case all the nine modes
0:13:43	were available.
0:13:45	And,
0:13:47	for encoding
0:13:48	the mode,
0:13:49	we used the same approach
0:13:52	proposed for H.264, which
0:13:55	means
0:13:56	looking at the
0:13:58	left and upper neighbours
0:13:59	to obtain
0:14:02	the most probable mode.
0:14:07	For the proposed scheme, since the neighbours
0:14:12	didn't have
0:14:13	several modes of prediction,
0:14:16	we simply used a two-bit code
0:14:19	for half of the blocks, and for the other half we don't have to encode anything because there
0:14:25	is just one
0:14:26	mode.
0:14:28	And,
0:14:29	besides that, the simple application,
0:14:32	which means
0:14:33	no additional prediction,
0:14:35	of lapped transforms was also tested.
0:14:41	For the
0:14:45	image Barbara,
0:14:46	we can see that
0:14:48	the proposed
0:14:50	method, which is in blue,
0:14:53	outperformed
0:14:54	the lapped transforms
0:14:56	and the intra plus DCT that we implemented.
0:15:00	We can see also that most of the gain in this case was obtained just by the simple application of lapped
0:15:07	transforms, but again
0:15:08	our proposed method
0:15:10	yields
0:15:12	a further
0:15:12	improvement in the
0:15:14	coding.
0:15:17	this same result
0:15:18	signal as a
0:15:20	present
0:15:21	by the first frame of the for which D C can
0:15:25	give a bad
0:15:26	in which again our
0:15:29	proposed method outperformed both the
0:15:32	intra plus DCT and
0:15:38	And then we have results for several other images,
0:15:41	in which we can see the results
0:15:44	for the you the but not much
0:15:48	or more comparing to
0:15:50	intra plus DCT, and on the right
0:15:54	we have the lapped transforms.
0:15:56	we can see if here that we have a no
0:16:01	okay
0:16:03	uh
0:16:05	Now we go to the conclusions.
0:16:07	The results presented here
0:16:09	show that the intra-frame prediction could be adapted to be compatible with lapped transforms.
0:16:17	Also,
0:16:18	we show that the
0:16:20	proposed scheme outperforms the simple application of lapped transforms, as well as the intra prediction with the DCT,
0:16:28	at all tested rates.
0:16:31	It is important to note that in our case we have just half of the blocks
0:16:36	being predicted, and in
0:16:39	this half we have only
0:16:41	four
0:16:42	prediction modes, so if we implement a different number of modes, the
0:16:48	results presented here
0:16:50	could be further improved.
0:16:54	We also have some very preliminary
0:16:58	results
0:17:00	for the implementation
0:17:03	of the scheme presented here in a real H.264
0:17:06	coder.
0:17:08	We can see here that, even though the gains
0:17:11	presented
0:17:12	here
0:17:12	are smaller,
0:17:13	we have a gain
0:17:15	at all tested
0:17:16	rates
0:17:18	for the image by
0:17:24	A few
0:17:25	reasons why the gains are smaller: in this case we are competing with
0:17:31	not only the nine modes
0:17:34	of eight by eight, but
0:17:35	the nine modes of
0:17:37	four by four and the four of sixteen by sixteen,
0:17:40	and in our case we have only
0:17:43	the four modes
0:17:46	that were implemented for the eight by eight
0:17:50	coding block.
0:17:51	also the one
0:17:53	may not be as well adapted
0:17:58	to compress
0:18:00	lapped transform
0:18:04	coefficients
0:18:05	as it is
0:18:06	for the DCT.
0:18:10	As future
0:18:11	work, we are studying
0:18:15	a way of
0:18:16	implementing variable-size
0:18:18	prediction blocks,
0:18:20	as well as testing other overlapping factors,
0:18:23	to see
0:18:24	we use what we lose in the prediction we can gain in the near future
0:18:30	and and uh um we all i want to say you also that a station of this work has been
0:18:35	a set than in i C two thousand and
0:18:38	they extension to feed you
0:18:40	once
0:18:41	sept and will be presented nice
0:18:43	So
0:18:45	that concludes
0:18:46	my
0:18:47	presentation. Thank you.
0:18:50	well
0:18:51	but
0:18:57	Any questions for now?
0:19:04	One quick question: can you comment on the computational complexity?
0:19:10	Okay.
0:19:11	The thing is that,
0:19:13	in this case,
0:19:15	the prediction will be done
0:19:17	four times. Even though I didn't measure that,
0:19:22	I can say that it will be at least four times
0:19:26	more
0:19:28	a the in the part of prediction hand
0:19:31	in first and
0:19:32	the direct
0:19:34	well
0:19:36	but the point point
0:19:37	the that or the power will be seen it to the
0:19:40	it's that
0:19:44	and
0:19:45	question
0:19:46	oh
0:19:49	but was of speaker

INTRA-FRAME PREDICTION WITH LAPPED TRANSFORMS FOR IMAGE CODING

Image Coding

Presented by: Rafael Galvão de Oliveira, Author(s): Rafael Galvão de Oliveira, Béatrice Pesquet-Popescu, Télécom ParisTech, France