0:00:20 | so the title of our paper is real-time conjugate gradient for online fMRI classification |
---|
0:00:27 | so first I will begin with a schematic |
---|
0:00:31 | of the real-time fMRI system, and then I will move on to show you some |
---|
0:00:36 | previous work on online learning algorithms and our proposed |
---|
0:00:40 | real-time conjugate gradient |
---|
0:00:42 | and in the last part I will show you some test results |
---|
0:00:46 | so conventionally, fMRI experiments are done in a batch processing manner |
---|
0:00:52 | so the experimenter will give a certain kind of task to the subject in the brain scanner |
---|
0:00:57 | for example, a brain state classification task |
---|
0:01:02 | by the end of the experiment |
---|
0:01:03 | we gather a time series of brain scan images |
---|
0:01:07 | and then we apply some offline learning algorithm to obtain the inference results |
---|
0:01:15 | in contrast |
---|
0:01:16 | in a real-time fMRI system |
---|
0:01:18 | we don't need to wait until the end of the experiment |
---|
0:01:23 | at each time point |
---|
0:01:24 | we have a 3D brain scan image |
---|
0:01:28 | and by using an online learning algorithm we can have the inference result of the current brain state before |
---|
0:01:36 | we have the next |
---|
0:01:38 | brain scan image |
---|
0:01:41 | the benefit of doing so is that |
---|
0:01:46 | the experimenter can use |
---|
0:01:48 | the real-time feedback to monitor the data quality |
---|
0:01:52 | or to do real-time mind reading |
---|
0:01:55 | or to modify the task while it is still going on |
---|
0:02:00 | and if we give the real-time feedback back to the subject in the scanner, then we can build |
---|
0:02:05 | a brain-computer interface |
---|
0:02:09 | however |
---|
0:02:10 | the benefits come with challenges |
---|
0:02:13 | so the main challenge |
---|
0:02:13 | for the real-time fMRI system is the computational complexity |
---|
0:02:19 | because we want to process |
---|
0:02:22 | fMRI data, which is usually of dimension ten to the power of five, within one TR, which |
---|
0:02:28 | is usually two to three seconds |
---|
0:02:30 | and we also want an accurate and also adaptive algorithm |
---|
0:02:35 | so that the experimenter can modify the task on the fly |
---|
0:02:43 | so this is our proposed method, which is called real-time conjugate gradient |
---|
0:02:48 | it is motivated by a widely used |
---|
0:02:51 | algorithm in the neuroimaging community, which is called partial least squares |
---|
0:02:56 | and our algorithm is online |
---|
0:02:58 | so we do |
---|
0:03:00 | both the training and the classification in real time |
---|
0:03:04 | and in a real fMRI test |
---|
0:03:06 | it shows that our algorithm is |
---|
0:03:09 | fast |
---|
0:03:10 | and it can reach an accuracy of about ninety percent within 0.5 seconds |
---|
0:03:16 | using |
---|
0:03:17 | an ordinary personal computer |
---|
0:03:19 | and I will also show you test results |
---|
0:03:22 | which show that the algorithm is adaptive |
---|
0:03:26 | so there are many online learning algorithms out there |
---|
0:03:30 | and some of them have been applied to fMRI applications |
---|
0:03:34 | but not all of them are truly |
---|
0:03:38 | online algorithms; some of them are |
---|
0:03:42 | trained offline and |
---|
0:03:43 | then do the prediction online |
---|
0:03:46 | but in our definition of an online learning algorithm, we mean that we need |
---|
0:03:51 | both the training and the classification in real time |
---|
0:03:55 | so here are some examples of true online learning algorithms, including |
---|
0:04:00 | the general linear model, independent component analysis, and support vector machines |
---|
0:04:06 | so as I've mentioned, our algorithm is based on partial least squares, so let me first give |
---|
0:04:13 | you a brief review of the partial least squares algorithm |
---|
0:04:16 | so here our input |
---|
0:04:19 | data is a matrix X, which is of dimension N by K, where N is the |
---|
0:04:24 | number of |
---|
0:04:26 | examples, which are the brain scan images, and K is the dimension of the image, which |
---|
0:04:32 | is usually |
---|
0:04:34 | on the order of |
---|
0:04:36 | ten to the power of five |
---|
0:04:38 | and the output Y |
---|
0:04:40 | is the brain state corresponding to the |
---|
0:04:43 | brain scan image |
---|
0:04:45 | so |
---|
0:04:46 | partial least squares assumes that |
---|
0:04:48 | both the input and the output are generated by the |
---|
0:04:53 | same set of |
---|
0:04:54 | latent factors F, so we can express X as F P-transpose and Y as F Q |
---|
0:05:00 | where |
---|
0:05:01 | P and Q are the loading factors for X and Y respectively |
---|
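In symbols, the latent-factor model just described can be written as follows (standard PLS notation; the symbol R for the number of latent factors is my label, not named in the talk):

```latex
X = F P^{\top}, \qquad Y = F Q,
\qquad X \in \mathbb{R}^{N \times K},\; F \in \mathbb{R}^{N \times R},
```

with P and Q the loading matrices for X and Y respectively.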
0:05:06 | and partial least squares is an iterative method |
---|
0:05:10 | in each iteration it finds a new latent factor, and then it does a univariate regression |
---|
0:05:16 | to find the loading factors P and Q |
---|
0:05:19 | and in the last step it does a rank-one deflation to subtract the |
---|
0:05:24 | contribution of the current latent factor |
---|
0:05:29 | and then it moves on to the next iteration |
---|
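To make the iteration concrete, here is a minimal sketch of the textbook PLS1 loop for a single response variable; this illustrates the general method described above, not the speaker's own code, and the variable names are mine:

```python
import numpy as np

def pls1_train(X, y, n_components):
    """Textbook PLS1 (assumes X and y are centered): extract one latent
    factor per iteration, find the loadings by univariate regressions,
    then do a rank-one deflation of X."""
    X = X.copy().astype(float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y                    # weight: direction of max covariance
        w /= np.linalg.norm(w)
        f = X @ w                      # latent factor (score vector)
        p = X.T @ f / (f @ f)          # X loading (univariate regression)
        c = (y @ f) / (f @ f)          # y loading
        X -= np.outer(f, p)            # rank-one deflation
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # fold the factors into one regression vector: y_hat = X @ beta
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta
```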
0:05:32 | because it is an iterative method, it is not |
---|
0:05:37 | so efficient in a real-time context |
---|
0:05:40 | so |
---|
0:05:41 | in two thousand nine |
---|
0:05:44 | an improvement of the traditional partial least squares was proposed; it is called ridge partial least squares |
---|
0:05:51 | so the main idea of ridge partial least squares is |
---|
0:05:54 | that they add a ridge parameter to the covariance matrix |
---|
0:06:00 | so that we can extract all the latent factors in only one step instead of doing multiple iterations |
---|
0:06:07 | however, this algorithm is still not efficient enough for |
---|
0:06:12 | our desired |
---|
0:06:15 | real-time system |
---|
0:06:17 | so what we want is |
---|
0:06:19 | an algorithm |
---|
0:06:20 | which has comparable performance to partial least squares |
---|
0:06:24 | but is more efficient |
---|
0:06:26 | so when we looked into partial least squares, we found in |
---|
0:06:29 | these two papers that partial least squares is |
---|
0:06:33 | actually a conjugate gradient algorithm |
---|
0:06:36 | so |
---|
0:06:37 | based on that, we propose a new real-time conjugate gradient algorithm to fit in our |
---|
0:06:44 | real-time system |
---|
0:06:46 | so let's formalize the problem here; for the real-time system |
---|
0:06:52 | at each time t we receive a new example |
---|
0:06:56 | which is the brain scan image |
---|
0:06:58 | and our classifier trained at time t minus one makes a prediction based on the new example |
---|
0:07:05 | and after |
---|
0:07:07 | the algorithm makes the prediction, we receive the true label from the subject |
---|
0:07:11 | and then we update our classifier with this information |
---|
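As a sketch, the predict-then-update protocol just described is the following loop; the scanner, subject, and classifier interfaces here are hypothetical placeholders, not the authors' API:

```python
def run_online_session(scanner, subject, classifier):
    """One pass of the real-time protocol: predict with the classifier
    trained at t-1, then update it once the true label arrives."""
    for t, x_t in enumerate(scanner):       # one brain-scan vector per TR
        y_hat = classifier.predict(x_t)     # prediction before the next scan
        y_true = subject.true_label(t)      # true label from the subject
        classifier.update(x_t, y_true)      # retrain on the new pair
        yield t, y_hat, y_true
```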
0:07:17 | so the problem becomes a quadratic minimization problem |
---|
0:07:23 | because |
---|
0:07:24 | to make the algorithm |
---|
0:07:27 | more efficient |
---|
0:07:28 | instead of using all the past examples |
---|
0:07:31 | we |
---|
0:07:32 | take a sliding window of the examples |
---|
0:07:35 | so at each time |
---|
0:07:37 | we only use the past H examples for the training |
---|
0:07:44 | so there are two benefits of doing this: the first one is, like I mentioned, |
---|
0:07:48 | the efficiency, and what's more important, it makes the |
---|
0:07:51 | algorithm adaptive |
---|
0:07:54 | so now the problem becomes |
---|
0:07:56 | this |
---|
0:07:56 | which is a quadratic minimization problem |
---|
0:07:59 | and conjugate gradient can solve it |
---|
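Written out, the sliding-window problem at time t is the quadratic program below; this least-squares form is my reconstruction from the description above (a ridge penalty could be added, but none is stated here):

```latex
w_t \;=\; \arg\min_{w}\ \tfrac12\,\lVert X_t\, w - y_t \rVert_2^2
\qquad\Longleftrightarrow\qquad
\left(X_t^{\top} X_t\right) w_t \;=\; X_t^{\top} y_t ,
```

where X_t stacks the past H brain-scan images (an H-by-K matrix) and y_t holds their labels; the right-hand side is the normal-equation form that conjugate gradient solves.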
0:08:02 | so |
---|
0:08:03 | what is conjugate gradient? |
---|
0:08:05 | it is an algorithm to solve the quadratic minimization problem |
---|
0:08:09 | and it shares a similar structure with gradient descent |
---|
0:08:13 | it is an iterative method |
---|
0:08:15 | with the zero initialization, it searches directions |
---|
0:08:20 | which are conjugate to all the previous directions |
---|
0:08:24 | and it does a line search in each direction, and it terminates in H steps |
---|
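For reference, here is a minimal textbook conjugate gradient for the system A x = b with A symmetric positive (semi-)definite; this is the generic method, not the authors' exact implementation:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, max_iter=None, tol=1e-8):
    """Solve A x = b (A symmetric PSD), i.e. minimize (1/2) x'Ax - b'x."""
    x = np.zeros_like(b) if x0 is None else x0.copy()  # zero init by default
    r = b - A @ x                      # residual = negative gradient
    d = r.copy()                       # first search direction
    n_steps = b.shape[0] if max_iter is None else max_iter
    for _ in range(n_steps):
        rr = r @ r
        if np.sqrt(rr) < tol:          # converged
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)          # exact line search along d
        x += alpha * d
        r -= alpha * Ad
        beta = (r @ r) / rr            # makes d conjugate to all previous d's
        d = r + beta * d
    return x
```

In exact arithmetic CG terminates in at most rank(A) steps; for the windowed problem that rank is at most H, matching the "terminates in H steps" remark above.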
0:08:30 | so to further speed up the whole algorithm, we make two major modifications |
---|
0:08:37 | the first is a good starting point |
---|
0:08:40 | so the conventional conjugate gradient starts at the zero initialization |
---|
0:08:45 | but because we are using a sliding window of the data |
---|
0:08:48 | each time we are only adding one new data point and removing one old data point |
---|
0:08:55 | so it is reasonable to assume that |
---|
0:08:58 | the classifier at time t is very close |
---|
0:09:02 | to the previous classifier |
---|
0:09:04 | so we use the previous training result as the starting point of the search for the current classifier |
---|
0:09:12 | and this |
---|
0:09:13 | makes the algorithm faster, which I will show you later in the experiments |
---|
0:09:17 | and it also encodes the past memory |
---|
0:09:24 | which makes the algorithm |
---|
0:09:27 | hold information of the past |
---|
0:09:30 | and |
---|
0:09:31 | the other modification is |
---|
0:09:33 | instead of letting the algorithm terminate in H steps |
---|
0:09:36 | we let it terminate in i_max |
---|
0:09:40 | steps, so i_max mediates between the past memory |
---|
0:09:44 | and the current data |
---|
0:09:46 | so if i_max equals |
---|
0:09:48 | H, then no matter where your starting point is |
---|
0:09:52 | we don't have any memory of the past; we are only training on the |
---|
0:09:57 | current data we have |
---|
0:09:59 | if i_max is less than H, then |
---|
0:10:02 | it keeps |
---|
0:10:02 | partial memory |
---|
0:10:04 | of the previous training |
---|
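Putting the two modifications together, one per-TR update could look like the sketch below, reusing the conjugate_gradient function above; rtcg_update and its argument names are mine, not the paper's:

```python
def rtcg_update(w_prev, X_win, y_win, i_max):
    """One real-time conjugate gradient update (sketch).

    w_prev : classifier from time t-1, used as the warm start (modification 1)
    X_win  : the past H examples in the sliding window, shape (H, K)
    y_win  : their labels, shape (H,)
    i_max  : iteration cap (modification 2); per the talk, i_max == H
             discards the past memory, i_max < H keeps partial memory
    """
    A = X_win.T @ X_win            # normal equations of the windowed problem
    b = X_win.T @ y_win
    return conjugate_gradient(A, b, x0=w_prev, max_iter=i_max)
```

For K on the order of ten to the power of five, forming the K-by-K matrix A explicitly would be wasteful; the products A @ v inside CG can instead be computed as X_win.T @ (X_win @ v).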
0:10:09 | so |
---|
0:10:10 | in the paper we also show that, compared with the partial least squares |
---|
0:10:14 | if we have |
---|
0:10:16 | the same zero initialization, then our algorithm |
---|
0:10:21 | terminates in equal steps |
---|
0:10:23 | with the partial least squares algorithm |
---|
0:10:26 | which means |
---|
0:10:28 | with the same initialization, our algorithm can have a comparable performance as the partial least squares |
---|
0:10:36 | so now I will show you some test results we have done |
---|
0:10:40 | we tested |
---|
0:10:42 | three algorithms: the first is our proposed real-time conjugate gradient algorithm |
---|
0:10:47 | the second one is the partial least squares applied to the window-sized data, and the third one is |
---|
0:10:52 | the traditional ridge partial least squares applied to the window-sized data |
---|
0:10:57 | and we tested on three synthetic datasets and three real fMRI datasets |
---|
0:11:02 | for the first synthetic dataset, we generated two hundred examples, each of dimension two thousand |
---|
0:11:08 | and we chose two sets of features, each of dimension one hundred |
---|
0:11:13 | so when the label is one, we |
---|
0:11:15 | set one |
---|
0:11:16 | of the feature sets to |
---|
0:11:18 | have value one, and when the label is minus one, we choose the other feature set |
---|
0:11:23 | and the label is a repeating pattern of one and minus one |
---|
0:11:27 | and we add some noise |
---|
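A plausible reconstruction of this first synthetic generator (the noise distribution and level, and the disjointness of the two feature sets, are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 2000, 100                 # examples, dimension, feature-set size
perm = rng.permutation(k)
idx_pos, idx_neg = perm[:m], perm[m:2 * m]   # two disjoint feature sets

y = np.tile([1, -1], n // 2)             # repeating pattern of +1 / -1
X = rng.normal(scale=0.5, size=(n, k))   # additive noise (level assumed)
for i in range(n):
    X[i, idx_pos if y[i] == 1 else idx_neg] += 1.0  # active set takes value one
```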
0:11:30 | and the second synthetic dataset |
---|
0:11:34 | we generated |
---|
0:11:35 | in a similar way as the first one, but we randomized the labels |
---|
0:11:40 | and the third one we designed to test the |
---|
0:11:44 | adaptiveness of our |
---|
0:11:45 | algorithm, so |
---|
0:11:47 | for every one hundred and fifty examples, we use a new model to generate the data |
---|
0:11:55 | and for the fMRI data tests |
---|
0:11:58 | the first |
---|
0:11:59 | task is a visual perception task |
---|
0:12:02 | so we show the subject in the scanner a |
---|
0:12:06 | checkerboard |
---|
0:12:08 | which is either on the left side or on the right side |
---|
0:12:11 | so when the checkerboard is on the left |
---|
0:12:13 | then the right part of the visual cortex of the subject will be activated |
---|
0:12:17 | and vice versa |
---|
0:12:19 | and the label, which is the position of the checkerboard |
---|
0:12:24 | is a repeating pattern of |
---|
0:12:27 | left and right, and we have |
---|
0:12:31 | a data point which is |
---|
0:12:34 | of dimension about a hundred and twenty-two thousand, every three seconds |
---|
0:12:39 | and the second dataset is similar to the first one |
---|
0:12:42 | except that we randomized the labels |
---|
0:12:46 | and for the third one, we used a publicly available dataset |
---|
0:12:50 | which was published in a two thousand and one Science paper by Haxby et al. |
---|
0:12:54 | and it is a category-related object vision task |
---|
0:12:58 | so we took ten runs of one subject |
---|
0:13:00 | from the dataset |
---|
0:13:02 | so basically they show either a face image or a house image to the subject in the scanner |
---|
0:13:08 | and each data point is of dimension about a hundred and sixty-three thousand |
---|
0:13:15 | so these are the test results of the three algorithms |
---|
0:13:19 | so here we show the prediction accuracy and also the average training time for each algorithm |
---|
0:13:26 | and as you can see |
---|
0:13:28 | among these three algorithms, our algorithm is |
---|
0:13:33 | always the fastest one |
---|
0:13:35 | and in most cases |
---|
0:13:38 | our algorithm has a higher accuracy than the other two |
---|
0:13:42 | and the only case where it doesn't do as well as the other two is the synthetic dataset three |
---|
0:13:47 | which is |
---|
0:13:48 | the data where we change the model which generates the data, and later I will show you how we |
---|
0:13:55 | can improve this algorithm |
---|
0:13:56 | so that our algorithm can have a comparable |
---|
0:13:59 | performance as the other two |
---|
0:14:04 | so this is |
---|
0:14:06 | the prediction output for the synthetic dataset |
---|
0:14:11 | the black line here |
---|
0:14:13 | is the true label of the example, which is either one or minus one |
---|
0:14:19 | and the blue line is the prediction output |
---|
0:14:22 | as you can see, it fits nicely with the true label |
---|
0:14:25 | and there is a global learning curve you can see on the plot |
---|
0:14:30 | that is because we choose the previous |
---|
0:14:34 | training result as the starting point, which encodes the memory |
---|
0:14:37 | so the algorithm gets more and more confident |
---|
0:14:40 | as it sees more and more examples |
---|
0:14:48 | and on the right is the prediction plot for synthetic dataset |
---|
0:14:54 | three |
---|
0:14:56 | so the reason why our algorithm doesn't do as well as the other two in a |
---|
0:15:01 | model-changing context is because our algorithm has the past memory |
---|
0:15:06 | so when we keep the same model, the memory can help you to learn faster, but if you change |
---|
0:15:12 | the model |
---|
0:15:13 | the memory of the past actually hurts you |
---|
0:15:17 | so it is a tradeoff between the memory and the adaptiveness |
---|
0:15:23 | and this is the |
---|
0:15:25 | prediction output on the |
---|
0:15:27 | real fMRI data |
---|
0:15:32 | as I said |
---|
0:15:33 | I will show you that |
---|
0:15:36 | a good starting point for the algorithm really matters |
---|
0:15:39 | because if we look at the residual in the training phase |
---|
0:15:43 | we can see that |
---|
0:15:45 | at first we don't have any memory of the past, so the residual is very high |
---|
0:15:49 | but within ten to twenty epochs |
---|
0:15:53 | the memory helps us to reduce the residual |
---|
0:15:58 | from about five thousand to |
---|
0:16:02 | 0.1 percent of the initial |
---|
0:16:06 | residual, roughly speaking |
---|
0:16:10 | so we can also use this information to improve our performance when the model changes |
---|
0:16:16 | because |
---|
0:16:17 | as you can see, every |
---|
0:16:18 | time the model changes |
---|
0:16:20 | the residual becomes high again, so by detecting the sudden change of the residual, we can make |
---|
0:16:27 | the model forget all |
---|
0:16:30 | the past memories and start over again |
---|
0:16:33 | so we can |
---|
0:16:37 | improve the performance of our algorithm so that |
---|
0:16:41 | it can have a comparable performance as the other two |
---|
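A hypothetical version of that reset rule, keyed off the training residual reported by CG; the window length and jump threshold here are illustrative guesses, not values from the talk:

```python
import numpy as np

def maybe_reset(w_prev, residuals, factor=10.0, window=5):
    """If the newest training residual jumps well above the recent level,
    assume the generating model changed: forget the past memory by
    restarting the warm start from zero."""
    if len(residuals) > window:
        recent = np.median(residuals[-window - 1:-1])  # recent residual level
        if residuals[-1] > factor * recent:            # sudden change detected
            return np.zeros_like(w_prev)               # start over, no memory
    return w_prev                                      # keep the warm start
```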
0:16:47 | so for some future work |
---|
0:16:49 | right now we only generate the prediction results, but we haven't used |
---|
0:16:55 | them as real-time feedback to the experimenter or |
---|
0:16:59 | the subject |
---|
0:17:00 | so in the next step we are |
---|
0:17:03 | considering using that information |
---|
0:17:05 | to build something like a brain-computer interface |
---|
0:17:08 | and we also want to try more complicated experiments, and we also want to compare with other |
---|
0:17:15 | real-time algorithms out there |
---|
0:17:18 | so, thank you |
---|
0:17:19 | and I'd like to take some questions if there are any |
---|
0:17:30 | [brief exchange with the audience, largely inaudible] |
---|
0:17:42 | a way to speed up the conjugate gradient algorithm is often to use a preconditioner |
---|
0:17:50 | have you thought about that, and is it possible to include that in your |
---|
0:17:54 | algorithm? |
---|
0:17:57 | we don't use a preconditioner; the only thing we apply at the beginning is some kind |
---|
0:18:02 | of normalization of the matrix |
---|
0:18:05 | so we don't |
---|
0:18:06 | do any preconditioning |
---|
0:18:08 | but if |
---|
0:18:10 | you'd like |
---|
0:18:11 | something like that to make |
---|
0:18:13 | each iteration faster |
---|
0:18:15 | you could do that |
---|
0:18:17 | but it comes at a cost |
---|
0:18:19 | great |
---|
0:18:24 | okay |
---|
0:18:26 | thank you |
---|