0:00:20 | so the title of our paper is real-time conjugate gradient for online fMRI classification |
---|
0:00:27 | so first I will begin with a schematic |
---|
0:00:31 | of the real-time fMRI system, and then I will move on to show you some |
---|
0:00:36 | previous work on online learning algorithms and our proposed |
---|
0:00:40 | real-time conjugate gradient |
---|
0:00:42 | and in the last part I will show you some test results |
---|
0:00:46 | so conventionally, fMRI experiments are done in a batch processing manner |
---|
0:00:52 | so the experimenter will give a certain kind of task to the subject in the brain scanner |
---|
0:00:57 | for example, a brain state classification task |
---|
0:01:02 | by the end of the experiment |
---|
0:01:03 | we gather a time series of brain scan images |
---|
0:01:07 | and then we apply some offline learning algorithm to obtain the inference results |
---|
0:01:15 | in contrast |
---|
0:01:16 | in a real-time fMRI system |
---|
0:01:18 | we don't need to wait until the end of the experiment |
---|
0:01:23 | at each time point |
---|
0:01:24 | we have a 3D brain scan image |
---|
0:01:28 | and by using an online learning algorithm we can have the inference result of the current brain state before |
---|
0:01:36 | we have the next |
---|
0:01:38 | brain scan image |
---|
0:01:41 | the benefit of doing so is that |
---|
0:01:46 | the experimenter can use |
---|
0:01:48 | the real-time feedback to monitor the data quality |
---|
0:01:52 | or to do real-time mind reading |
---|
0:01:55 | or to modify the task while it is still going on |
---|
0:02:00 | and if we give the real-time feedback back to the subject in the scanner, then we can build |
---|
0:02:05 | a brain-computer interface |
---|
0:02:09 | however |
---|
0:02:10 | the benefits come with challenges |
---|
0:02:13 | so the main challenge |
---|
0:02:13 | for the real-time fMRI system is the computational complexity |
---|
0:02:19 | because we want to process |
---|
0:02:22 | fMRI data, which is usually of dimension ten to the power of five, within one TR, which |
---|
0:02:28 | is usually two to three seconds |
---|
0:02:30 | and we also want an accurate and also adaptive algorithm |
---|
0:02:35 | so that the experimenter can modify the task on the fly |
---|
0:02:43 | so this is our proposed method, which is called real-time conjugate gradient |
---|
0:02:48 | it is motivated by a widely used |
---|
0:02:51 | algorithm in the neuroimaging community, which is called partial least squares |
---|
0:02:56 | and our algorithm is online |
---|
0:02:58 | so we do |
---|
0:03:00 | both the training and the classification in real time |
---|
0:03:04 | and in a real fMRI test |
---|
0:03:06 | it shows that our algorithm is |
---|
0:03:09 | fast |
---|
0:03:10 | and it can reach an accuracy of about ninety percent within 0.5 seconds |
---|
0:03:16 | using |
---|
0:03:17 | an ordinary personal computer |
---|
0:03:19 | and I will also show you test results |
---|
0:03:22 | which show that the algorithm is adaptive |
---|
0:03:26 | so there are many online learning algorithms out there |
---|
0:03:30 | and some of them have been applied to fMRI applications |
---|
0:03:34 | but not all of them are truly |
---|
0:03:38 | online algorithms; some of them are |
---|
0:03:42 | trained offline and |
---|
0:03:43 | then do the prediction online |
---|
0:03:46 | but in our definition of an online learning algorithm, we mean that we need |
---|
0:03:51 | both the training and the classification in real time |
---|
0:03:55 | so here are some examples of true online learning algorithms, including |
---|
0:04:00 | the general linear model, independent component analysis, and support vector machines |
---|
0:04:06 | so as I've mentioned, our algorithm is based on partial least squares, so let me first give |
---|
0:04:13 | you a brief review of the partial least squares algorithm |
---|
0:04:16 | so here our input |
---|
0:04:19 | data is a matrix X, which is of dimension N by K, where N is the |
---|
0:04:24 | number of |
---|
0:04:26 | examples, which are the brain scan images, and K is the dimension of the image, which |
---|
0:04:32 | is usually |
---|
0:04:34 | on the order of |
---|
0:04:36 | ten to the power of five |
---|
0:04:38 | and the output Y |
---|
0:04:40 | is the brain state corresponding to the |
---|
0:04:43 | brain scan image |
---|
0:04:45 | so |
---|
0:04:46 | partial least squares assumes that |
---|
0:04:48 | both the input and the output are generated by the |
---|
0:04:53 | same set of |
---|
0:04:54 | latent factors F, so we can express X as F P-transpose and Y as F Q |
---|
0:05:00 | where |
---|
0:05:01 | P and Q are the loading factors for X and Y respectively |
---|
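In symbols, the latent-factor model just described can be written as follows (standard PLS notation; the symbol R for the number of latent factors is my label, not named in the talk):

```latex
X = F P^{\top}, \qquad Y = F Q,
\qquad X \in \mathbb{R}^{N \times K},\; F \in \mathbb{R}^{N \times R},
```

with P and Q the loading matrices for X and Y respectively.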
0:05:06 | and partial least squares is an iterative method |
---|
0:05:10 | in each iteration it finds a new latent factor, and then it does a univariate regression |
---|
0:05:16 | to find the loading factors P and Q |
---|
0:05:19 | and in the last step it does a rank-one deflation to subtract the |
---|
0:05:24 | contribution of the current latent factor |
---|
0:05:29 | and then it moves on to the next iteration |
---|
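To make the iteration concrete, here is a minimal sketch of the textbook PLS1 loop for a single response variable; this illustrates the general method described above, not the speaker's own code, and the variable names are mine:

```python
import numpy as np

def pls1_train(X, y, n_components):
    """Textbook PLS1 (assumes X and y are centered): extract one latent
    factor per iteration, find the loadings by univariate regressions,
    then do a rank-one deflation of X."""
    X = X.copy().astype(float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y                    # weight: direction of max covariance
        w /= np.linalg.norm(w)
        f = X @ w                      # latent factor (score vector)
        p = X.T @ f / (f @ f)          # X loading (univariate regression)
        c = (y @ f) / (f @ f)          # y loading
        X -= np.outer(f, p)            # rank-one deflation
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # fold the factors into one regression vector: y_hat = X @ beta
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta
```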
0:05:32 | because it is an iterative method, it is not |
---|
0:05:37 | so efficient in a real-time context |
---|
0:05:40 | so |
---|
0:05:41 | in two thousand nine |
---|
0:05:44 | an improvement of the traditional partial least squares was proposed; it is called ridge partial least squares |
---|
0:05:51 | so the main idea of ridge partial least squares is |
---|
0:05:54 | that they add a ridge parameter to the covariance matrix |
---|
0:06:00 | so that we can extract all the latent factors in only one step instead of doing multiple iterations |
---|
0:06:07 | however, this algorithm is still not efficient enough for |
---|
0:06:12 | our desired |
---|
0:06:15 | real-time system |
---|
0:06:17 | so what we want is |
---|
0:06:19 | an algorithm |
---|
0:06:20 | which has comparable performance to partial least squares |
---|
0:06:24 | but is more efficient |
---|
0:06:26 | so when we looked into partial least squares, we found in |
---|
0:06:29 | these two papers that partial least squares is |
---|
0:06:33 | actually a conjugate gradient algorithm |
---|
0:06:36 | so |
---|
0:06:37 | based on that, we propose a new real-time conjugate gradient algorithm to fit in our |
---|
0:06:44 | real-time system |
---|
0:06:46 | so let's formalize the problem here; for the real-time system |
---|
0:06:52 | at each time t we receive a new example |
---|
0:06:56 | which is the brain scan image |
---|
0:06:58 | and our classifier trained at time t minus one makes a prediction based on the new example |
---|
0:07:05 | and after |
---|
0:07:07 | the algorithm makes the prediction, we receive the true label from the subject |
---|
0:07:11 | and then we update our classifier with this information |
---|
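As a sketch, the predict-then-update protocol just described is the following loop; the scanner, subject, and classifier interfaces here are hypothetical placeholders, not the authors' API:

```python
def run_online_session(scanner, subject, classifier):
    """One pass of the real-time protocol: predict with the classifier
    trained at t-1, then update it once the true label arrives."""
    for t, x_t in enumerate(scanner):       # one brain-scan vector per TR
        y_hat = classifier.predict(x_t)     # prediction before the next scan
        y_true = subject.true_label(t)      # true label from the subject
        classifier.update(x_t, y_true)      # retrain on the new pair
        yield t, y_hat, y_true
```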
0:07:17 | so the problem becomes a quadratic minimization problem |
---|
0:07:23 | because |
---|
0:07:24 | to make the algorithm |
---|
0:07:27 | more efficient |
---|
0:07:28 | instead of using all the past examples |
---|
0:07:31 | we |
---|
0:07:32 | take a sliding window of the examples |
---|
0:07:35 | so at each time |
---|
0:07:37 | we only use the past H examples for the training |
---|
0:07:44 | so there are two benefits of doing this: the first one is, like I mentioned, |
---|
0:07:48 | the efficiency, and what's more important, it makes the |
---|
0:07:51 | algorithm adaptive |
---|
0:07:54 | so now the problem becomes |
---|
0:07:56 | this |
---|
0:07:56 | which is a quadratic minimization problem |
---|
0:07:59 | and conjugate gradient can solve it |
---|
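Written out, the sliding-window problem at time t is the quadratic program below; this least-squares form is my reconstruction from the description above (a ridge penalty could be added, but none is stated here):

```latex
w_t \;=\; \arg\min_{w}\ \tfrac12\,\lVert X_t\, w - y_t \rVert_2^2
\qquad\Longleftrightarrow\qquad
\left(X_t^{\top} X_t\right) w_t \;=\; X_t^{\top} y_t ,
```

where X_t stacks the past H brain-scan images (an H-by-K matrix) and y_t holds their labels; the right-hand side is the normal-equation form that conjugate gradient solves.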
0:08:02 | so |
---|
0:08:03 | what is conjugate gradient? |
---|
0:08:05 | it is an algorithm to solve the quadratic minimization problem |
---|
0:08:09 | and it shares a similar structure with gradient descent |
---|
0:08:13 | it is an iterative method |
---|
0:08:15 | with the zero initialization, it searches directions |
---|
0:08:20 | which are conjugate to all the previous directions |
---|
0:08:24 | and it does a line search in each direction, and it terminates in H steps |
---|
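For reference, here is a minimal textbook conjugate gradient for the system A x = b with A symmetric positive (semi-)definite; this is the generic method, not the authors' exact implementation:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, max_iter=None, tol=1e-8):
    """Solve A x = b (A symmetric PSD), i.e. minimize (1/2) x'Ax - b'x."""
    x = np.zeros_like(b) if x0 is None else x0.copy()  # zero init by default
    r = b - A @ x                      # residual = negative gradient
    d = r.copy()                       # first search direction
    n_steps = b.shape[0] if max_iter is None else max_iter
    for _ in range(n_steps):
        rr = r @ r
        if np.sqrt(rr) < tol:          # converged
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)          # exact line search along d
        x += alpha * d
        r -= alpha * Ad
        beta = (r @ r) / rr            # makes d conjugate to all previous d's
        d = r + beta * d
    return x
```

In exact arithmetic CG terminates in at most rank(A) steps; for the windowed problem that rank is at most H, matching the "terminates in H steps" remark above.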
0:08:30 | so to further speed up the whole algorithm, we make two major modifications |
---|
0:08:37 | the first is a good starting point |
---|
0:08:40 | so the conventional conjugate gradient starts at the zero initialization |
---|
0:08:45 | but because we are using a sliding window of the data |
---|
0:08:48 | each time we are only adding one new data point and removing one old data point |
---|
0:08:55 | so it is reasonable to assume that |
---|
0:08:58 | the classifier at time t is very close |
---|
0:09:02 | to the previous classifier |
---|
0:09:04 | so we use the previous training result as the starting point of the search for the current classifier |
---|
0:09:12 | and this |
---|
0:09:13 | makes the algorithm faster, which I will show you later in the experiments |
---|
0:09:17 | and it also encodes the past memory |
---|
0:09:24 | which makes the algorithm |
---|
0:09:27 | hold information of the past |
---|
0:09:30 | and |
---|
0:09:31 | the other modification is |
---|
0:09:33 | instead of letting the algorithm terminate in H steps |
---|
0:09:36 | we let it terminate in i_max |
---|
0:09:40 | steps, so i_max mediates between the past memory |
---|
0:09:44 | and the current data |
---|
0:09:46 | so if i_max equals |
---|
0:09:48 | H, then no matter where your starting point is |
---|
0:09:52 | we don't have any memory of the past; we are only training on the |
---|
0:09:57 | current data we have |
---|
0:09:59 | if i_max is less than H, then |
---|
0:10:02 | it keeps |
---|
0:10:02 | partial memory |
---|
0:10:04 | of the previous training |
---|
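Putting the two modifications together, one per-TR update could look like the sketch below, reusing the conjugate_gradient function above; rtcg_update and its argument names are mine, not the paper's:

```python
def rtcg_update(w_prev, X_win, y_win, i_max):
    """One real-time conjugate gradient update (sketch).

    w_prev : classifier from time t-1, used as the warm start (modification 1)
    X_win  : the past H examples in the sliding window, shape (H, K)
    y_win  : their labels, shape (H,)
    i_max  : iteration cap (modification 2); per the talk, i_max == H
             discards the past memory, i_max < H keeps partial memory
    """
    A = X_win.T @ X_win            # normal equations of the windowed problem
    b = X_win.T @ y_win
    return conjugate_gradient(A, b, x0=w_prev, max_iter=i_max)
```

For K on the order of ten to the power of five, forming the K-by-K matrix A explicitly would be wasteful; the products A @ v inside CG can instead be computed as X_win.T @ (X_win @ v).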
0:10:09 | so |
---|
0:10:10 | in the paper we also show that, compared with the partial least squares |
---|
0:10:14 | if we have |
---|
0:10:16 | the same zero initialization, then our algorithm |
---|
0:10:21 | terminates in equal steps |
---|
0:10:23 | with the partial least squares algorithm |
---|
0:10:26 | which means |
---|
0:10:28 | with the same initialization, our algorithm can have a comparable performance as the partial least squares |
---|
0:10:36 | so now I will show you some test results we have done |
---|
0:10:40 | we tested |
---|
0:10:42 | three algorithms: the first is our proposed real-time conjugate gradient algorithm |
---|
0:10:47 | the second one is the partial least squares applied to the window-sized data, and the third one is |
---|
0:10:52 | the traditional ridge partial least squares applied to the window-sized data |
---|
0:10:57 | and we tested on three synthetic datasets and three real fMRI datasets |
---|
0:11:02 | for the first synthetic dataset, we generated two hundred examples, each of dimension two thousand |
---|
0:11:08 | and we chose two sets of features, each of dimension one hundred |
---|
0:11:13 | so when the label is one, we |
---|
0:11:15 | set one |
---|
0:11:16 | of the feature sets to |
---|
0:11:18 | have value one, and when the label is minus one, we choose the other feature set |
---|
0:11:23 | and the label is a repeating pattern of one and minus one |
---|
0:11:27 | and we add some noise |
---|
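A plausible reconstruction of this first synthetic generator (the noise distribution and level, and the disjointness of the two feature sets, are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 2000, 100                 # examples, dimension, feature-set size
perm = rng.permutation(k)
idx_pos, idx_neg = perm[:m], perm[m:2 * m]   # two disjoint feature sets

y = np.tile([1, -1], n // 2)             # repeating pattern of +1 / -1
X = rng.normal(scale=0.5, size=(n, k))   # additive noise (level assumed)
for i in range(n):
    X[i, idx_pos if y[i] == 1 else idx_neg] += 1.0  # active set takes value one
```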
0:11:30 | and the second synthetic dataset |
---|
0:11:34 | we generated |
---|
0:11:35 | in a similar way as the first one, but we randomized the labels |
---|
0:11:40 | and the third one we designed to test the |
---|
0:11:44 | adaptiveness of our |
---|
0:11:45 | algorithm, so |
---|
0:11:47 | for every one hundred and fifty examples, we use a new model to generate the data |
---|
0:11:55 | and for the fMRI data tests |
---|
0:11:58 | the first |
---|
0:11:59 | task is a visual perception task |
---|
0:12:02 | so we show the subject in the scanner a |
---|
0:12:06 | checkerboard |
---|
0:12:08 | which is either on the left side or on the right side |
---|
0:12:11 | so when the checkerboard is on the left |
---|
0:12:13 | then the right part of the visual cortex of the subject will be activated |
---|
0:12:17 | and vice versa |
---|
0:12:19 | and the label, which is the position of the checkerboard |
---|
0:12:24 | is a repeating pattern of |
---|
0:12:27 | left and right, and we have |
---|
0:12:31 | a data point which is |
---|
0:12:34 | of dimension about a hundred and twenty-two thousand, every three seconds |
---|
0:12:39 | and the second dataset is similar to the first one |
---|
0:12:42 | except that we randomized the labels |
---|
0:12:46 | and for the third one, we used a publicly available dataset |
---|
0:12:50 | which was published in a two thousand and one Science paper by Haxby et al. |
---|
0:12:54 | and it is a category-related object vision task |
---|
0:12:58 | so we took ten runs of one subject |
---|
0:13:00 | from the dataset |
---|
0:13:02 | so basically they show either a face image or a house image to the subject in the scanner |
---|
0:13:08 | and each data point is of dimension about a hundred and sixty-three thousand |
---|
0:13:15 | so these are the test results of the three algorithms |
---|
0:13:19 | so here we show the prediction accuracy and also the average training time for each algorithm |
---|
0:13:26 | and as you can see |
---|
0:13:28 | among these three algorithms, our algorithm is |
---|
0:13:33 | always the fastest one |
---|
0:13:35 | and in most cases |
---|
0:13:38 | our algorithm has a higher accuracy than the other two |
---|
0:13:42 | and the only case where it doesn't do as well as the other two is the synthetic dataset three |
---|
0:13:47 | which is |
---|
0:13:48 | the data where we change the model which generates the data, and later I will show you how we |
---|
0:13:55 | can improve this algorithm |
---|
0:13:56 | so that our algorithm can have a comparable |
---|
0:13:59 | performance as the other two |
---|
0:14:04 | so this is |
---|
0:14:06 | the prediction output for the synthetic dataset |
---|
0:14:11 | the black line here |
---|
0:14:13 | is the true label of the example, which is either one or minus one |
---|
0:14:19 | and the blue line is the prediction output |
---|
0:14:22 | as you can see, it fits nicely with the true label |
---|
0:14:25 | and there is a global learning curve you can see on the plot |
---|
0:14:30 | that is because we choose the previous |
---|
0:14:34 | training result as the starting point, which encodes the memory |
---|
0:14:37 | so the algorithm gets more and more confident |
---|
0:14:40 | as it sees more and more examples |
---|
0:14:48 | and on the right is the prediction plot for synthetic dataset |
---|
0:14:54 | three |
---|
0:14:56 | so the reason why our algorithm doesn't do as well as the other two in a |
---|
0:15:01 | model-changing context is because our algorithm has the past memory |
---|
0:15:06 | so when we keep the same model, the memory can help you to learn faster, but if you change |
---|
0:15:12 | the model |
---|
0:15:13 | the memory of the past actually hurts you |
---|
0:15:17 | so it is a tradeoff between the memory and the adaptiveness |
---|
0:15:23 | and this is the |
---|
0:15:25 | prediction output on the |
---|
0:15:27 | real fMRI data |
---|
0:15:32 | as I said |
---|
0:15:33 | I will show you that |
---|
0:15:36 | a good starting point for the algorithm really matters |
---|
0:15:39 | because if we look at the residual in the training phase |
---|
0:15:43 | we can see that |
---|
0:15:45 | at first we don't have any memory of the past, so the residual is very high |
---|
0:15:49 | but within ten to twenty epochs |
---|
0:15:53 | the memory helps us to reduce the residual |
---|
0:15:58 | from about five thousand to |
---|
0:16:02 | 0.1 percent of the initial |
---|
0:16:06 | residual, roughly speaking |
---|
0:16:10 | so we can also use this information to improve our performance when the model changes |
---|
0:16:16 | because |
---|
0:16:17 | as you can see, every |
---|
0:16:18 | time the model changes |
---|
0:16:20 | the residual becomes high again, so by detecting the sudden change of the residual, we can make |
---|
0:16:27 | the model forget all |
---|
0:16:30 | the past memories and start over again |
---|
0:16:33 | so we can |
---|
0:16:37 | improve the performance of our algorithm so that |
---|
0:16:41 | it can have a comparable performance as the other two |
---|
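A hypothetical version of that reset rule, keyed off the training residual reported by CG; the window length and jump threshold here are illustrative guesses, not values from the talk:

```python
import numpy as np

def maybe_reset(w_prev, residuals, factor=10.0, window=5):
    """If the newest training residual jumps well above the recent level,
    assume the generating model changed: forget the past memory by
    restarting the warm start from zero."""
    if len(residuals) > window:
        recent = np.median(residuals[-window - 1:-1])  # recent residual level
        if residuals[-1] > factor * recent:            # sudden change detected
            return np.zeros_like(w_prev)               # start over, no memory
    return w_prev                                      # keep the warm start
```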
0:16:47 | so for some future work |
---|
0:16:49 | right now we only generate the prediction results, but we haven't used |
---|
0:16:55 | them as real-time feedback to the experimenter or |
---|
0:16:59 | the subject |
---|
0:17:00 | so in the next step we are |
---|
0:17:03 | considering using that information |
---|
0:17:05 | to build something like a brain-computer interface |
---|
0:17:08 | and we also want to try more complicated experiments, and we also want to compare with other |
---|
0:17:15 | real-time algorithms out there |
---|
0:17:18 | so, thank you |
---|
0:17:19 | and I'd like to take some questions if there are any |
---|
0:17:30 | [brief exchange with the audience, largely inaudible] |
---|
0:17:42 | a way to speed up the conjugate gradient algorithm is often to use a preconditioner |
---|
0:17:50 | have you thought about that, and is it possible to include that in your |
---|
0:17:54 | algorithm? |
---|
0:17:57 | we don't use a preconditioner; the only thing we apply at the beginning is some kind |
---|
0:18:02 | of normalization of the matrix |
---|
0:18:05 | so we don't |
---|
0:18:06 | do any preconditioning |
---|
0:18:08 | but if |
---|
0:18:10 | you'd like |
---|
0:18:11 | something like that to make |
---|
0:18:13 | each iteration faster |
---|
0:18:15 | you could do that |
---|
0:18:17 | but it comes at a cost |
---|
0:18:19 | great |
---|
0:18:24 | okay |
---|
0:18:26 | thank you |
---|