0:00:13 | Hi. I'm Wojciech Samek from the machine learning group at the Technical University of Berlin, and I will present some recent work on stationary common spatial patterns. |
0:00:24 | This is joint work with Carmen Vidaurre and Motoaki Kawanabe. |
0:00:31 | Here is an overview. I will start with an introduction and tell you something about the common spatial patterns method, and then about our stationary common spatial patterns method. |
0:00:43 | Then I will show some results and conclude with a summary. |
0:00:51 | Our target application is brain-computer interfacing. A brain-computer interface system aims to translate the intent of a subject, measured from brain activity, in this case by EEG, into a control command for a computer application. |
0:01:09 | So in this case you measure EEG and you want to control a game, here a pinball game, but you can also think of other applications like controlling a wheelchair or a neuroprosthesis. |
0:01:25 | A very popular paradigm for BCI is motor imagery. In motor imagery, the subject imagines movements of, for example, the right hand, the left hand, or the feet. |
0:01:40 | These different imagined movements lead to different patterns in the EEG, and if your system is able to extract and classify these different patterns, then you can convert them into a computer command and control an application. |
0:01:58 | There are still some challenges. For example, the EEG signal is usually high-dimensional, it has a low spatial resolution, there is a volume conduction effect, and it is noisy and non-stationary. By non-stationary I mean that the signal properties change over time. |
0:02:20 | So what people usually do in BCI is apply a spatial filtering method, for example CSP, in order to reduce the dimensionality. The goal is to combine electrodes, that is, to project the signal to a subspace, in order to increase the spatial resolution and hopefully the signal-to-noise ratio, and to simplify the learning problem. |
0:02:47 | But the problem with CSP is that it is prone to overfitting and can be negatively affected by artifacts. It also does not tackle the non-stationarity issue: if you compute features by applying CSP, the features may still change quite a bit over time. |
0:03:06 | And usually your classifier assumes a stable distribution. In machine learning one usually assumes that the training data and the test data come from the same distribution, and if your data distribution changes too much, the classifier will not work optimally. |
0:03:28 | Therefore we extend the CSP method to extract more stationary features. |
0:03:37 | Non-stationarities, that is, changes of the signal properties over time, may have very different sources and time scales. For example, you may have changes in the electrode impedance, when an electrode gets loose or the gel between the scalp and the electrode dries out. You may also have muscular activity and eye movements that lead to artifacts in the data. |
0:04:05 | You usually also have changes in task involvement, when subjects get tired, or differences between sessions: the calibration session has no feedback, whereas in the feedback session you provide feedback. |
0:04:21 | Basically, all these non-stationarities are bad for you because they negatively affect your classifier. There are two ways to deal with this. One way is to extract better features, to make your features more robust and more invariant to these changes; this is the approach we propose in our paper. The other way is to do adaptation, so you can adapt the classifier to follow the changes. |
0:04:55 | Okay, so the common spatial patterns method is very popular in brain-computer interfacing. It maximizes the variance of one class while minimizing the variance of the other class. |
0:05:09 | Say you have two conditions, the imagination of a movement of the right hand and of the left hand. You see that these two filters down here maximize the variance of the projected signal in the right-hand condition while minimizing it in the left-hand condition, and the other two filters do exactly the opposite: they maximize the variance in the left-hand condition but minimize it in the right-hand condition. |
0:05:40 | Why do we want to do this? In BCI your goal is to discriminate between mental states, and we know that the variance of a band-pass filtered signal is equal to the band power in its frequency band. So you can discriminate mental states by looking at the power in specific frequency bands. |
0:06:06 | That is to say, you can easily detect changes between the conditions, because you are effectively looking at the band power in one specific frequency band. |
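As a side note, the band-power feature described here can be sketched in a few lines of Python. The sampling rate, the 8-30 Hz band (covering the mu and beta rhythms typically used for motor imagery), and the function name are illustrative assumptions, not details given in the talk:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def log_bandpower(trial, fs=100.0, band=(8.0, 30.0)):
    """Log band-power of one EEG trial (channels x samples).

    The variance of a band-pass filtered signal equals its power
    in that band, so log-variance serves as the feature.
    """
    # Band-pass filter each channel; band edges are normalized
    # to the Nyquist frequency as scipy expects
    b, a = butter(5, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trial, axis=1)
    # Variance over time = band power; the log makes the feature
    # distribution more suitable for a linear classifier
    return np.log(filtered.var(axis=1))
```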
0:06:21 | CSP can be solved as a generalized eigenvalue problem, because you can formulate it as a Rayleigh coefficient: you want to maximize the projected variance of one condition while minimizing the variance of both conditions together. Equivalently, you can also write it so that you minimize the projected variance of the other condition, sigma minus, in the denominator. This we can solve very easily. |
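A minimal sketch of this generalized eigenvalue formulation, assuming the two class-average covariance matrices have already been estimated; the function name and the default of three filters per class are placeholders (though a fixed number of filters per class is mentioned later in the talk):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(Sigma_plus, Sigma_minus, n_per_class=3):
    """CSP spatial filters from the two class-average covariances.

    Solves the generalized eigenvalue problem
        Sigma_plus w = lambda (Sigma_plus + Sigma_minus) w,
    i.e. maximizes the Rayleigh coefficient
        w^T Sigma_plus w / w^T (Sigma_plus + Sigma_minus) w.
    """
    # eigh returns eigenvalues in ascending order
    eigvals, W = eigh(Sigma_plus, Sigma_plus + Sigma_minus)
    # Largest eigenvalues -> high variance for class "+", low for
    # class "-"; smallest -> the opposite. Keep both extremes.
    return np.hstack([W[:, -n_per_class:], W[:, :n_per_class]])
```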
0:06:53 | But our idea is that we do not only want a projection that has these properties; we also want the projection to provide stationary features, so we want to penalize non-stationary projection directions. We therefore introduce a penalty term P of w in the denominator of the Rayleigh coefficient. |
0:07:19 | So we add this P of w here, and then the final goal is to maximize the projected variance of one condition while minimizing the variance of the other condition and minimizing this penalty term. |
0:07:39 | The penalty term measures non-stationarities. We want to measure the deviation from the average case, where Sigma_c is the average covariance matrix of all trials from condition c, and Sigma_(k,c) is the covariance matrix of the k-th chunk; a chunk may consist of one trial or of several trials from the same class. |
0:08:09 | So you want to minimize the deviation of each chunk from the average case. We add one such term for each class, because you want the features to be stationary for each class separately. |
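In formulas, a plausible reconstruction of the quantities just described; the symbols Σ_c and Σ_(k,c) follow the talk's wording, while the trade-off parameter λ and the exact normalization are assumptions:

```latex
% Ideal penalty: absolute deviation of each projected chunk
% covariance from the projected class average
p(w) \;=\; \sum_{c \in \{+,-\}} \sum_{k=1}^{K_c}
  \bigl|\, w^\top \bigl(\Sigma_{(k,c)} - \Sigma_c\bigr)\, w \,\bigr|

% Penalized Rayleigh coefficient: maximize the variance of one
% condition while keeping the other condition's variance and the
% non-stationarity penalty small
w^\ast \;=\; \arg\max_w \;
  \frac{w^\top \Sigma_+ \, w}
       {w^\top \bigl(\Sigma_+ + \Sigma_-\bigr)\, w \;+\; \lambda\, P(w)}
```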
0:08:30 | The problem is that if you add this quantity to the denominator, then you do not get the required quadratic form anymore, because you cannot take the vector w outside the sum due to this absolute value function. So you cannot solve it as a generalized eigenvalue problem anymore. |
0:08:53 | So what do we do about this? We use a related quantity: we take the vector w outside the sum, but introduce an operator F that makes each difference matrix positive definite. We do this because we are only interested in the size of the variation and want to treat deviations in both directions in the same way: for example, here we do not care whether this term is bigger or that one, we are only interested in the difference after projection. |
0:09:28 | Here we do essentially the same, but before projecting, because we take the vector w outside the sum. We can also show that this quantity gives an upper bound on the original quantity we want to minimize, so it makes sense to use it. We put this term into the Rayleigh coefficient of our objective function. |
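The construction just described can be sketched as follows; this is a rough reconstruction under stated assumptions (the function names, the dictionary layout of the chunk covariances, and the placement of the penalty in the denominator follow my reading of the talk, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

def make_positive_definite(M):
    """Operator F from the talk: eigendecompose a symmetric matrix
    and flip the sign of all negative eigenvalues, keeping the
    eigenvectors unchanged."""
    eigvals, V = np.linalg.eigh(M)
    return V @ np.diag(np.abs(eigvals)) @ V.T

def sscp_filters(Sigma_plus, Sigma_minus, chunk_covs, lam, n_per_class=3):
    """Stationary CSP sketch: CSP with a non-stationarity penalty.

    chunk_covs maps class label (+1/-1) to a list of per-chunk
    covariance matrices; lam is the cross-validated trade-off.
    """
    Sigma = {+1: Sigma_plus, -1: Sigma_minus}
    # Penalty matrix: sum of positive-definite versions of the
    # chunk-to-average deviations. w^T Delta w upper-bounds the sum
    # of absolute projected deviations, restoring the quadratic form.
    Delta = sum(make_positive_definite(C - Sigma[c])
                for c in (+1, -1) for C in chunk_covs[c])
    # Penalized Rayleigh coefficient, solved again as a generalized
    # eigenvalue problem with the penalty added to the denominator
    _, W = eigh(Sigma_plus, Sigma_plus + Sigma_minus + lam * Delta)
    return np.hstack([W[:, -n_per_class:], W[:, :n_per_class]])
```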
0:09:58 | Now to our data set. We compared CSP and stationary CSP on a data set of 80 subjects performing motor imagery; they were new to BCI, so they did this for the first time. We selected for each user the best binary task combination and the best parameters on the calibration data, and we tested on a test session with feedback, with 300 trials. |
0:10:28 | We recorded EEG from 68 selected electrodes, used log-variance features and an LDA classifier, and measured performance by the error rate. We used a fixed number of filters per class, selected the trade-off parameter with cross-validation, and we also tried different chunk sizes and selected the best one by cross-validation on the calibration data. |
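For completeness, a hedged sketch of the feature-extraction and classification pipeline mentioned here (log-variance features plus an LDA classifier); the array shapes, variable names, and the commented usage lines are hypothetical:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def features(trials, W):
    """Log-variance features of spatially filtered trials.

    trials: array (n_trials, n_channels, n_samples) of band-pass
    filtered EEG; W: spatial filters (n_channels, n_filters).
    """
    # Project every trial onto the spatial filters, then take the
    # log of the variance over time of each filtered signal
    projected = np.einsum("cf,tcs->tfs", W, trials)
    return np.log(projected.var(axis=2))

# Hypothetical usage: X_train/X_test are trial arrays, y_* labels
# W = sscp_filters(...)  # filters learned on calibration data
# clf = LinearDiscriminantAnalysis().fit(features(X_train, W), y_train)
# error_rate = 1 - clf.score(features(X_test, W), y_test)
```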
0:11:00 | Here are some performance results. You see the scatter plots when using three CSP directions per class, or one CSP direction per class. On the x-axis is the error rate of CSP, and on the y-axis the error rate of our approach. You can see that especially the subjects that fail when using CSP, like these here, become really better with our method, and the same can be seen here. |
0:11:32 | We computed a test statistic, and the changes are significant: our method works better especially for the subjects that have an error rate larger than thirty percent. So we can improve in those cases that fail when using CSP. This is somehow clear, because if CSP works well, then your patterns are probably really good and the signal-to-noise ratio is good, so you do not have a lot of room to improve. |
0:12:04 | So the question is why stationary CSP performs better. Basically, we know that CSP may fail to extract the correct patterns when affected by artifacts, and, as you saw, stationary CSP is more robust to artifacts, because it treats artifacts as non-stationarities and reduces the non-stationarity of the features. CSP is also known to overfit, and stationary CSP overfits less and produces fewer changes in the features. |
0:12:43 | For example, here you see the results of a subject performing left- and right-hand motor imagery. You see that both methods are able to extract the correct left-hand pattern: there is activity over the right hemisphere, which is the pattern for left-hand motor imagery. |
0:13:04 | But in the case of the right hand, the CSP method fails, probably because at these electrodes there is an artifact, or something else that makes the signal noisy; the signal there is kind of non-stationary. Stationary CSP is also a bit affected by this artifact at this electrode, but it is able to extract the more or less correct pattern of the right hand. |
0:13:32 | You also see it here when you look at the distributions of training features and test features; the training features are the triangles and the test features are the circles. You see that the distribution in the training phase of CSP looks like this, but it changes a lot when you go to the test phase and look at the test features: the distribution is completely different in the test case. |
0:14:04 | But when we use stationary CSP, we extract more stable, more stationary features, so the distribution between training and test phase stays more or less the same, and the classifier works a lot better in this case. Here is the decision boundary, and you see that in the other case you really fail to classify correctly here. |
0:14:32 | Okay, so in summary: we extended the popular CSP method to extract stationary features. Stationary CSP significantly increases the classification accuracy, especially for subjects who perform badly with CSP. Unlike other methods like invariant CSP, we are completely data-driven: we do not require additional recordings or models of the expected changes. |
0:15:02 | We also showed, although it was not presented in this paper, that the combination of stationary features and unsupervised adaptation can further improve classification performance. |
0:15:15 | So I want to thank you for your attention, and we have time for questions. |
0:15:37 | Q: Can you explain in more detail that function in your penalty term? |
0:15:47 | A: You mean this function here? Yes, this one. |
0:15:51 | So this function F is, let's say, kind of a heuristic, because it makes this difference matrix positive definite: it flips the sign of all the negative eigenvalues. Why do we want to do so? Because we want the sum over k to be a sum of positive values, of positive deviations. For example, here you sum over k positive deviations, and you kind of want to do the same here. So you make this difference matrix positive definite, and then we can show that this is an upper bound on the other quantity. |
0:16:32 | Q: So in this operation you flip the sign of all the negative eigenvalues and then recompose the matrix, is that right? |
0:16:43 | A: Yes, we compute the difference matrix, then we do an eigendecomposition and flip the sign of all negative eigenvalues. |
0:16:52 | Q: Okay, so you keep the positive ones unchanged? |
0:16:55 | A: Yes. |
0:16:56 | Q: And the eigenvectors, the directions, are they also flipped? Or when you have an eigenvector with a negative eigenvalue, you simply flip the sign and do not change anything else, because you are only interested in positive contributions? |
0:17:14 | A: Yes, exactly. |
0:17:15 | Q: Okay, thanks. |
0:17:20 | Q: How do you delimit the chunks? Do you have some protocol, or do you use clustering to find similarities? |
0:17:29 | A: No, you can simply use a chunk size of one, which means that each trial is one chunk; you can do this trial-wise. Or you can put subsequent trials from the same class together in one chunk. So we do not apply any clustering, we only group some trials together; usually we do it for each trial separately. |
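A minimal sketch of this chunking, assuming the trials of one class arrive in recording order; the function name and the concatenate-then-covariance estimate are my assumptions:

```python
import numpy as np

def chunk_covariances(trials, chunk_size=1):
    """Per-chunk covariance matrices for one class.

    trials: array (n_trials, n_channels, n_samples) of subsequent
    trials from the same class; chunk_size groups adjacent trials.
    """
    covs = []
    for start in range(0, len(trials) - chunk_size + 1, chunk_size):
        # Concatenate the trials of one chunk along time and take
        # the spatial covariance of the concatenated signal
        chunk = np.concatenate(trials[start:start + chunk_size], axis=1)
        covs.append(np.cov(chunk))
    return covs
```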
0:18:06 | Q: My question is about your test data: was it recorded in different sessions? |
0:18:14 | A: No, this was only one test session. |
0:18:18 | Q: Okay. |
0:18:23 | Q: A question about the choice of the chunk sizes: if you use a chunk size which is larger than one, wouldn't you average out part of the non-stationarity? |
0:18:35 | A: Yes, and this was the idea of trying different chunk sizes. If you use a chunk size of one, then you detect changes on a small time scale; if you take larger chunk sizes, then your time scale will also be bigger, because you average out changes that occur, for example, in only one trial. So we tried different chunk sizes and selected the best one using cross-validation. |