0:00:13 | Hi. I'm Wojciech Samek from the machine learning group at the Technical University of Berlin, and I will present some recent work on stationary common spatial patterns. |
0:00:24 | This is joint work with Carmen Vidaurre and Motoaki Kawanabe. |
0:00:31 | Here is an overview. I will start with an introduction and tell you something about the common spatial patterns method, and then about our stationary common spatial patterns method. |
0:00:43 | Then I will show some results and conclude with a summary. |
0:00:51 | Our target application is brain-computer interfacing. A brain-computer interface system aims to translate the intent of a subject, measured from brain activity, in this case by EEG, into a control command for a computer application. |
0:01:09 | So in this case you measure EEG and you want to control a game, here a pinball game, but you can also think of other applications like controlling a wheelchair or a neuroprosthesis. |
0:01:25 | A very popular paradigm for BCI is motor imagery. In motor imagery, the subject imagines movements of, for example, the right hand, the left hand, or the feet. |
0:01:40 | These different imagined movements lead to different patterns in the EEG, and if your system is able to extract and classify these different patterns, then you can convert them into a computer command and control an application. |
0:01:58 | There are still some challenges. For example, the EEG signal is usually high-dimensional, it has a low spatial resolution, there is a volume conduction effect, and it is noisy and non-stationary. By non-stationary I mean that the signal properties change over time. |
0:02:20 | So what people usually do in BCI is apply a spatial filtering method, for example CSP, in order to reduce the dimensionality. The goal is to combine electrodes, that is, to project the signal to a subspace, in order to increase the spatial resolution and hopefully the signal-to-noise ratio, and to simplify the learning problem. |
0:02:47 | But the problem with CSP is that it is prone to overfitting and can be negatively affected by artifacts. It also does not tackle the non-stationarity issue: if you compute features by applying CSP, the features may still change quite a bit over time. |
0:03:06 | And usually your classifier assumes a stable distribution. In machine learning one usually assumes that the training data and the test data come from the same distribution, and if your data distribution changes too much, the classifier will not work optimally. |
0:03:28 | Therefore we extend the CSP method to extract more stationary features. |
0:03:37 | Non-stationarities, that is, changes of the signal properties over time, may have very different sources and time scales. For example, you may have changes in the electrode impedance, when an electrode gets loose or the gel between the scalp and the electrode dries out. You may also have muscular activity and eye movements that lead to artifacts in the data. |
0:04:05 | You usually also have changes in task involvement, when subjects get tired, or differences between sessions: the calibration session has no feedback, whereas in the feedback session you provide feedback. |
0:04:21 | Basically, all these non-stationarities are bad for you because they negatively affect your classifier. There are two ways to deal with this. One way is to extract better features, to make your features more robust and more invariant to these changes; this is the approach we propose in our paper. The other way is to do adaptation, so you can adapt the classifier to follow the changes. |
0:04:55 | Okay, so the common spatial patterns method is very popular in brain-computer interfacing. It maximizes the variance of one class while minimizing the variance of the other class. |
0:05:09 | Say you have two conditions, the imagination of a movement of the right hand and of the left hand. You see that these two filters down here maximize the variance of the projected signal in the right-hand condition while minimizing it in the left-hand condition, and the other two filters do exactly the opposite: they maximize the variance in the left-hand condition but minimize it in the right-hand condition. |
0:05:40 | Why do we want to do this? In BCI your goal is to discriminate between mental states, and we know that the variance of a band-pass filtered signal is equal to the band power in its frequency band. So you can discriminate mental states by looking at the power in specific frequency bands. |
0:06:06 | That is to say, you can easily detect changes between the conditions, because you are effectively looking at the band power in one specific frequency band. |
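As a side note, the band-power feature described here can be sketched in a few lines of Python. The sampling rate, the 8-30 Hz band (covering the mu and beta rhythms typically used for motor imagery), and the function name are illustrative assumptions, not details given in the talk:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def log_bandpower(trial, fs=100.0, band=(8.0, 30.0)):
    """Log band-power of one EEG trial (channels x samples).

    The variance of a band-pass filtered signal equals its power
    in that band, so log-variance serves as the feature.
    """
    # Band-pass filter each channel; band edges are normalized
    # to the Nyquist frequency as scipy expects
    b, a = butter(5, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trial, axis=1)
    # Variance over time = band power; the log makes the feature
    # distribution more suitable for a linear classifier
    return np.log(filtered.var(axis=1))
```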
0:06:21 | CSP can be solved as a generalized eigenvalue problem, because you can formulate it as a Rayleigh coefficient: you want to maximize the projected variance of one condition while minimizing the variance of both conditions together. Equivalently, you can also write it so that you minimize the projected variance of the other condition, sigma minus, in the denominator. This we can solve very easily. |
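A minimal sketch of this generalized eigenvalue formulation, assuming the two class-average covariance matrices have already been estimated; the function name and the default of three filters per class are placeholders (though a fixed number of filters per class is mentioned later in the talk):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(Sigma_plus, Sigma_minus, n_per_class=3):
    """CSP spatial filters from the two class-average covariances.

    Solves the generalized eigenvalue problem
        Sigma_plus w = lambda (Sigma_plus + Sigma_minus) w,
    i.e. maximizes the Rayleigh coefficient
        w^T Sigma_plus w / w^T (Sigma_plus + Sigma_minus) w.
    """
    # eigh returns eigenvalues in ascending order
    eigvals, W = eigh(Sigma_plus, Sigma_plus + Sigma_minus)
    # Largest eigenvalues -> high variance for class "+", low for
    # class "-"; smallest -> the opposite. Keep both extremes.
    return np.hstack([W[:, -n_per_class:], W[:, :n_per_class]])
```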
0:06:53 | But our idea is that we do not only want a projection that has these properties; we also want the projection to provide stationary features, so we want to penalize non-stationary projection directions. We therefore introduce a penalty term P of w in the denominator of the Rayleigh coefficient. |
0:07:19 | So we add this P of w here, and then the final goal is to maximize the projected variance of one condition while minimizing the variance of the other condition and minimizing this penalty term. |
0:07:39 | The penalty term measures non-stationarities. We want to measure the deviation from the average case, where Sigma_c is the average covariance matrix of all trials from condition c, and Sigma_(k,c) is the covariance matrix of the k-th chunk; a chunk may consist of one trial or of several trials from the same class. |
0:08:09 | So you want to minimize the deviation of each chunk from the average case. We add one such term for each class, because you want the features to be stationary for each class separately. |
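In formulas, a plausible reconstruction of the quantities just described; the symbols Σ_c and Σ_(k,c) follow the talk's wording, while the trade-off parameter λ and the exact normalization are assumptions:

```latex
% Ideal penalty: absolute deviation of each projected chunk
% covariance from the projected class average
p(w) \;=\; \sum_{c \in \{+,-\}} \sum_{k=1}^{K_c}
  \bigl|\, w^\top \bigl(\Sigma_{(k,c)} - \Sigma_c\bigr)\, w \,\bigr|

% Penalized Rayleigh coefficient: maximize the variance of one
% condition while keeping the other condition's variance and the
% non-stationarity penalty small
w^\ast \;=\; \arg\max_w \;
  \frac{w^\top \Sigma_+ \, w}
       {w^\top \bigl(\Sigma_+ + \Sigma_-\bigr)\, w \;+\; \lambda\, P(w)}
```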
0:08:30 | The problem is that if you add this quantity to the denominator, then you do not get the required quadratic form anymore, because you cannot take the vector w outside the sum due to this absolute value function. So you cannot solve it as a generalized eigenvalue problem anymore. |
0:08:53 | So what do we do about this? We use a related quantity: we take the vector w outside the sum, but introduce an operator F that makes each difference matrix positive definite. We do this because we are only interested in the size of the variation and want to treat deviations in both directions in the same way: for example, here we do not care whether this term is bigger or that one, we are only interested in the difference after projection. |
0:09:28 | Here we do essentially the same, but before projecting, because we take the vector w outside the sum. We can also show that this quantity gives an upper bound on the original quantity we want to minimize, so it makes sense to use it. We put this term into the Rayleigh coefficient of our objective function. |
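The construction just described can be sketched as follows; this is a rough reconstruction under stated assumptions (the function names, the dictionary layout of the chunk covariances, and the placement of the penalty in the denominator follow my reading of the talk, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

def make_positive_definite(M):
    """Operator F from the talk: eigendecompose a symmetric matrix
    and flip the sign of all negative eigenvalues, keeping the
    eigenvectors unchanged."""
    eigvals, V = np.linalg.eigh(M)
    return V @ np.diag(np.abs(eigvals)) @ V.T

def sscp_filters(Sigma_plus, Sigma_minus, chunk_covs, lam, n_per_class=3):
    """Stationary CSP sketch: CSP with a non-stationarity penalty.

    chunk_covs maps class label (+1/-1) to a list of per-chunk
    covariance matrices; lam is the cross-validated trade-off.
    """
    Sigma = {+1: Sigma_plus, -1: Sigma_minus}
    # Penalty matrix: sum of positive-definite versions of the
    # chunk-to-average deviations. w^T Delta w upper-bounds the sum
    # of absolute projected deviations, restoring the quadratic form.
    Delta = sum(make_positive_definite(C - Sigma[c])
                for c in (+1, -1) for C in chunk_covs[c])
    # Penalized Rayleigh coefficient, solved again as a generalized
    # eigenvalue problem with the penalty added to the denominator
    _, W = eigh(Sigma_plus, Sigma_plus + Sigma_minus + lam * Delta)
    return np.hstack([W[:, -n_per_class:], W[:, :n_per_class]])
```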
0:09:58 | Now to our data set. We compared CSP and stationary CSP on a data set of 80 subjects performing motor imagery; they were new to BCI, so they did this for the first time. We selected for each user the best binary task combination and the best parameters on the calibration data, and we tested on a test session with feedback, with 300 trials. |
0:10:28 | We recorded EEG from 68 selected electrodes, used log-variance features and an LDA classifier, and measured performance by the error rate. We used a fixed number of filters per class, selected the trade-off parameter with cross-validation, and we also tried different chunk sizes and selected the best one by cross-validation on the calibration data. |
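For completeness, a hedged sketch of the feature-extraction and classification pipeline mentioned here (log-variance features plus an LDA classifier); the array shapes, variable names, and the commented usage lines are hypothetical:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def features(trials, W):
    """Log-variance features of spatially filtered trials.

    trials: array (n_trials, n_channels, n_samples) of band-pass
    filtered EEG; W: spatial filters (n_channels, n_filters).
    """
    # Project every trial onto the spatial filters, then take the
    # log of the variance over time of each filtered signal
    projected = np.einsum("cf,tcs->tfs", W, trials)
    return np.log(projected.var(axis=2))

# Hypothetical usage: X_train/X_test are trial arrays, y_* labels
# W = sscp_filters(...)  # filters learned on calibration data
# clf = LinearDiscriminantAnalysis().fit(features(X_train, W), y_train)
# error_rate = 1 - clf.score(features(X_test, W), y_test)
```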
0:11:00 | Here are some performance results. You see the scatter plots when using three CSP directions per class, or one CSP direction per class. On the x-axis is the error rate of CSP, and on the y-axis the error rate of our approach. You can see that especially the subjects that fail when using CSP, like these here, become really better with our method, and the same can be seen here. |
0:11:32 | We computed a test statistic, and the changes are significant: our method works better especially for the subjects that have an error rate larger than thirty percent. So we can improve in those cases that fail when using CSP. This is somehow clear, because if CSP works well, then your patterns are probably really good and the signal-to-noise ratio is good, so you do not have a lot of room to improve. |
0:12:04 | So the question is why stationary CSP performs better. Basically, we know that CSP may fail to extract the correct patterns when affected by artifacts, and, as you saw, stationary CSP is more robust to artifacts, because it treats artifacts as non-stationarities and reduces the non-stationarity of the features. CSP is also known to overfit, and stationary CSP overfits less and produces fewer changes in the features. |
0:12:43 | For example, here you see the results of a subject performing left- and right-hand motor imagery. You see that both methods are able to extract the correct left-hand pattern: there is activity over the right hemisphere, which is the pattern for left-hand motor imagery. |
0:13:04 | But in the case of the right hand, the CSP method fails, probably because at these electrodes there is an artifact, or something else that makes the signal noisy; the signal there is kind of non-stationary. Stationary CSP is also a bit affected by this artifact at this electrode, but it is able to extract the more or less correct pattern of the right hand. |
0:13:32 | You also see it here when you look at the distributions of training features and test features; the training features are the triangles and the test features are the circles. You see that the distribution in the training phase of CSP looks like this, but it changes a lot when you go to the test phase and look at the test features: the distribution is completely different in the test case. |
0:14:04 | But when we use stationary CSP, we extract more stable, more stationary features, so the distribution between training and test phase stays more or less the same, and the classifier works a lot better in this case. Here is the decision boundary, and you see that in the other case you really fail to classify correctly here. |
0:14:32 | Okay, so in summary: we extended the popular CSP method to extract stationary features. Stationary CSP significantly increases the classification accuracy, especially for subjects who perform badly with CSP. Unlike other methods like invariant CSP, we are completely data-driven: we do not require additional recordings or models of the expected changes. |
0:15:02 | We also showed, although it was not presented in this paper, that the combination of stationary features and unsupervised adaptation can further improve classification performance. |
0:15:15 | So I want to thank you for your attention, and we have time for questions. |
0:15:37 | Q: Can you explain in more detail that function in your penalty term? |
0:15:47 | A: You mean this function here? Yes, this one. |
0:15:51 | So this function F is, let's say, kind of a heuristic, because it makes this difference matrix positive definite: it flips the sign of all the negative eigenvalues. Why do we want to do so? Because we want the sum over k to be a sum of positive values, of positive deviations. For example, here you sum over k positive deviations, and you kind of want to do the same here. So you make this difference matrix positive definite, and then we can show that this is an upper bound on the other quantity. |
0:16:32 | Q: So in this operation you flip the sign of all the negative eigenvalues and then recompose the matrix, is that right? |
0:16:43 | A: Yes, we compute the difference matrix, then we do an eigendecomposition and flip the sign of all negative eigenvalues. |
0:16:52 | Q: Okay, so you keep the positive ones unchanged? |
0:16:55 | A: Yes. |
0:16:56 | Q: And the eigenvectors, the directions, are they also flipped? Or when you have an eigenvector with a negative eigenvalue, you simply flip the sign and do not change anything else, because you are only interested in positive contributions? |
0:17:14 | A: Yes, exactly. |
0:17:15 | Q: Okay, thanks. |
0:17:20 | Q: How do you delimit the chunks? Do you have some protocol, or do you use clustering to find similarities? |
0:17:29 | A: No, you can simply use a chunk size of one, which means that each trial is one chunk; you can do this trial-wise. Or you can put subsequent trials from the same class together in one chunk. So we do not apply any clustering, we only group some trials together; usually we do it for each trial separately. |
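A minimal sketch of this chunking, assuming the trials of one class arrive in recording order; the function name and the concatenate-then-covariance estimate are my assumptions:

```python
import numpy as np

def chunk_covariances(trials, chunk_size=1):
    """Per-chunk covariance matrices for one class.

    trials: array (n_trials, n_channels, n_samples) of subsequent
    trials from the same class; chunk_size groups adjacent trials.
    """
    covs = []
    for start in range(0, len(trials) - chunk_size + 1, chunk_size):
        # Concatenate the trials of one chunk along time and take
        # the spatial covariance of the concatenated signal
        chunk = np.concatenate(trials[start:start + chunk_size], axis=1)
        covs.append(np.cov(chunk))
    return covs
```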
0:18:06 | Q: My question is about your test data: was it recorded in different sessions? |
0:18:14 | A: No, this was only one test session. |
0:18:18 | Q: Okay. |
0:18:23 | Q: A question about the choice of the chunk sizes: if you use a chunk size which is larger than one, wouldn't you average out part of the non-stationarity? |
0:18:35 | A: Yes, and this was the idea of trying different chunk sizes. If you use a chunk size of one, then you detect changes on a small time scale; if you take larger chunk sizes, then your time scale will also be bigger, because you average out changes that occur, for example, in only one trial. So we tried different chunk sizes and selected the best one using cross-validation. |