0:00:15 | a |
---|
0:00:17 | Hello everyone, I'm [inaudible]. |
---|
0:00:19 | I'd like to present |
---|
0:00:21 | our paper, co-authored with Emmanuel Vincent and Rémi Gribonval from INRIA Rennes, |
---|
0:00:27 | France, |
---|
0:00:29 | entitled "An acoustically motivated spatial prior for under-determined reverberant source separation". |
---|
0:00:36 | In this session we have listened to the first two presentations, which mainly focused on |
---|
0:00:43 | the source spectral model, like NMF or GSMM, as we heard before. |
---|
0:00:48 | So I'd like to emphasize that our work here focuses only on |
---|
0:00:54 | the spatial model, |
---|
0:00:56 | that is, the model related to the source spatial position. |
---|
0:01:01 | So here is the outline of my presentation. First I'd like to briefly introduce the source separation |
---|
0:01:06 | problem, followed by the general local Gaussian modeling framework for source separation. |
---|
0:01:12 | Then we move to the main contribution of the work, that is, designing a new acoustically motivated spatial |
---|
0:01:19 | prior |
---|
0:01:20 | and deriving a maximum a posteriori estimation algorithm that enhances the source |
---|
0:01:26 | separation performance. |
---|
0:01:28 | And finally I'll show some experimental results and the conclusion. |
---|
0:01:33 | Okay. |
---|
0:01:35 | Here we are considering the source separation problem, where we use an I-channel mixture signal, |
---|
0:01:42 | denoted by x(t), |
---|
0:01:44 | to separate J sources |
---|
0:01:46 | s_j(t), |
---|
0:01:47 | and where the number of sensors I is smaller than the number of sources J, that is, the |
---|
0:01:52 | under-determined case. |
---|
0:01:54 | Okay, and in this picture, |
---|
0:01:57 | c_j(t), |
---|
0:01:57 | denoted here, |
---|
0:01:58 | is the contribution of only source s_j to the microphone array, so it is |
---|
0:02:05 | called the source image, |
---|
0:02:07 | which is related to the original source by a mixing process |
---|
0:02:11 | characterized by the mixing filters h_j(τ), |
---|
0:02:15 | which model the acoustic process from the source to the microphones. |
---|
0:02:21 | And typically, the mixture signal is the sum of several source |
---|
0:02:25 | images, |
---|
0:02:26 | so we have x(t) |
---|
0:02:27 | is the sum |
---|
0:02:28 | of the c_j(t). |
---|
0:02:30 | Okay, that's the mixing model. |
---|
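The mixing model just described can be written out as a short sketch. The symbols are assumptions, since the slide notation is not preserved in the transcript: $\mathbf{x}(t)$ is the $I$-channel mixture, $\mathbf{c}_j(t)$ the spatial image of source $j$, $s_j(t)$ the original source signal, $\mathbf{h}_j(\tau)$ the vector of mixing filters, and $J$ the number of sources.

```latex
\begin{align}
  \mathbf{x}(t)   &= \sum_{j=1}^{J} \mathbf{c}_j(t), \\
  \mathbf{c}_j(t) &= \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t-\tau).
\end{align}
```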
0:02:33 | So most state-of-the-art approaches for under-determined source separation operate in |
---|
0:02:39 | the frequency domain, |
---|
0:02:40 | where the convolution in the time domain is approximated by complex-valued multiplication |
---|
0:02:47 | in the frequency domain, which is a simpler form. |
---|
0:02:51 | And they also rely on the sparsity assumption, where only a few sources |
---|
0:02:56 | are assumed to be active at |
---|
0:02:58 | each time-frequency point. |
---|
0:03:00 | For instance, a very popular approach consists of two steps: |
---|
0:03:06 | the estimation of the mixing vectors h_j, |
---|
0:03:11 | and then |
---|
0:03:12 | the estimation |
---|
0:03:14 | of the sources s_j, |
---|
0:03:16 | where we have, for instance, binary masking, |
---|
0:03:20 | where only one source is assumed to be active at each time-frequency point. |
---|
0:03:25 | But these techniques remain limited in realistic reverberant |
---|
0:03:30 | environments, |
---|
0:03:31 | since the narrowband approximation does not hold there. |
---|
0:03:36 | So in our work |
---|
0:03:37 | we go to a different framework, |
---|
0:03:41 | where the vector of STFT coefficients of the source image c_j |
---|
0:03:46 | is modeled as a zero-mean Gaussian random variable. |
---|
0:03:49 | So c_j is modeled as Gaussian with zero mean |
---|
0:03:54 | and covariance matrix Σ_j. |
---|
0:03:57 | And we further factorize Σ_j |
---|
0:04:00 | into two parameters, v_j and R_j. |
---|
0:04:03 | v_j is the scalar source variance, which encodes the spectral power of the source, |
---|
0:04:10 | so that is the source spectral information, |
---|
0:04:14 | and R_j |
---|
0:04:16 | is the spatial covariance matrix, which |
---|
0:04:18 | encodes |
---|
0:04:19 | the spatial position of the source. |
---|
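As a compact sketch of the model described here: the indexing is an assumption (frame index $n$, frequency bin $f$, and a spatial covariance that depends on frequency only), since the transcript does not preserve the slide notation.

```latex
\begin{align}
  \mathbf{c}_j(n,f) &\sim \mathcal{N}_c\!\left(\mathbf{0},\, \boldsymbol{\Sigma}_j(n,f)\right), \\
  \boldsymbol{\Sigma}_j(n,f) &= v_j(n,f)\, \mathbf{R}_j(f).
\end{align}
```

Here $v_j(n,f) \ge 0$ is the scalar variance carrying the spectral information and $\mathbf{R}_j(f)$ is the $I \times I$ spatial covariance matrix carrying the spatial information.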
0:04:22 | Okay, and we are focusing more on the modeling of R_j. |
---|
0:04:30 | So state-of-the-art approaches relying on the narrowband approximation result |
---|
0:04:37 | in a rank-1 R_j, since R_j |
---|
0:04:39 | is the outer product of the mixing vector h_j with itself. |
---|
0:04:43 | But in our work |
---|
0:04:44 | we propose a full-rank matrix R_j, where the coefficients of R_j |
---|
0:04:51 | are not deterministically related. |
---|
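The contrast between the two parameterizations can be sketched as follows. The notation is assumed ($\mathbf{h}_j(f)$ is the frequency-domain mixing vector, $I$ the number of channels) and is not taken from the slides.

```latex
\begin{align}
  \text{narrowband approximation:}\quad
  \mathbf{c}_j(n,f) &\approx \mathbf{h}_j(f)\, s_j(n,f)
  \;\Rightarrow\;
  \mathbf{R}_j(f) = \mathbf{h}_j(f)\,\mathbf{h}_j^{H}(f)
  \quad (\text{rank } 1), \\
  \text{full-rank model:}\quad
  \mathbf{R}_j(f) &\in \mathbb{C}^{I \times I}
  \ \text{Hermitian positive definite, entries otherwise unconstrained.}
\end{align}
```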
0:04:54 | okay so is no such fall rises |
---|
0:04:58 | So given the local Gaussian modeling framework and the parameterization, the source separation architecture consists of four |
---|
0:05:05 | steps. First we use the STFT |
---|
0:05:07 | to transform the mixture signal into the time-frequency domain, |
---|
0:05:11 | and then we estimate the model parameters, here the source variances and the spatial covariance matrices. |
---|
0:05:17 | Then the source coefficients are recovered by Wiener filtering, which is a kind |
---|
0:05:23 | of soft masking, and then we reconstruct the time-domain signals. |
---|
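The third step, multichannel Wiener filtering, can be sketched for a single time-frequency point as below. This is a minimal illustration under the Gaussian model with covariances v_j R_j, not the authors' implementation; the function name `multichannel_wiener` and the toy values are hypothetical.

```python
import numpy as np

def multichannel_wiener(x, v, R):
    """Estimate the source images at one time-frequency point.

    x : (I,) complex mixture STFT vector
    v : (J,) nonnegative source variances v_j
    R : (J, I, I) Hermitian positive-definite spatial covariances R_j
    Returns c_hat with shape (J, I), one estimated image per source.
    """
    # Covariance of each source image: Sigma_j = v_j * R_j
    sigma = v[:, None, None] * R
    # The mixture covariance is the sum over sources
    sigma_x = sigma.sum(axis=0)
    # Wiener filter of each source: W_j = Sigma_j Sigma_x^{-1}
    W = sigma @ np.linalg.inv(sigma_x)
    # Soft masking: c_hat_j = W_j x
    return W @ x

# Toy example (assumed sizes): I = 2 channels, J = 3 sources
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
R = A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(2)
v = np.array([1.0, 0.5, 2.0])
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
c_hat = multichannel_wiener(x, v, R)
```

Because the filters W_j sum to the identity, the estimated images add back up to the mixture, which is why this acts as a soft mask rather than a hard assignment.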
0:05:28 | So from now on we focus on the estimation of the model parameters, |
---|
0:05:33 | the set of parameters |
---|
0:05:35 | defined here. |
---|
0:05:38 | Okay, and |
---|
0:05:40 | here is the main contribution of the paper: the |
---|
0:05:44 | acoustically motivated spatial prior. |
---|
0:05:46 | So we have seen that in some situations |
---|
0:05:50 | the geometric setting can be known. |
---|
0:05:53 | Such scenarios can be, for instance, in a car, |
---|
0:05:57 | where the position of the driver is fixed, |
---|
0:06:00 | or in a formal meeting, where the position of the |
---|
0:06:03 | delegates is fixed, |
---|
0:06:05 | or, for instance, in broadcasting, where we know exactly |
---|
0:06:09 | the positions of the sources and the room acoustics. |
---|
0:06:13 | So given this known geometric setting, we can exploit this knowledge about the source |
---|
0:06:19 | positions and the room characteristics |
---|
0:06:22 | to enhance the source separation performance. |
---|
0:06:25 | That's the motivation for the work. |
---|
0:06:28 | And here we use the knowledge from the theory of statistical |
---|
0:06:31 | room acoustics. |
---|
0:06:33 | So |
---|
0:06:34 | it assumes that the direct path and the reverberant part are |
---|
0:06:38 | uncorrelated, |
---|
0:06:39 | and that the reverberant part is diffuse. |
---|
0:06:42 | This means that the sound comes uniformly from all possible directions. |
---|
0:06:46 | So |
---|
0:06:48 | from that theory we know the mean of the spatial covariance matrix, |
---|
0:06:53 | which includes the contribution |
---|
0:06:56 | of the direct part, |
---|
0:06:57 | which is defined here, and the covariance of the reverberant part. |
---|
0:07:01 | And all these parameters |
---|
0:07:03 | can be computed directly |
---|
0:07:06 | given the geometric setting. |
---|
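A direct-plus-diffuse mean of this kind can be sketched as follows. The speaker's exact formulas are not in the transcript, so everything here is an assumption: a free-field direct path with attenuation 1/(sqrt(4π) r) and delay r/c per microphone, a diffuse-field coherence sin(kd)/(kd) between omnidirectional microphones, and a reverberant power `sigma_rev2` passed in as a given number. The function name `mean_spatial_covariance` is hypothetical.

```python
import numpy as np

def mean_spatial_covariance(freq_hz, src_to_mic_m, mic_dist_m, sigma_rev2, c=343.0):
    """Direct-path outer product plus a diffuse reverberant covariance.

    freq_hz      : frequency in Hz
    src_to_mic_m : (I,) source-to-microphone distances in meters
    mic_dist_m   : (I, I) pairwise microphone distances in meters
    sigma_rev2   : power of the reverberant part (assumed given)
    c            : speed of sound in m/s
    """
    r = np.asarray(src_to_mic_m, dtype=float)
    dist = np.asarray(mic_dist_m, dtype=float)
    # Direct path: 1/(sqrt(4*pi)*r) attenuation and r/c delay per microphone
    d = np.exp(-2j * np.pi * freq_hz * r / c) / (np.sqrt(4 * np.pi) * r)
    # Diffuse-field coherence sin(k*dist)/(k*dist) with k = 2*pi*f/c;
    # np.sinc(x) is sin(pi*x)/(pi*x), hence the argument 2*f*dist/c
    omega = np.sinc(2 * freq_hz * dist / c)
    return np.outer(d, d.conj()) + sigma_rev2 * omega

# Toy two-microphone setup (assumed values): 5 cm spacing, source about 0.5 m away
mean_R = mean_spatial_covariance(
    1000.0, [0.5, 0.52], [[0.0, 0.05], [0.05, 0.0]], sigma_rev2=0.1
)
```

The result is Hermitian by construction, and its diagonal is the direct power at each microphone plus the reverberant power.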
0:07:08 | For lack of time I will not present how they can be computed, but |
---|
0:07:13 | you can refer to the paper. |
---|
0:07:15 | So, okay, |
---|
0:07:17 | once again, given the room with the |
---|
0:07:22 | geometric setting, we can compute the mean of the spatial covariance matrices, |
---|
0:07:27 | and given this mean, we define an inverse-Wishart prior over the spatial |
---|
0:07:34 | covariance matrices. |
---|
0:07:35 | So |
---|
0:07:35 | R_j follows the inverse-Wishart distribution, |
---|
0:07:39 | with the mean |
---|
0:07:41 | given here, computed from the theory of statistical room acoustics, |
---|
0:07:46 | and its variance, which is governed by a parameter |
---|
0:07:51 | called the number of degrees of freedom, which |
---|
0:07:53 | can be learned from training data in the maximum likelihood sense. |
---|
0:07:57 | Okay, I will not present details about the learning process. |
---|
0:08:01 | The reason we choose the inverse-Wishart here is that it is a conjugate prior for |
---|
0:08:06 | the Gaussian likelihood, |
---|
0:08:08 | so it results in closed-form updates later on. |
---|
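For reference, a complex inverse-Wishart density of the kind described can be written as below. The symbols $\boldsymbol{\Psi}_j(f)$ (scale matrix) and $m$ (degrees of freedom) are assumed names, and this parameterization is one common convention rather than a quotation from the slides.

```latex
\begin{align}
  p\!\left(\mathbf{R}_j(f)\right)
  &= \frac{\left|\boldsymbol{\Psi}_j(f)\right|^{m}
           \left|\mathbf{R}_j(f)\right|^{-(m+I)}
           \exp\!\left(-\operatorname{tr}\!\left[\boldsymbol{\Psi}_j(f)\,\mathbf{R}_j^{-1}(f)\right]\right)}
          {\pi^{I(I-1)/2}\,\prod_{i=1}^{I}\Gamma(m-i+1)}, \\
  \mathbb{E}\!\left[\mathbf{R}_j(f)\right]
  &= \frac{\boldsymbol{\Psi}_j(f)}{m-I}, \qquad m > I.
\end{align}
```

Under this convention the scale matrix is the desired mean multiplied by $(m - I)$, and a larger $m$ concentrates the prior more tightly around that mean.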
0:08:14 | Okay, so |
---|
0:08:16 | now our goal is to estimate the parameters, |
---|
0:08:21 | and we use the expectation-maximization (EM) algorithm for this purpose, |
---|
0:08:28 | where |
---|
0:08:28 | in the E-step |
---|
0:08:29 | we estimate the empirical covariance matrices |
---|
0:08:33 | of the source images, here, |
---|
0:08:36 | by this equation, where W_j is simply the multichannel |
---|
0:08:41 | Wiener filter. |
---|
0:08:43 | And in the M-step, which is new for the MAP criterion, the |
---|
0:08:48 | updates are derived like this: |
---|
0:08:50 | the parameters v_j and R_j are iteratively updated |
---|
0:08:55 | in this M-step. |
---|
0:08:58 | And if you look at this equation, you can see that |
---|
0:09:03 | this part is the contribution of the likelihood |
---|
0:09:05 | and this part comes from the contribution of the prior |
---|
0:09:09 | that we have. |
---|
0:09:10 | And γ is the |
---|
0:09:13 | trade-off parameter, which determines the contribution of the prior. |
---|
0:09:17 | And if you want to update the parameters in the maximum likelihood sense, simply set |
---|
0:09:22 | γ to zero, |
---|
0:09:24 | so we come back |
---|
0:09:26 | to the maximum likelihood setting. |
---|
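The role of γ can be illustrated with a sketch of an M-step for one source at one frequency. The update below is obtained by maximizing the γ-weighted inverse-Wishart log-prior plus the Gaussian log-likelihood; it is a derivation under those assumptions, not a copy of the equations on the slide, and the names `m_step`, `psi`, and `m` are hypothetical.

```python
import numpy as np

def m_step(R_hat, v, psi, m, gamma):
    """One M-step for a single source at a single frequency bin.

    R_hat : (N, I, I) empirical image covariances from the E-step, one per frame
    v     : (N,) current source variances
    psi   : (I, I) scale matrix of the inverse-Wishart prior
    m     : degrees of freedom of the prior
    gamma : trade-off weight of the prior (0 gives the maximum likelihood update)
    """
    N, I, _ = R_hat.shape
    # Likelihood contribution: variance-normalized covariances summed over frames
    lik = (R_hat / v[:, None, None]).sum(axis=0)
    # The prior adds gamma * psi to the numerator and gamma * (m + I) to the denominator
    R = (gamma * psi + lik) / (gamma * (m + I) + N)
    # Variance update: v(n) = tr(R^{-1} R_hat(n)) / I
    R_inv = np.linalg.inv(R)
    v_new = np.einsum("ij,nji->n", R_inv, R_hat).real / I
    return R, v_new

# Toy data (assumed sizes): N = 50 frames, I = 2 channels
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2, 2)) + 1j * rng.standard_normal((50, 2, 2))
R_hat = A @ A.conj().transpose(0, 2, 1)
v = np.ones(50)
psi = np.eye(2, dtype=complex)
R_ml, _ = m_step(R_hat, v, psi, m=5.0, gamma=0.0)       # maximum likelihood
R_map, v_map = m_step(R_hat, v, psi, m=5.0, gamma=1e9)  # prior dominates
```

With γ = 0 the spatial covariance is just the average of the normalized empirical covariances; as γ grows, the estimate is pulled toward psi / (m + I), the mode of the prior under this convention.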
0:09:29 | Okay, and now we have everything in hand, |
---|
0:09:33 | so let's see some experimental results. |
---|
0:09:37 | So we compare the source separation performance of the algorithm proposed |
---|
0:09:43 | in this paper, using |
---|
0:09:45 | MAP parameter estimation, with |
---|
0:09:49 | the maximum likelihood algorithm, with two initializations. |
---|
0:09:54 | The first one is that we don't know anything about the geometry, |
---|
0:09:58 | so R_j is blindly initialized, |
---|
0:10:02 | and the second one is that R_j is initialized from the same geometric |
---|
0:10:07 | setting, |
---|
0:10:08 | so we have a fair comparison. |
---|
0:10:11 | We will show that if we know the geometric setting beforehand, |
---|
0:10:15 | we can improve the source separation. |
---|
0:10:18 | And we also compare the source separation with the baseline binary masking, |
---|
0:10:22 | and with the case where R_j is fixed |
---|
0:10:24 | to the mean |
---|
0:10:27 | value, |
---|
0:10:28 | that is, computed from the geometric setting |
---|
0:10:32 | as in the formula before. |
---|
0:10:35 | And here are some parameters of the data: |
---|
0:10:38 | speech signals, the sampling rate, the number of [inaudible], and |
---|
0:10:41 | yeah |
---|
0:10:43 | And here are the final results, the average results, |
---|
0:10:47 | in terms of signal-to-distortion ratio (SDR), which measures the overall distortion. |
---|
0:10:53 | And |
---|
0:10:55 | we compare the separation results averaged over mixtures of four sources, |
---|
0:11:00 | with the geometry here |
---|
0:11:02 | and a microphone spacing of five centimeters. |
---|
0:11:05 | And we |
---|
0:11:07 | compute separation results with different reverberation times, ranging from |
---|
0:11:12 | a very low one, almost anechoic, fifty milliseconds, to a very reverberant environment, five hundred milliseconds. |
---|
0:11:20 | And |
---|
0:11:20 | this |
---|
0:11:21 | line |
---|
0:11:22 | is the result given by our proposed algorithm, with the prior information. |
---|
0:11:28 | And |
---|
0:11:29 | we can see that |
---|
0:11:31 | the proposed algorithm outperforms the maximum likelihood algorithms and the baseline |
---|
0:11:37 | approach |
---|
0:11:38 | in all reverberant |
---|
0:11:40 | settings. |
---|
0:11:43 | Okay, for instance, |
---|
0:11:44 | let's listen to |
---|
0:11:46 | some samples. |
---|
0:11:58 | all |
---|
0:11:59 | okay |
---|
0:12:07 | okay or maybe this |
---|
0:12:08 | is that in |
---|
0:12:21 | okay alright right gig can uh |
---|
0:12:23 | V |
---|
0:12:23 | so uh |
---|
0:12:24 | You can see that, for example, at a reverberation time of two hundred and fifty milliseconds, |
---|
0:12:29 | which is a moderate reverberation time, |
---|
0:12:30 | our proposed algorithm, where we have some prior knowledge about the |
---|
0:12:35 | setting, enhances the source separation performance by one |
---|
0:12:39 | dB, yeah. |
---|
0:12:40 | Let's go back to the presentation. |
---|
0:12:44 | Okay, here is |
---|
0:12:45 | the conclusion. |
---|
0:12:46 | In our work we propose an acoustically motivated spatial prior, |
---|
0:12:52 | which is |
---|
0:12:53 | derived from the theory |
---|
0:12:55 | of statistical room acoustics, |
---|
0:12:57 | and we derive a maximum a posteriori EM algorithm, which solves |
---|
0:13:01 | both the estimation of the model parameters |
---|
0:13:06 | and the permutation problem. Okay. |
---|
0:13:09 | I'd like to emphasize this one, because given the known geometric setting, |
---|
0:13:13 | with the MAP algorithm we do not suffer from the well-known permutation problem |
---|
0:13:18 | in frequency-domain source separation. |
---|
0:13:21 | And importantly, we showed the improved performance |
---|
0:13:24 | with the help of the |
---|
0:13:26 | prior. |
---|
0:13:27 | But at this point we still need to know many parameters, like the |
---|
0:13:33 | source positions and the reverberation time, to compute the mean of |
---|
0:13:38 | the spatial covariance matrices. |
---|
0:13:40 | So future work can be devoted |
---|
0:13:42 | to fully blind source separation, by estimating all the acoustic parameters. |
---|
0:13:47 | yeah |
---|
0:13:50 | Okay, that's the end of my presentation. Thank you. |
---|
0:14:00 | We have time for |
---|
0:14:01 | some questions. |
---|
0:14:13 | Thank you for the presentation. My name is [inaudible]. |
---|
0:14:17 | How do you calculate the |
---|
0:14:19 | spatial prior? |
---|
0:14:21 | Right, |
---|
0:14:23 | in the one, yeah, |
---|
0:14:24 | the spatial prior, yeah, how do you |
---|
0:14:26 | calculate it? |
---|
0:14:28 | uh |
---|
0:14:29 | okay |
---|
0:14:31 | So here is the spatial prior, and |
---|
0:14:34 | the distribution is given, so |
---|
0:14:37 | what we need |
---|
0:14:38 | to know is the mean |
---|
0:14:40 | and the variance, |
---|
0:14:42 | as defined here. |
---|
0:14:44 | So for the mean, |
---|
0:14:47 | we can compute it directly from the geometric setting. |
---|
0:14:51 | For example, if we know the distance from the source to the microphones, |
---|
0:14:55 | we can compute the direct part from the source to the microphones. |
---|
0:15:00 | and uh so it uh we use to that the even at |
---|
0:15:03 | if few |
---|
0:15:04 | so uh |
---|
0:15:05 | that the yep that's |
---|
0:15:06 | school i see all politically state the main are a |
---|
0:15:09 | i i i i and that's um |
---|
0:15:11 | So my question is: |
---|
0:15:12 | if the the the uh |
---|
0:15:14 | right yeah yeah |
---|
0:15:15 | different from there |
---|
0:15:17 | really you like |
---|
0:15:18 | the the you |
---|
0:15:19 | uh |
---|
0:15:20 | different like so you could you oh well |
---|
0:15:23 | a distance is before and |
---|
0:15:25 | a how low a lot of things P |
---|
0:15:27 | to you are loaded |
---|
0:15:29 | Yeah, to be honest, I have not investigated that yet, |
---|
0:15:32 | but yeah, it's a very good point |
---|
0:15:36 | for future investigation. |
---|
0:15:38 | Yeah. |
---|
0:15:39 | And actually, this is a first step, where we tried to prove |
---|
0:15:44 | that, given a known geometric setting, we can improve the source separation performance. |
---|
0:15:51 | But |
---|
0:15:52 | in future work, |
---|
0:15:54 | as I said, we would like to estimate these parameters |
---|
0:15:59 | blindly from the mixture. |
---|
0:16:01 | So at this time we do not have |
---|
0:16:03 | such an evaluation. |
---|
0:16:06 | you |
---|
0:16:24 | Okay, yes. Firstly, binary masking is a very well-known algorithm, |
---|
0:16:31 | so we consider it as the baseline approach. |
---|
0:16:35 | And actually, in our previous work, with the same |
---|
0:16:39 | a |
---|
0:16:40 | with the same modeling framework, like this EM algorithm with maximum likelihood |
---|
0:16:45 | presented in our previous paper, we also compared the performance |
---|
0:16:50 | with some state-of-the-art algorithms, |
---|
0:16:52 | some be that the size |
---|
0:16:53 | and |
---|
0:16:55 | yeah |
---|
0:16:55 | using [inaudible], and the proposed approach outperformed these algorithms, |
---|
0:17:01 | which we had already compared. |
---|
0:17:03 | So |
---|
0:17:05 | I would not say it is the state of the art, but |
---|
0:17:07 | this |
---|
0:17:08 | serves |
---|
0:17:10 | as the baseline. |
---|
0:17:17 | Any other questions? |
---|
0:17:25 | Then let's thank the speaker. |
---|