0:00:13 | Thank you for the introduction. |
---|
0:00:16 | The outline is as follows: I will give a short introduction with a problem statement, |
---|
0:00:25 | then introduce the speech distortion weighted multichannel Wiener filter, |
---|
0:00:32 | and then also very briefly introduce the conditional speech presence probability, |
---|
0:00:38 | which is the basis for the solution that we are going to propose, before coming to the evaluation. |
---|
0:00:43 | First, to give a short background on the hearing loss problem: |
---|
0:00:48 | some common causes of hearing loss are age-related, or come from exposure to noise, |
---|
0:00:54 | or from listening to loud music for a long time. |
---|
0:00:58 | These are factors that can affect all of us. |
---|
0:01:02 | The consequences of a hearing loss are |
---|
0:01:08 | a reduced frequency resolution and temporal resolution, so |
---|
0:01:12 | you have difficulty distinguishing between different sounds at different frequencies, |
---|
0:01:18 | and you have problems with low-level sounds. |
---|
0:01:21 | This is of course a problem when a hearing aid user is |
---|
0:01:28 | in a noisy environment, possibly with multiple speakers or any other kind of noise, |
---|
0:01:34 | and reverberation can also be a problem. |
---|
0:01:37 | For this reason, many multi-microphone structures have been proposed in the past, |
---|
0:01:42 | such as directional microphones and various beamformers. |
---|
0:01:47 | In this work we focus on the multichannel Wiener filter. |
---|
0:01:51 | Basically, the idea of our approach is to find a set of filter coefficients |
---|
0:01:59 | that reduce the noise while minimizing the speech distortion, |
---|
0:02:03 | and the overall goal, of course, is to improve the intelligibility. |
---|
0:02:09 | We start by defining the microphone signals: each consists of a speech signal |
---|
0:02:15 | and an additive noise contribution, where k is the frequency index and l is the frame index. |
---|
0:02:21 | In this case we model a two-microphone setup. |
---|
0:02:28 | The MSE criterion is then formulated like this: we want to find the set of filter coefficients that minimizes the difference between the desired speech component and the filtered version of the noisy signals. |
---|
0:02:42 | Basically, we choose to estimate the speech component in the first microphone, which would be the front microphone of the hearing aid. |
---|
0:02:50 | An extension of this: if we assume that the speech and noise are statistically independent, we can reformulate the MSE criterion in this way, where the first term corresponds to a speech distortion term and the second term corresponds to the residual noise. |
---|
0:03:08 | The solution then looks like this: basically you have the estimated speech correlation matrix, and the noise-only correlation matrix weighted by a certain factor (mu), which corresponds to a noise reduction trade-off factor. |
---|
0:03:23 | At this point we can see that the filter is based on these correlation matrices, so let me show some details of the problems involved in estimating these contributions. |
---|
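The filter described above can be sketched in a few lines. This is a minimal NumPy illustration of the SDW-MWF solution w = (R_x + mu*R_n)^{-1} R_x e_1 with a hypothetical rank-1 speech model and white noise; the steering vector, power values, and function names are illustrative, not the talk's actual simulation code.

```python
import numpy as np

def sdw_mwf(R_x, R_n, mu=1.0, ref=0):
    """SDW-MWF for one frequency bin: w = (R_x + mu * R_n)^{-1} R_x e_ref.

    R_x is the speech correlation matrix, R_n the noise-only correlation
    matrix, and mu the trade-off factor: mu > 1 favours noise reduction,
    mu < 1 favours low speech distortion."""
    e_ref = np.zeros(R_x.shape[0])
    e_ref[ref] = 1.0                       # estimate speech at the front mic
    return np.linalg.solve(R_x + mu * R_n, R_x @ e_ref)

# Hypothetical two-microphone example: rank-1 speech, spatially white noise.
d = np.array([1.0, 0.8])                   # illustrative steering vector
R_x = 4.0 * np.outer(d, d)                 # speech correlation matrix
R_n = np.eye(2)                            # noise-only correlation matrix
w = sdw_mwf(R_x, R_n, mu=1.0)
```

Increasing mu shrinks the filter output (more noise suppression at the price of more speech distortion), which is exactly the trade-off the rest of the talk is about.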
0:03:37 | In general, the basic approach is to estimate the noise-only correlation matrix and the speech-plus-noise correlation matrix. |
---|
0:03:52 | To get a clean speech correlation matrix, you can for instance use a voice activity detector: estimate the speech-plus-noise correlation matrix during speech-plus-noise periods and the noise-only correlation matrix during noise-only periods, and then take the difference, as in the structure shown here. |
---|
0:04:10 | So basically, each contribution is kept fixed during different periods: |
---|
0:04:17 | if a speech-plus-noise period is detected, the update of the noise-only correlation matrix is kept fixed, and the speech-plus-noise correlation matrix is updated. |
---|
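The VAD-gated update scheme just described can be sketched as recursive averaging; the variable names and the smoothing constant here are illustrative, not values from the talk.

```python
import numpy as np

def update_correlations(R_sn, R_n, y, speech_active, alpha=0.99):
    """Update the speech-plus-noise (R_sn) and noise-only (R_n) correlation
    matrices from one multichannel frame y. Only the matrix matching the
    VAD decision is updated; the other is kept fixed, as described above.
    alpha close to 1 gives the long averaging time mentioned in the talk."""
    yy = np.outer(y, np.conj(y))
    if speech_active:                      # speech-plus-noise period
        R_sn = alpha * R_sn + (1.0 - alpha) * yy
    else:                                  # noise-only period
        R_n = alpha * R_n + (1.0 - alpha) * yy
    return R_sn, R_n

# The speech correlation matrix is then approximated by subtraction:
#   R_x = R_sn - R_n
```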
0:04:28 | Of course, this also limits the tracking of the noise correlation matrix: imagine that the noise prior to the speech period differs from the noise during the speech-plus-noise period; if we stop adapting the noise correlation matrix, we basically have a reduced tracking ability. |
---|
0:04:52 | Furthermore, the estimation of the correlation matrices is typically done with heavy averaging, usually in the order of two to three seconds, which also limits the spectral tracking capability. |
---|
0:05:08 | Looking at the motivation for our work: since the SDW-MWF depends on long-term averages, the noise reduction largely eliminates the short-time effects and musical noise, artifacts that are present in single-channel noise reduction. |
---|
0:05:29 | Another issue that we address here is the weighting factor: in general, a fixed weighting factor is used for all frequencies and all frames. |
---|
0:05:40 | The basis of our work is to find an optimal weighting factor, because in general the speech and noise will be non-stationary, and when someone is speaking there will be a lot of silence periods that we can exploit in the noise reduction process, |
---|
0:06:01 | while the noise in general may be continuously present. |
---|
0:06:06 | So what we propose is to apply a different weight to the speech-dominant segments and to the noise-dominant segments. |
---|
0:06:17 | To do that, we took inspiration from single-channel noise reduction approaches, where a lot of work has been done on spectral tracking; basically, we took inspiration from the speech presence probability. |
---|
0:06:33 | This works by defining a two-state model: in one state you have noise only, and in the other state you have speech plus noise, whereas the standard approach basically assumes that speech is present all the time. |
---|
0:06:48 | By exploiting the two-state model, we can improve the noise reduction. |
---|
0:06:55 | So, very shortly, the speech presence probability is estimated for each frequency and each frame. It is based on an estimate of the a priori probability of speech being absent, and on the contributions of different signal-to-noise ratio measures. |
---|
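As a concrete stand-in for the estimator mentioned here, a common Gaussian-model form of the speech presence probability from the single-channel literature combines the a priori speech absence probability q with a priori and a posteriori SNR estimates; the exact estimator used in the talk may differ, so treat this as a hedged sketch.

```python
import numpy as np

def speech_presence_probability(xi, gamma, q=0.5):
    """Conditional speech presence probability for one time-frequency bin
    under a Gaussian two-state model.

    xi:    a priori SNR estimate
    gamma: a posteriori SNR estimate
    q:     a priori probability that speech is absent (assumed value)"""
    v = gamma * xi / (1.0 + xi)
    # p(H1 | observation) via the likelihood ratio of the two states
    return 1.0 / (1.0 + (q / (1.0 - q)) * (1.0 + xi) * np.exp(-v))
```

High-SNR bins get a probability near one, while low-SNR bins fall back toward the prior, reproducing the kind of pattern shown in the example figure.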
0:07:13 | An example is shown here: in the low-frequency area you can see a high probability of speech, and from a certain point on you have a low probability. |
---|
0:07:24 | So the question was: how can we exploit this in the multichannel Wiener filter? |
---|
0:07:31 | We started by modifying the objective function: we have a first term, which is the H1 (speech present) state weighted by the probability p, and a second term, which is the H0 (speech absent) state weighted by 1 - p. |
---|
0:07:44 | So basically we take into account that there are also segments with noise only, where we can be more aggressive in terms of noise reduction. |
---|
0:07:56 | When we derive the solution, we end up with a term 1/p, which basically means the weighting now changes for each frequency and each frame, in contrast with a fixed weighting factor mu. |
---|
0:08:11 | So if there is a high probability of speech, you go back to preserving the speech, and if there is a low probability, you go to more aggressive noise reduction. |
---|
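Read this way, the derivation replaces the fixed mu by a per-bin factor. One plausible sketch, under the assumption that the derived term is mu/p (the exact form in the underlying paper may differ), with a floor added purely as a numerical safeguard:

```python
import numpy as np

def spp_weighting(p, mu=1.0, p_floor=1e-3):
    """Per-bin trade-off factor derived from the two-state cost function:
    mu / p. When the speech presence probability p is close to one, this
    falls back to the fixed factor mu (speech preservation); when p is
    small, the factor grows, i.e. much more aggressive noise reduction.
    p_floor is a safeguard against division by zero, not part of the
    derivation."""
    return mu / np.maximum(np.asarray(p, dtype=float), p_floor)
```

This factor would simply replace the fixed mu in the SDW-MWF solution, per frequency bin and per frame.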
0:08:20 | The problem here, however, is that, as you saw before, the speech presence probability varies a lot across frequencies. When we applied it in this setup, we got a lot of distortion, a lot of artifacts, basically the same effects that are known from single-channel noise reduction. |
---|
0:08:40 | Another issue is that this filter does not really distinguish between the H0 and the H1 state. |
---|
0:08:48 | So we went a little further and asked: what if we could actually detect the H0 or H1 state? We propose a simple method to do this. |
---|
0:08:59 | We already have the speech presence information per frequency, so for each frame we take the average, and if the average is higher than a certain threshold, |
---|
0:09:14 | the frame is selected as the H1 state, and otherwise as the H0 state. |
---|
0:09:18 | Here is an example; it is shown on the clean speech signal, but of course everything was estimated on the noisy signal. |
---|
0:09:24 | You can see that at certain values the frames are detected as the H1 state, and all others as the H0 state. |
---|
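The frame-level decision just described is simple enough to state directly; the threshold value below is an illustrative placeholder, not the value used in the talk.

```python
import numpy as np

def detect_frame_state(p_frame, threshold=0.5):
    """Binary H1/H0 detection for one frame: average the per-frequency
    speech presence probabilities and compare with a fixed threshold.
    Returns True for H1 (speech-dominant), False for H0 (noise-dominant)."""
    return bool(np.mean(p_frame) > threshold)
```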
0:09:33 | The rationale behind having this information is that in the H0 state the noise reduction can be weighted differently: because no speech is present, it can be more aggressive without increasing the speech distortion. |
---|
0:09:51 | In the H1 state, of course, we also want to reduce noise, but we want to do it a bit more carefully. |
---|
0:09:57 | So the idea is that we want to apply a certain flexible weighting. |
---|
0:10:02 | To do that, we proceed in a similar way. What you can see here is that if we have detected the H0 state, we apply a much higher weighting factor, |
---|
0:10:13 | and if it is the H1 state, up to a certain point we still apply a lower but fixed weighting factor, and when the probability gets higher we weight it accordingly. |
---|
0:10:23 | In that way you can preserve certain speech cues. |
---|
0:10:29 | To build that into the standard SDW-MWF, we basically use a combination of soft values and a binary detection. |
---|
0:10:39 | The first term is a function of the H1 state, which depends on a certain fixed threshold and on the speech presence probability, |
---|
0:10:49 | and the second term basically uses a fixed weighting factor. |
---|
0:10:54 | When we derive the solution, we again end up with a per-bin weighting factor, so we exploit both the soft values and the hard decision. |
---|
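Putting the pieces together, here is one way the combined soft/binary weighting could look. All numeric values and the capping rule are illustrative assumptions; the talk does not give the actual parameters.

```python
import numpy as np

def flexible_weighting(p_frame, mu_h1=5.0, mu_h0=20.0, threshold=0.5):
    """Combined soft/binary trade-off factors for one frame.

    The binary frame decision (mean probability vs. threshold) selects the
    regime: in an H0 (noise-dominant) frame a high fixed factor is applied
    everywhere, while in an H1 (speech-dominant) frame a lower base factor
    is scaled per bin by the soft speech presence probability, capped so it
    never exceeds the aggressive H0 value."""
    p_frame = np.asarray(p_frame, dtype=float)
    if np.mean(p_frame) <= threshold:          # H0: be aggressive everywhere
        return np.full_like(p_frame, mu_h0)
    # H1: per-bin soft weighting, more cautious where speech is likely
    return np.minimum(mu_h1 / np.maximum(p_frame, 1e-3), mu_h0)
```

Bins with high speech presence inside a speech frame keep a factor near mu_h1 (preserving speech cues), while noise-dominant frames get the aggressive fixed factor.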
0:11:06 | We then ran simulations using a two-microphone hearing aid in a setup with a relatively low reverberation time |
---|
0:11:18 | and two or more babble noise sources, |
---|
0:11:22 | and we used two objective quality measures: the signal-to-noise ratio improvement and the signal distortion. |
---|
0:11:32 | If we look at the results, you can see that the standard method gives a much higher signal-to-noise ratio improvement, but when we decrease the weighting factor, the distortion also increases at the same time. |
---|
0:11:46 | When we initially used the 1/p weighting, the problem was the high distortion: you would still get quite good SNR performance, but the distortion simply went very high. |
---|
0:12:00 | But with the flexible threshold, where we use the different weighting factors, we can see that the signal-to-noise ratio improvement stays relatively high while the distortion also stays low. |
---|
0:12:15 | Of course, the question is how to choose this weighting factor, and that is still something we are working on. |
---|
0:12:23 | So, to summarize: we presented different extensions of the SDW-MWF algorithm. |
---|
0:12:30 | We started by looking at it with a fixed weighting factor, then we incorporated the speech presence probability, and in the end we arrived at a combination of soft values and binary detection. |
---|
0:12:42 | In future work, we are aiming at performing perceptual evaluations using hearing-impaired listeners, |
---|
0:12:51 | and we will keep working on finding a more perceptually motivated weighting factor, which for instance exploits certain masking properties, or even incorporates hearing models in the weighting process itself. |
---|
0:13:08 | Thank you. |
---|
0:13:13 | Any questions? |
---|
0:13:15 | Yes please, in the back. |
---|
0:13:21 | Thank you for your interesting presentation. My question is: you apply the multichannel Wiener filter for noise reduction; |
---|
0:13:34 | is it also possible to apply it when, besides the desired speech, you have interfering speech included in the signal? |
---|
0:13:47 | And in that case, how do you choose the weighting factor? |
---|
0:13:57 | Can you repeat the question, please? I could not hear it well. |
---|
0:14:01 | Yes, okay. You apply the multichannel Wiener filter for noise reduction. My question is: what if, besides the desired speech, you also have interfering speech? |
---|
0:14:19 | Oh, you mean multiple speakers? Yes. Well, I guess it would still be applicable in such a scenario, but of course it is going to be more difficult to estimate the conditional speech presence probability, because now the noise spectrum is going to be much more similar to the desired speech signal. |
---|
0:14:39 | So you would have to be much more careful when estimating the weighting factor, and I still think that if you applied it in a multi-speaker setup, the results would be a little worse. |
---|
0:14:51 | Okay, thank you. |
---|
0:14:52 | Any more questions or comments? Yes. |
---|
0:14:58 | My question relates to his question: when you apply your algorithm, do you have any constraints or assumptions on the noise type? |
---|
0:15:13 | For example, if the noise is an impulsive noise, or, as he said, if the noise is speech: can this algorithm deal with those kinds of noise? |
---|
0:15:29 | Well, at this point we do not make any assumptions on the noise; it can be any type. |
---|
0:15:34 | As I said, the most difficult scenario would be a competing speaker, but in terms of noise types you can apply it to anything; there is no assumption that it has to be a certain type of noise. |
---|
0:16:02 | So you mean that this algorithm can be used for any type of noise, even if the noise is speech, interfering speech? |
---|
0:16:20 | Yeah, okay. I think that, in terms of choosing the values for the thresholds, it all depends on how well you can estimate the spectral components, like the speech presence probability, and how well you make the binary decision. |
---|
0:16:40 | So of course, if you have multiple talkers, you might have a large error in your estimation, and then you probably want to choose different values, because with a large error you will be subject to a higher speech distortion. |
---|
0:16:58 | If instead you have, let's say, an easier scenario, like car noise, where the noise is more stationary, then your estimation of the speech presence probability probably has a higher accuracy, and you can also apply a more aggressive threshold. |
---|
0:17:14 | But if you have a competing talker in there, you probably have to be much more careful in choosing the values. |
---|
0:17:22 | I just wanted to ask: have you tested these types of scenarios? Do you have any results? |
---|
0:17:28 | You mean the multiple-speaker scenario? |
---|
0:17:32 | No, we did not test the multiple-speaker scenario; what we did test was a case with a much higher room reverberation. |
---|
0:17:39 | There we saw that the estimation degraded a little bit, and some of the values had to be chosen carefully so as not to increase the distortion; in that case the estimation of the spectral components was much more inaccurate, so we had to choose different values. |
---|
0:17:57 | So of course, it all depends on how accurately you can estimate these components. |
---|
0:18:03 | Here, as a proof of concept, we had a low reverberation time and just a couple of babble noise sources. |
---|
0:18:11 | More questions? Yes. |
---|
0:18:12 | One thing that is not clear to me: a hearing aid is of course not only used for speech, right? How does this work, given that you use a different state depending on the frequency and on the frame? |
---|
0:18:26 | If, for example, people want to listen to music? |
---|
0:18:32 | I am actually not sure how it would work with music, because this is more of a noise reduction process, so I guess the music would just get removed. |
---|
0:18:41 | So then you would have to switch it off for music? |
---|
0:18:47 | We only worked with speech signals. |
---|
0:18:49 | But the framework itself would of course be applicable to any signal? |
---|
0:18:53 | Yes, of course, but in that case you would be making a trade-off between different settings, and for music it would not work well. |
---|
0:19:07 | And if you use it for speech, then for example the start of an utterance might not be detected well, because it is considered noise? |
---|
0:19:14 | Yes. One example you can see is that sometimes, if you have a high-frequency component like an /s/ sound or something similar, |
---|
0:19:24 | it can be coloured by the noise, and the speech presence probability will in that case give a very low probability of speech; if you then allow the noise reduction to be very aggressive in those areas, you can really miss these sounds. |
---|
0:19:37 | So this weighting is basically a choice: if you allow the noise reduction to be very aggressive, sometimes you will not be able to keep those components. |
---|
0:19:50 | Basically, what we concluded was that we know we can be pretty aggressive, but it comes at a cost. |
---|
0:19:59 | So right now we are trying to constrain this weighting factor by some psychoacoustical properties, so that we know exactly when and how much noise reduction to apply. |
---|
0:20:10 | Basically, if we know that a certain frequency has a high probability of speech, then it will probably mask the noise at the neighbouring frequencies, and then we may not have to remove that noise at all. |
---|
0:20:21 | That is the work coming up in the following weeks. |
---|
0:20:24 | Okay. |
---|
0:20:25 | Any more comments or questions? |
---|
0:20:28 | Okay, thank you. |
---|