0:00:13 | and and there a one um of |
---|
0:00:15 | rough urgent phones at university from calm |
---|
0:00:18 | five for a or two presents a joint one with my C for vices it can |
---|
0:00:23 | the topic is analysis-synthesis |
---|
0:00:26 | based speech enhancement |
---|
0:00:28 | with improved |
---|
0:00:29 | spectral envelope estimation by tracking speech dynamics |
---|
0:00:34 | so |
---|
0:00:34 | uh |
---|
0:00:35 | first let's |
---|
0:00:36 | have a look at the outline |
---|
0:00:37 | of my presentation |
---|
0:00:39 | first |
---|
0:00:40 | at the very beginning i will introduce some |
---|
0:00:43 | background |
---|
0:00:44 | uh |
---|
0:00:44 | such as a spectral view of the |
---|
0:00:47 | effect of noise corruption |
---|
0:00:50 | and conventional filtering |
---|
0:00:52 | and then introduce the model-based speech enhancement |
---|
0:00:55 | uh |
---|
0:00:56 | which was previously |
---|
0:00:57 | proposed by us |
---|
0:00:59 | and i will then introduce a speech tracking |
---|
0:01:02 | speech dynamics tracking scheme that is used |
---|
0:01:06 | in conjunction with the model-based |
---|
0:01:08 | speech enhancement |
---|
0:01:10 | and then the performance evaluation |
---|
0:01:13 | and conclusion |
---|
0:01:16 | so uh |
---|
0:01:17 | let's first have a look at |
---|
0:01:19 | the effect of noise corruption from a spectral |
---|
0:01:22 | perspective |
---|
0:01:24 | yeah |
---|
0:01:24 | using white noise for example |
---|
0:01:27 | we can observe that the |
---|
0:01:29 | harmonic structure of speech is severely damaged |
---|
0:01:33 | and the |
---|
0:01:35 | the spectral envelope is also damaged |
---|
0:01:38 | which results in a lot of spectral distortion |
---|
0:01:41 | and the |
---|
0:01:43 | there are |
---|
0:01:45 | some conventional statistical-model-based |
---|
0:01:48 | speech enhancement methods |
---|
0:01:50 | and |
---|
0:01:52 | the |
---|
0:01:53 | the upper figure shows the classical log-spectral amplitude |
---|
0:01:58 | estimator |
---|
0:01:59 | and the |
---|
0:02:00 | from the spectrum we can see that |
---|
0:02:03 | the lower portion of the |
---|
0:02:05 | spectrum has been |
---|
0:02:06 | restored |
---|
0:02:08 | but the overall noise level |
---|
0:02:11 | cannot be |
---|
0:02:12 | um |
---|
0:02:13 | well suppressed |
---|
0:02:14 | so as a result there will be |
---|
0:02:17 | many musical tones and residual noise in the |
---|
0:02:22 | cleaned |
---|
0:02:23 | processed speech |
---|
0:02:25 | and the |
---|
0:02:26 | lower figure shows the optimally modified |
---|
0:02:30 | log-spectral amplitude estimator |
---|
0:02:32 | and the |
---|
0:02:33 | this method generally |
---|
0:02:35 | has a very good |
---|
0:02:37 | capability |
---|
0:02:37 | of |
---|
0:02:38 | noise suppression |
---|
0:02:40 | but however |
---|
0:02:42 | the formants and the harmonic |
---|
0:02:44 | structures |
---|
0:02:45 | um |
---|
0:02:46 | are further distorted |
---|
0:02:48 | so the upper one often gives |
---|
0:02:53 | a better PESQ score |
---|
0:02:56 | while the |
---|
0:02:58 | lower one |
---|
0:02:59 | gives a |
---|
0:03:00 | better segmental SNR score so there is always a trade-off |
---|
0:03:05 | between the noise suppression |
---|
0:03:06 | and the harmonic |
---|
0:03:08 | distortion |
---|
0:03:09 | or we can say the naturalness of speech |
---|
0:03:14 | so uh |
---|
0:03:15 | it can also be observed |
---|
0:03:17 | from the |
---|
0:03:18 | spectral envelope that |
---|
0:03:20 | noise will first |
---|
0:03:22 | severely distort the |
---|
0:03:24 | spectral envelope |
---|
0:03:26 | and the conventional statistical methods |
---|
0:03:29 | will |
---|
0:03:30 | partially restore the |
---|
0:03:32 | the |
---|
0:03:33 | spectral envelope but also partially further distort the spectrum |
---|
0:03:37 | so this will potentially account for some |
---|
0:03:40 | common features |
---|
0:03:41 | such as musical tones and the low intelligibility problems |
---|
0:03:45 | in |
---|
0:03:46 | speech enhancement |
---|
0:03:49 | so in our previous work we have proposed an |
---|
0:03:53 | analysis-synthesis approach |
---|
0:03:56 | based on the harmonic model |
---|
0:03:59 | so the |
---|
0:03:59 | basic idea is to |
---|
0:04:01 | extract acoustic |
---|
0:04:03 | cues |
---|
0:04:03 | from noisy speech |
---|
0:04:06 | and then we reconstruct the clean speech |
---|
0:04:10 | by resynthesis |
---|
0:04:12 | using this speech-only information |
---|
0:04:14 | so you can see from here |
---|
0:04:16 | you have the pitch information so you can track the location of the harmonics |
---|
0:04:22 | you have the spectral gain |
---|
0:04:24 | so you can have the overall average spectral |
---|
0:04:27 | energy level |
---|
0:04:28 | and you have the spectral envelope |
---|
0:04:30 | so you can |
---|
0:04:31 | track |
---|
0:04:32 | uh |
---|
0:04:33 | the magnitude spectrum |
---|
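As an illustration of the three cues just listed (pitch, gain, spectral envelope), here is a minimal sketch of harmonic resynthesis for one voiced frame. It is not the speaker's implementation: the function name `synthesize_frame`, the zero starting phase, the 8 kHz sampling rate, the 160-sample frame and the `envelope` callable are all assumptions made for the example.

```python
import math

def synthesize_frame(f0_hz, gain, envelope, fs=8000, n_samples=160):
    """Sum-of-sinusoids frame: harmonics of f0, shaped by an envelope.

    envelope is a callable mapping a frequency in Hz to a relative
    amplitude (a hypothetical stand-in for the estimated spectral
    envelope). All harmonics start at zero phase, which is an assumption.
    """
    # pitch fixes the harmonic locations: k * f0, kept below Nyquist
    harmonics = []
    k = 1
    while k * f0_hz < fs / 2:
        harmonics.append(k * f0_hz)
        k += 1

    frame = []
    for n in range(n_samples):
        t = n / fs
        # the envelope sets the relative amplitude of each harmonic
        s = sum(envelope(f) * math.sin(2 * math.pi * f * t) for f in harmonics)
        # the gain sets the overall level of the frame
        frame.append(gain * s)
    return frame

# usage: a 200 Hz voiced frame with an envelope that decays with frequency
frame = synthesize_frame(200.0, 0.5, lambda f: 1.0 / (1.0 + f / 1000.0))
```

Because only these speech parameters enter the synthesis, nothing in the output is copied from the noisy waveform itself, which is the property the talk relies on.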
0:04:36 | so why use this |
---|
0:04:37 | why do we choose this approach |
---|
0:04:39 | uh |
---|
0:04:40 | to do speech enhancement |
---|
0:04:42 | first |
---|
0:04:43 | this model is |
---|
0:04:45 | capable |
---|
0:04:45 | of generating |
---|
0:04:47 | clean harmonics |
---|
0:04:49 | and only speech-related information is synthesized |
---|
0:04:52 | so background noise is automatically removed |
---|
0:04:56 | and |
---|
0:04:58 | this model also |
---|
0:05:00 | retrieves |
---|
0:05:02 | some damaged harmonic structure |
---|
0:05:05 | and it has a smooth |
---|
0:05:07 | spectral envelope so there are no isolated spectral peaks |
---|
0:05:10 | and hence no |
---|
0:05:12 | musical tone problem |
---|
0:05:15 | and also this model allows |
---|
0:05:17 | independent adjustment of |
---|
0:05:20 | different model parameters |
---|
0:05:22 | so it enables us to |
---|
0:05:25 | further enhance the spectral envelope |
---|
0:05:28 | and |
---|
0:05:29 | using this framework |
---|
0:05:31 | so by using this method |
---|
0:05:33 | at |
---|
0:05:33 | we can |
---|
0:05:35 | we can suffer less from the noise suppression |
---|
0:05:38 | and the harmonic distortion trade-off |
---|
0:05:43 | so from some |
---|
0:05:44 | of our previous work |
---|
0:05:47 | um |
---|
0:05:48 | after |
---|
0:05:49 | applying some |
---|
0:05:50 | pre-cleaning procedures |
---|
0:05:51 | using conventional methods |
---|
0:05:54 | we can apply the |
---|
0:05:56 | frequency-domain pitch searching |
---|
0:05:58 | and the |
---|
0:05:59 | spectral gain estimation |
---|
0:06:03 | some |
---|
0:06:04 | preliminary results |
---|
0:06:06 | show that the pitch and the spectral gain estimation |
---|
0:06:09 | already give very |
---|
0:06:10 | good performance |
---|
0:06:12 | by applying them on the pre-cleaned spectrum |
---|
0:06:17 | however |
---|
0:06:18 | the spectral envelope estimation is |
---|
0:06:21 | somewhat |
---|
0:06:23 | inadequate |
---|
0:06:24 | so |
---|
0:06:27 | uh |
---|
0:06:27 | we can see here some preliminary results |
---|
0:06:31 | showing that the PESQ score for |
---|
0:06:34 | zero dB white noise |
---|
0:06:37 | would be around one point five |
---|
0:06:39 | and some |
---|
0:06:42 | conventional approaches would give around |
---|
0:06:44 | one point nine |
---|
0:06:46 | and our previous |
---|
0:06:48 | approach |
---|
0:06:49 | together |
---|
0:06:51 | with this |
---|
0:06:52 | pre-cleaning can give |
---|
0:06:56 | about a point two |
---|
0:06:58 | improvement |
---|
0:06:59 | however |
---|
0:07:00 | if we replace |
---|
0:07:02 | the envelope with the clean envelope |
---|
0:07:05 | this |
---|
0:07:06 | then |
---|
0:07:06 | it can achieve |
---|
0:07:08 | three point one seven |
---|
0:07:10 | so there is a |
---|
0:07:11 | huge gap here |
---|
0:07:12 | so we would expect some |
---|
0:07:16 | improvement in PESQ score if we can |
---|
0:07:19 | further improve the |
---|
0:07:20 | spectral envelope |
---|
0:07:24 | so the problem can be stated |
---|
0:07:26 | as follows |
---|
0:07:27 | for each frame using |
---|
0:07:29 | frames of noisy observations |
---|
0:07:32 | we want to find a mapping |
---|
0:07:34 | between the noisy and clean spectral envelopes |
---|
0:07:38 | and for consecutive frames |
---|
0:07:40 | we want to find the |
---|
0:07:42 | temporal trajectories of the clean spectral envelope |
---|
0:07:46 | oh |
---|
0:07:46 | in other words we want to estimate the clean speech envelope by |
---|
0:07:51 | looking at the long-term |
---|
0:07:53 | speech evolution |
---|
0:07:57 | so by assuming over a |
---|
0:08:00 | certain period of time |
---|
0:08:02 | a linear relationship between consecutive clean spectral envelopes |
---|
0:08:07 | and |
---|
0:08:08 | a linear relationship between the noisy and clean |
---|
0:08:11 | spectral envelopes |
---|
0:08:12 | we can use a linear dynamic |
---|
0:08:14 | system model to model |
---|
0:08:16 | this |
---|
0:08:17 | uh |
---|
0:08:18 | state tracking |
---|
0:08:20 | so the |
---|
0:08:21 | the feature |
---|
0:08:22 | we use here is the |
---|
0:08:24 | line spectrum frequencies of the LPC coefficients |
---|
0:08:29 | and uh |
---|
0:08:31 | and the |
---|
0:08:33 | for |
---|
0:08:33 | each period |
---|
0:08:35 | each series |
---|
0:08:37 | of |
---|
0:08:37 | observations |
---|
0:08:39 | so |
---|
0:08:39 | we have |
---|
0:08:41 | a series of LPC coefficients |
---|
0:08:45 | so given the Kalman system |
---|
0:08:47 | parameters |
---|
0:08:49 | we can run the Kalman filter |
---|
0:08:50 | and |
---|
0:08:52 | yeah |
---|
0:08:53 | obtain the clean LPC coefficients |
---|
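The tracking step described here can be sketched with a textbook Kalman filter. To keep the example short it treats a single feature coefficient as a scalar state, with x_t = a·x_{t-1} + w_t for the clean value and y_t = h·x_t + v_t for the noisy observation. The scalar simplification, the function name `kalman_track` and the parameter names `a`, `h`, `q`, `r`, `x0`, `p0` are assumptions for illustration, not the model dimensions used in the talk.

```python
def kalman_track(noisy, a, h, q, r, x0, p0):
    """Scalar Kalman filter over one feature trajectory.

    noisy  : list of noisy observations y_t for one block
    a, h   : state-transition and observation coefficients
    q, r   : process and observation noise variances
    x0, p0 : initial state estimate and its variance
    """
    x, p = x0, p0
    estimates = []
    for y in noisy:
        # predict the clean value from the previous estimate
        x_pred = a * x
        p_pred = a * a * p + q
        # correct the prediction with the noisy observation
        k = p_pred * h / (h * h * p_pred + r)
        x = x_pred + k * (y - h * x_pred)
        p = (1.0 - k * h) * p_pred
        estimates.append(x)
    return estimates

# usage: a jittery trajectory tracked under a slowly varying state model
tracked = kalman_track([0.31, 0.27, 0.35, 0.30], a=1.0, h=1.0,
                       q=1e-4, r=1e-2, x0=0.30, p0=1e-2)
```

A large `r` relative to `q` makes the filter trust the state model more than the observations, which is what yields a smooth trajectory.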
0:08:57 | so the next problem would be how to obtain |
---|
0:09:01 | uh |
---|
0:09:02 | the Kalman system parameters |
---|
0:09:04 | for |
---|
0:09:05 | each |
---|
0:09:06 | series of |
---|
0:09:07 | speech |
---|
0:09:09 | so the idea is that for each block of noisy observations |
---|
0:09:14 | we |
---|
0:09:15 | we use the features and a codebook |
---|
0:09:19 | that |
---|
0:09:20 | is built from the |
---|
0:09:21 | clustered |
---|
0:09:22 | parallel LPC coefficients |
---|
0:09:24 | and the |
---|
0:09:27 | through some optimization |
---|
0:09:30 | criterion we can obtain the corresponding Kalman system parameters |
---|
0:09:36 | so in the offline training stage we have both |
---|
0:09:40 | noisy and clean |
---|
0:09:43 | LPC coefficients |
---|
0:09:45 | and the |
---|
0:09:47 | we use |
---|
0:09:48 | split VQ |
---|
0:09:50 | to |
---|
0:09:52 | obtain the |
---|
0:09:54 | codebook entries |
---|
0:09:56 | in the sense that blocks with similar features |
---|
0:09:59 | are grouped into the same cluster |
---|
0:10:02 | by saying similar we need to define a distortion measure here |
---|
0:10:07 | it could be a simple |
---|
0:10:10 | measure such as the euclidean one or you can |
---|
0:10:13 | use |
---|
0:10:14 | some complex measure such as |
---|
0:10:17 | uh |
---|
0:10:18 | a |
---|
0:10:20 | modified IS measure |
---|
0:10:22 | and the |
---|
0:10:23 | you also need to define a |
---|
0:10:26 | feature for each |
---|
0:10:27 | block of observations |
---|
0:10:29 | you can use the average spectrum |
---|
0:10:31 | or you can use the whole series of |
---|
0:10:35 | vectors |
---|
0:10:36 | so it would actually be a matrix quantization in this case |
---|
0:10:42 | and the |
---|
0:10:44 | for each cluster |
---|
0:10:46 | we have both noisy and clean |
---|
0:10:48 | uh |
---|
0:10:49 | observations noisy and clean |
---|
0:10:52 | features |
---|
0:10:53 | so we can minimize the total negative |
---|
0:10:56 | log-likelihood function |
---|
0:10:58 | for each cluster |
---|
0:10:59 | and we will |
---|
0:11:01 | obtain the desired |
---|
0:11:03 | Kalman system parameters in this case |
---|
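When both the clean and the noisy trajectories are available, as in this training stage, minimizing the negative log-likelihood of a linear-Gaussian model reduces to ordinary least squares. The sketch below shows this for the same scalar simplification (x_t = a·x_{t-1} + w_t, y_t = h·x_t + v_t). The function name `fit_scalar_params`, the use of a single trajectory per cluster and the scalar form are assumptions; the talk does not give the exact estimator.

```python
def fit_scalar_params(clean, noisy):
    """Least-squares fit of (a, h, q, r) from parallel clean/noisy data.

    clean, noisy : equal-length lists holding one feature coefficient
                   over time for the blocks assigned to one cluster.
    """
    prev, nxt = clean[:-1], clean[1:]
    # a minimizes sum (x[t+1] - a * x[t])^2
    a = sum(x1 * x0 for x0, x1 in zip(prev, nxt)) / sum(x0 * x0 for x0 in prev)
    # q is the mean squared residual of the state equation
    q = sum((x1 - a * x0) ** 2 for x0, x1 in zip(prev, nxt)) / len(prev)
    # h minimizes sum (y[t] - h * x[t])^2
    h = sum(y * x for x, y in zip(clean, noisy)) / sum(x * x for x in clean)
    # r is the mean squared residual of the observation equation
    r = sum((y - h * x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return a, h, q, r
```

One such parameter set would be stored per cluster, so that the set matching a noisy block can be retrieved later.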
0:11:07 | so in the online adaptation stage |
---|
0:11:10 | we |
---|
0:11:11 | we also have noisy observations |
---|
0:11:14 | for a block |
---|
0:11:16 | and we use the |
---|
0:11:18 | same |
---|
0:11:19 | uh |
---|
0:11:20 | distortion measure to find the codebook entry |
---|
0:11:23 | that has the |
---|
0:11:25 | corresponding Kalman system parameters |
---|
0:11:28 | and we run the Kalman filter with |
---|
0:11:31 | this set of parameters and we will get the desired envelope |
---|
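The lookup in this online stage can be sketched as a nearest-neighbour search over the codebook. The example assumes the block feature is the average of the per-frame vectors and that the distortion measure is squared Euclidean distance, the two simplest of the options mentioned in the talk. The codebook layout (a list of `(centroid, params)` pairs) and the names `average_feature` and `lookup_params` are hypothetical.

```python
def average_feature(block):
    """Average the per-frame feature vectors of one block."""
    n_frames = len(block)
    dim = len(block[0])
    return [sum(frame[d] for frame in block) / n_frames for d in range(dim)]

def lookup_params(block, codebook):
    """Return the parameters stored with the closest codebook centroid.

    block    : list of per-frame feature vectors (noisy observations)
    codebook : list of (centroid, params) pairs built during training
    """
    feature = average_feature(block)

    def distortion(centroid):
        # squared Euclidean distance; other measures could be swapped in
        return sum((f - c) ** 2 for f, c in zip(feature, centroid))

    centroid, params = min(codebook, key=lambda entry: distortion(entry[0]))
    return params

# usage with a toy two-entry codebook
codebook = [([0.2, 0.3], {"a": 0.9}), ([0.8, 0.9], {"a": 0.5})]
params = lookup_params([[0.1, 0.2], [0.3, 0.4]], codebook)
```

The returned parameter set would then drive the filter for that block.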
0:11:37 | so you can see |
---|
0:11:38 | from the |
---|
0:11:40 | spectrogram here that the tracking |
---|
0:11:42 | actually gives very good |
---|
0:11:45 | uh |
---|
0:11:46 | performance |
---|
0:11:48 | we can also see from the |
---|
0:11:51 | 3-D view that |
---|
0:11:52 | the |
---|
0:11:54 | noisy |
---|
0:11:55 | the noisy envelope trajectories are |
---|
0:11:58 | quite |
---|
0:11:59 | messy and fluctuating |
---|
0:12:01 | and |
---|
0:12:03 | the conventional method will restore |
---|
0:12:07 | some harmonics |
---|
0:12:08 | but results in some musical tone problems |
---|
0:12:11 | but |
---|
0:12:12 | the tracking |
---|
0:12:14 | scheme |
---|
0:12:15 | used here |
---|
0:12:16 | will give a very smooth |
---|
0:12:17 | and uh |
---|
0:12:19 | accurate trajectory |
---|
0:12:25 | it can also be |
---|
0:12:28 | observed from this figure that the |
---|
0:12:31 | for |
---|
0:12:32 | is that a |
---|
0:12:33 | spend it |
---|
0:12:34 | the formant bandwidth |
---|
0:12:35 | is expanded as compared to the conventional method |
---|
0:12:38 | so the tracking gives a result very close |
---|
0:12:41 | to the original spectral envelope |
---|
0:12:44 | trajectory |
---|
0:12:47 | so the short-time magnitude spectrum |
---|
0:12:50 | and the harmonic structures |
---|
0:12:52 | are also |
---|
0:12:54 | restored |
---|
0:12:55 | and from the final synthesized |
---|
0:12:58 | speech we can see that there are |
---|
0:13:01 | no |
---|
0:13:02 | smell or |
---|
0:13:04 | and no musical tones |
---|
0:13:06 | and the |
---|
0:13:07 | harmonic structures are |
---|
0:13:09 | retrieved |
---|
0:13:10 | and |
---|
0:13:13 | actually we can achieve around |
---|
0:13:17 | two point |
---|
0:13:18 | four one |
---|
0:13:19 | PESQ score for |
---|
0:13:22 | speaker-dependent training |
---|
0:13:24 | and the |
---|
0:13:25 | uh |
---|
0:13:27 | the noise we used is from zero to ten dB |
---|
0:13:31 | or |
---|
0:13:32 | using white noise car noise and |
---|
0:13:37 | babble noise |
---|
0:13:39 | so both speaker-dependent and speaker-independent testing is used |
---|
0:13:47 | and finally |
---|
0:13:49 | i can conclude the presentation here |
---|
0:13:51 | in this paper |
---|
0:13:52 | presentation |
---|
0:13:53 | we have looked at the effect of noise corruption |
---|
0:13:58 | and the conventional speech enhancement |
---|
0:14:01 | is discussed |
---|
0:14:02 | has been discussed |
---|
0:14:03 | an analysis-synthesis approach is presented |
---|
0:14:07 | and speech dynamics tracking that incorporates |
---|
0:14:11 | training and Kalman filtering is proposed |
---|
0:14:14 | and the improved |
---|
0:14:16 | spectral envelope estimation is illustrated |
---|
0:14:19 | objective results in terms |
---|
0:14:21 | of |
---|
0:14:22 | spectral distortion and PESQ score are shown |
---|
0:14:25 | so |
---|
0:14:26 | that's all for my |
---|
0:14:27 | presentation |
---|
0:14:27 | thank you |
---|
0:14:28 | thank you so much |
---|
0:14:33 | yes the first question |
---|
0:14:42 | do you have audio samples that you could bring with you |
---|
0:14:47 | yeah i have some |
---|
0:14:50 | good |
---|
0:14:50 | it |
---|
0:14:51 | could you comment on CPU cost issues |
---|
0:14:56 | oh um |
---|
0:14:57 | actually if you |
---|
0:14:59 | use that for training |
---|
0:15:01 | it will be time consuming |
---|
0:15:04 | will you out |
---|
0:15:05 | but you can control the order of the linear prediction |
---|
0:15:08 | and the |
---|
0:15:10 | codebook size |
---|
0:15:13 | so there is always a trade-off |
---|
0:15:15 | okay fine tune is full |
---|
0:15:18 | set able |
---|
0:15:20 | let |
---|
0:15:21 | that's quite lot |
---|
0:15:25 | yes next question please |
---|
0:15:27 | from your presentation i realise that |
---|
0:15:30 | the analysis-synthesis according to the clean signal would be your upper bound right |
---|
0:15:35 | yeah |
---|
0:15:36 | the analysis-synthesis results according to the given clean envelope would be your upper bound |
---|
0:15:44 | the optimal case would be the one using the clean envelope so my question |
---|
0:15:50 | is |
---|
0:15:51 | have you investigated the effect of the |
---|
0:15:53 | noisy phase information that you are using in your synthesis |
---|
0:15:58 | so what would be |
---|
0:16:00 | the exact effect of the noisy envelope |
---|
0:16:03 | and the noisy |
---|
0:16:05 | phase information that you are using in this case |
---|
0:16:09 | in this work we just use the |
---|
0:16:11 | magnitude spectrum |
---|
0:16:12 | we have not looked at the phase information |
---|
0:16:15 | actually uh |
---|
0:16:16 | in college it has enhancement |
---|
0:16:19 | uh a face not selling four |
---|
0:16:21 | improved by |
---|
0:16:22 | um |
---|
0:16:23 | research |
---|
0:16:24 | could |
---|
0:16:25 | affect the intelligibility |
---|
0:16:27 | so |
---|
0:16:27 | maybe in future work we will come to it |
---|
0:16:30 | but for your information there are some papers also |
---|
0:16:34 | talking about the importance of phase information in |
---|
0:16:37 | the T |
---|
0:16:39 | a made as which are work you know |
---|
0:16:41 | yeah this is |
---|
0:16:43 | i mean that the gap between the upper bound and your proposed method can also |
---|
0:16:48 | be because of |
---|
0:16:50 | the |
---|
0:16:50 | noisy phase |
---|
0:16:51 | so this tracking scheme |
---|
0:16:54 | mostly |
---|
0:16:55 | works well for voiced speech so |
---|
0:16:57 | for unvoiced speech we |
---|
0:16:59 | can just use some pre-cleaned data |
---|
0:17:02 | so this would be some explanation |
---|
0:17:05 | for |
---|
0:17:06 | for the |
---|
0:17:06 | gap between the optimal |
---|
0:17:09 | and the proposed method |
---|
0:17:12 | i would be interested to know whether you need |
---|
0:17:15 | a voice activity detector |
---|
0:17:17 | oh |
---|
0:17:18 | actually we have tried to use the |
---|
0:17:21 | VAD to train voiced and unvoiced |
---|
0:17:24 | with different data |
---|
0:17:26 | but the result shows that |
---|
0:17:28 | we can show that |
---|
0:17:30 | training |
---|
0:17:31 | one class for all the data |
---|
0:17:34 | is |
---|
0:17:35 | even better for |
---|
0:17:36 | the whole tracking |
---|
0:17:37 | your synthesis model is very |
---|
0:17:40 | adequate for |
---|
0:17:41 | it's a sinusoidal model for producing voiced sounds how do you produce the unvoiced sounds |
---|
0:17:48 | so for the unvoiced sound there is basically |
---|
0:17:51 | no pitch and we just |
---|
0:17:52 | use the |
---|
0:17:54 | um uh |
---|
0:17:55 | a a a a boy |
---|
0:17:57 | time-domain approach to |
---|
0:17:58 | synthesize |
---|
0:17:59 | to have a gain information and P information |
---|
0:18:02 | and just |
---|
0:18:03 | commit time domain and |
---|
0:18:06 | are there any further questions |
---|
0:18:09 | that is not the case |
---|
0:18:10 | thank you once more |
---|