0:00:13[unclear] everyone, I am [name unclear] from [university unclear],
0:00:18and I am here to present a joint work with my supervisors [names unclear].
0:00:23The topic is analysis-synthesis based speech enhancement
0:00:28with improved spectral envelope estimation by tracking speech dynamics.
0:00:34So first, let's have a look at the outline of my presentation.
0:00:39First, at the very beginning, I will introduce some background,
0:00:44such as the effect of noise corruption on the spectrum
0:00:50and conventional filtering.
0:00:52Then I will introduce a model-based speech enhancement approach,
0:00:56which was previously proposed by us.
0:00:59I will then introduce a speech dynamics tracking scheme that is used
0:01:06in conjunction with the model-based speech enhancement,
0:01:10followed by the performance evaluation
0:01:13and the conclusion.
0:01:16So let's first have a look at
0:01:19the effect of noise corruption from a spectral perspective.
0:01:24Here we use white noise as an example.
0:01:27We can observe that the harmonic structure of speech is severely damaged,
0:01:33and the spectral envelope is affected as well,
0:01:38which results in a lot of spectral distortion.
0:01:41Here are some conventional statistical model-based speech enhancement methods.
0:01:53The upper figure shows the classical log-spectral amplitude estimator.
0:01:59From the spectrogram we can see that
0:02:03the lower portion of the spectrum has been restored,
0:02:08but the overall noise level cannot be well suppressed.
0:02:14As a result, there will be many musical tones and residual noise in the processed speech.
0:02:25The lower figure shows the optimally modified log-spectral amplitude estimator.
0:02:33This method generally has a very good capability of noise suppression.
0:02:40However, the formants and the harmonic structures are further distorted.
0:02:48So the upper one often gives a better PESQ score,
0:02:56while the lower one gives a better segmental SNR score.
0:03:00So there is always a tradeoff between the noise suppression
0:03:06and the harmonic distortion, or we can say the naturalness of speech.
0:03:14It can also be observed from the spectral envelope that
0:03:20noise will first severely distort the spectral envelope,
0:03:26and the conventional statistical methods will partially restore the spectral envelope but partially further distort the spectrum.
0:03:37So this would potentially account for some common features,
0:03:41such as musical tones and low intelligibility problems, in speech enhancement.
0:03:49So in our previous work we have proposed an analysis-synthesis approach based on the harmonic model.
0:03:59The basic idea is to extract acoustic cues from the noisy speech,
0:04:06and then we reconstruct the speech by resynthesis using this speech-only information.
0:04:14So you can see from here:
0:04:16you have the pitch information, so you can track the location of the harmonics;
0:04:22you have the spectral gain, so you can have the overall average spectral energy level;
0:04:28and you have the spectral envelope, so you can track the magnitude spectrum.
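A minimal sketch of the resynthesis idea described above, assuming a plain sum-of-sinusoids harmonic model with zero phase. The function name `synthesize_harmonic_frame`, the `toy_envelope` callable, and every numeric value are illustrative assumptions, not the speaker's implementation.

```python
import math

def synthesize_harmonic_frame(f0_hz, gain, envelope, sample_rate=8000, num_samples=160):
    """Sum sinusoids at multiples of the pitch, weighted by a spectral envelope.

    envelope: callable mapping a frequency in Hz to a relative amplitude
    (a hypothetical stand-in for an LPC-derived envelope).
    """
    # The pitch fixes the harmonic locations: integer multiples below Nyquist.
    nyquist = sample_rate / 2
    freqs = []
    k = 1
    while k * f0_hz < nyquist:
        freqs.append(k * f0_hz)
        k += 1
    # The envelope fixes the relative amplitude of each harmonic.
    amps = [envelope(f) for f in freqs]
    # Normalise so that the spectral gain alone sets the overall level.
    norm = math.sqrt(sum(a * a for a in amps)) or 1.0
    amps = [gain * a / norm for a in amps]
    # Zero-phase synthesis (an assumption; phase handling is not modelled here).
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        frame.append(sum(a * math.cos(2 * math.pi * f * t) for a, f in zip(amps, freqs)))
    return freqs, amps, frame

# Toy envelope with a single broad peak at 500 Hz (illustrative only).
def toy_envelope(f):
    return 1.0 / (1.0 + ((f - 500.0) / 300.0) ** 2)

freqs, amps, frame = synthesize_harmonic_frame(f0_hz=100.0, gain=1.0, envelope=toy_envelope)
```

Because only the three speech parameters enter the synthesis, nothing in the output depends on the noise directly; any error comes from how well pitch, gain and envelope were estimated.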
0:04:36So why do we choose this approach to do speech enhancement?
0:04:42First, this model is capable of generating clean harmonics,
0:04:49and only speech-related information is synthesized,
0:04:52so background noise is automatically removed.
0:04:56This model also retrieves some damaged harmonic structure
0:05:05and has a smooth spectral envelope, so there are no isolated spectral peaks
0:05:10and hence no musical tone problem.
0:05:15Also, this model allows independent adjustment of different model parameters,
0:05:22so it enables us to further enhance the spectral envelope using this framework.
0:05:31So by using this method, we no longer suffer from the tradeoff between noise suppression
0:05:38and harmonic distortion.
0:05:43From some of our previous work,
0:05:48after applying some pre-cleaning procedures using conventional methods,
0:05:54we can apply the frequency-domain pitch searching
0:05:58and the spectral gain estimation.
0:06:03Some preliminary results show that the pitch and the spectral gain estimation
0:06:09already give very good performance when applied on the pre-cleaned spectrum.
0:06:17However, the spectral envelope estimation is somewhat inadequate.
0:06:24So we can see here that some preliminary results
0:06:31show that the PESQ score for 0 dB white noise is around 1.5,
0:06:39some conventional approaches give around 1.9,
0:06:46and our previous approach, together with this pre-cleaning, can give about a 0.2 improvement.
0:06:59However, if we replace the envelope with the true clean envelope,
0:07:06it can achieve 3.17.
0:07:10So there is a huge gap here,
0:07:12and we would expect some improvement in the PESQ score if we can further improve the spectral envelope.
0:07:24So the problem can be stated as follows.
0:07:27For each frame, or a few frames, of noisy observations,
0:07:32we want to find a mapping between the noisy and clean spectral envelopes,
0:07:38and for consecutive frames
0:07:40we want to find the temporal trajectories of the clean spectral envelopes.
0:07:46In other words, we want to estimate the clean speech envelope by
0:07:51looking at the long-term speech evolution.
0:07:57So by assuming, over a certain period of time,
0:08:02a linear relationship between consecutive clean spectral envelopes
0:08:07and a linear relationship between the noisy and clean spectral envelopes,
0:08:12we can use the linear dynamic model to model this state tracking.
0:08:20The feature we use here is the line spectrum frequencies of the LPC coefficients.
0:08:31For each period, that is, each series of observations,
0:08:39we have a series of LPC coefficients.
0:08:45So, given the Kalman system parameters,
0:08:49we can run the Kalman filter
0:08:53and obtain the clean LPC coefficients.
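A minimal sketch of the tracking step, assuming a scalar linear dynamic model applied to one line spectrum frequency at a time: x[t] = a*x[t-1] + w for the clean trajectory and y[t] = c*x[t] + v for the noisy observation. The talk works with whole feature vectors; the scalar simplification, the name `kalman_filter_1d`, and all parameter values here are assumptions for illustration.

```python
def kalman_filter_1d(observations, a, c, q, r, x0, p0):
    """Scalar Kalman filter for x[t] = a*x[t-1] + w,  y[t] = c*x[t] + v.

    q and r are the variances of the process noise w and observation noise v.
    Returns the filtered state estimates, one per observation.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict the next state and its uncertainty from the dynamics.
        x_pred = a * x
        p_pred = a * p * a + q
        # Correct the prediction with the noisy observation via the Kalman gain.
        k = p_pred * c / (c * p_pred * c + r)
        x = x_pred + k * (y - c * x_pred)
        p = (1.0 - k * c) * p_pred
        estimates.append(x)
    return estimates

# Toy noisy trajectory of one line spectrum frequency (illustrative values only).
noisy_lsf = [0.52, 0.47, 0.55, 0.49, 0.53, 0.46, 0.54, 0.50]
tracked = kalman_filter_1d(noisy_lsf, a=1.0, c=1.0, q=1e-4, r=1e-2, x0=0.5, p0=1e-2)
```

With a small process variance `q` relative to the observation variance `r`, the filter trusts the dynamics more than each noisy frame, which is what yields a smooth trajectory.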
0:08:57So the next problem would be how to obtain
0:09:02the Kalman system parameters for each period.
0:09:09The idea is that, for each block of noisy observations,
0:09:15we use a pre-trained codebook
0:09:19that models the clustered parallel LPC coefficients,
0:09:27and through some optimization criterion we can obtain the corresponding Kalman system parameters.
0:09:36In the offline training stage we have both noisy and clean LPC coefficients,
0:09:47and we use split VQ to train the global codebook entries,
0:09:56in the sense that blocks with similar features are grouped into the same clusters.
0:10:02By saying "similar", we need to define a distortion measure here.
0:10:07It could be some simple measure such as the Euclidean distance, or you can
0:10:13use some more complex measure such as a modified IS measure.
0:10:23You also need to define a feature for each block of observations:
0:10:29you can use the average spectrum,
0:10:31or you can use the whole series of vectors,
0:10:36so it would actually be a matrix quantization in this case.
0:10:42For each cluster we have both noisy and clean features,
0:10:53so we can minimize the total negative log-likelihood function for each cluster,
0:10:59and we will obtain the desired Kalman system parameters in this case.
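A minimal sketch of what minimizing the negative log-likelihood can look like, assuming a scalar model x[t] = a*x[t-1] + w, y[t] = c*x[t] + v with Gaussian noise and with the clean sequence available during training. Under those assumptions the minimizer has a closed form (least squares for `a` and `c`, mean squared residuals for `q` and `r`). The function name `fit_scalar_lds` and the toy data are hypothetical, and the talk's actual per-cluster optimization may differ.

```python
def fit_scalar_lds(clean, noisy):
    """Closed-form maximum-likelihood fit of a, q, c, r from paired sequences.

    clean: clean feature trajectory x[0..T-1] (known only in offline training)
    noisy: noisy feature trajectory y[0..T-1] aligned with `clean`
    """
    prev, curr = clean[:-1], clean[1:]
    # Transition coefficient: least-squares regression of x[t] on x[t-1].
    a = sum(p * x for p, x in zip(prev, curr)) / sum(p * p for p in prev)
    # Process noise variance: mean squared transition residual.
    q = sum((x - a * p) ** 2 for p, x in zip(prev, curr)) / len(prev)
    # Observation coefficient: least-squares regression of y[t] on x[t].
    c = sum(x * y for x, y in zip(clean, noisy)) / sum(x * x for x in clean)
    # Observation noise variance: mean squared observation residual.
    r = sum((y - c * x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return a, q, c, r

# Toy paired data generated with a = 0.9 and c = 2.0 and no noise.
clean = [1.0, 0.9, 0.81, 0.729]
noisy = [2.0, 1.8, 1.62, 1.458]
a, q, c, r = fit_scalar_lds(clean, noisy)
```

In a per-cluster setting, the sums would run over all blocks assigned to the cluster, so each codebook entry ends up with its own parameter set.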
0:11:07In the online adaptation stage, we also have noisy observations for a block,
0:11:16and we use the same distance measure to find the codebook entry,
0:11:23which has the corresponding Kalman system parameters.
0:11:28We run the Kalman filter with this set of parameters, and we will get the desired spectral envelope.
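A minimal sketch of the online lookup, assuming the block feature is the average feature vector and the distortion measure is squared Euclidean distance (both are options mentioned in the talk). The codebook layout, the names `block_feature` and `lookup_parameters`, and all numbers are made up for illustration.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def block_feature(block):
    """Average the feature vectors of all frames in a block."""
    dim = len(block[0])
    return [sum(frame[d] for frame in block) / len(block) for d in range(dim)]

def lookup_parameters(block, codebook):
    """Return the parameter set stored with the codebook entry nearest to the block."""
    feature = block_feature(block)
    best = min(codebook, key=lambda entry: squared_euclidean(feature, entry["centroid"]))
    return best["params"]

# Hypothetical two-entry codebook; centroids and parameter values are invented.
codebook = [
    {"centroid": [0.3, 1.0], "params": {"a": 0.95, "q": 1e-4, "r": 1e-2}},
    {"centroid": [0.6, 1.5], "params": {"a": 0.90, "q": 5e-4, "r": 2e-2}},
]
noisy_block = [[0.58, 1.45], [0.62, 1.52], [0.61, 1.49]]
params = lookup_parameters(noisy_block, codebook)
```

The returned parameter set would then be handed to the Kalman filter for that block; using the whole series of vectors instead of the average would turn the lookup into a matrix quantization.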
0:11:37So you can see from the spectrogram here that the tracking
0:11:42actually gives very good performance.
0:11:48Also, from the 3D view,
0:11:54the noisy envelope trajectories are quite messy and flat,
0:12:01and the conventional method will restore some harmonics
0:12:08but results in some musical tone problems,
0:12:11but the smooth tracking scheme used here
0:12:16gives a very smooth and accurate trajectory.
0:12:25It can also be observed from this figure that
0:12:31the formant bandwidth is expanded as compared to the conventional method,
0:12:38so the tracking gives a result very close to the original spectral envelope trajectory.
0:12:47So here is the magnitude spectrum,
0:12:50and the harmonic structures as well.
0:12:55From the final synthesized speech we can see that there are
0:13:01no [unclear] and no musical tones,
0:13:07and the harmonic structures are retrieved.
0:13:13Actually, we can achieve a PESQ score of around 2.41
0:13:22for speaker-dependent training.
0:13:24The noise we use is from 0 to 10 dB,
0:13:32using white noise, car noise and babble noise.
0:13:39Both speaker-dependent and speaker-independent testing are used.
0:13:47Finally, I conclude the presentation here.
0:13:51In this presentation
0:13:53we looked at the effect of noise corruption,
0:13:58and conventional speech enhancement was discussed.
0:14:03An analysis-synthesis approach was presented,
0:14:07and a speech dynamics tracking algorithm that incorporates
0:14:11VQ training in the Kalman filtering was proposed.
0:14:14The improved spectral envelope estimation was illustrated
0:14:19objectively in terms of
0:14:22spectral distortion and PESQ score.
0:14:25So that's all for my presentation.
0:14:28Thank you.
0:14:28Thank you so much.
0:14:33The first question.
0:14:42Do you have audio samples [unclear]?
0:14:47Yeah, I have some.
0:14:50Good.
0:14:51Could you comment on CPU cost issues?
0:14:57Actually, if you use [unclear] for training,
0:15:01it will be time consuming,
0:15:05but you can [unclear] the codebook size,
0:15:13so there is always a tradeoff.
0:15:15okay fine tune is full
0:15:18set able
0:15:20let
0:15:21that's quite lot
0:15:25Next question, please.
0:15:27From your presentation I realise that
0:15:30the analysis-synthesis according to the clean signal would be the upper bound, right?
0:15:35Yeah.
0:15:36The analysis-synthesis results according to the given true clean envelope would be the upper bound
0:15:44of the optimal case [unclear]. So my question is:
0:15:51have you analysed the effect of
0:15:53the noisy phase information that you are using in your synthesis?
0:15:58What would be
0:16:00the exact effect of the noisy envelope
0:16:03and the noisy
0:16:05phase information that you are using in this case?
0:16:09In this work we just use the
0:16:11magnitude spectrum;
0:16:12we have not looked at the phase.
0:16:15Actually, in conventional speech enhancement,
0:16:19phase is seldom improved,
0:16:22but according to some research it could
0:16:25affect the intelligibility,
0:16:27so maybe in future work we will come to that.
0:16:27uh uh maybe in free for sure works we will come
0:16:30but for your information the were some papers also
0:16:34talking about the importance of phase information in in
0:16:37the T
0:16:39a made as which are work you know
0:16:41yeah you this is a
0:16:43have a some i mean that a gap between the upper bound and the proposed method of your can also
0:16:48be because a
0:16:50that's a
0:16:50noise it face
0:16:51So this tracking scheme
0:16:54basically works well for voiced speech.
0:16:57For unvoiced speech we
0:16:59can just use some pre-cleaned data,
0:17:02so this would be one reason
0:17:05for the gap between the optimal and the proposed method.
0:17:12I would be interested to know whether you need
0:17:15a voice activity detector.
0:17:18Actually, we have tried to use the voice activity
0:17:21detector to train voiced and unvoiced speech
0:17:24with different data,
0:17:26but the results show that
0:17:30training on all the data together
0:17:34is better for the whole tracking.
0:17:37Your synthesis model is very
0:17:40adequate for voiced sounds;
0:17:41it is a sinusoidal model approach. How do you produce the unvoiced sounds?
0:17:48So the unvoiced sound basically has
0:17:51no pitch, and we just
0:17:52use [unclear]
0:17:57time-domain [unclear] to synthesize.
0:17:59We have the gain information and [unclear] information,
0:18:02and just combine them in the time domain.
0:18:06yeah i you are they have for of the questions
0:18:09that is not the case
0:18:10thank you once more