0:00:13 | i |
---|
0:00:14 | um well actually it's uh |
---|
0:00:18 | one of my former P H Ds or one of my former masters students |
---|
0:00:21 | uh |
---|
0:00:22 | he's now working at a company and because of that he wasn't able |
---|
0:00:25 | to travel here for |
---|
0:00:27 | the presentation |
---|
0:00:28 | where |
---|
0:00:29 | course into |
---|
0:00:31 | but i'll present that, this is focused on phoneme selective speech enhancement for |
---|
0:00:36 | uh generalized |
---|
0:00:37 | parametric |
---|
0:00:38 | spectral subtraction |
---|
0:00:40 | H |
---|
0:00:41 | so |
---|
0:00:42 | the approach that we're kind of looking at here |
---|
0:00:44 | uh is to |
---|
0:00:46 | uh |
---|
0:00:47 | try to balance the differences between uh |
---|
0:00:50 | voice |
---|
0:00:51 | uh structures |
---|
0:00:52 | seen in the articulatory domain |
---|
0:00:54 | uh noise will impact speech differently uh depending on the speech class |
---|
0:00:59 | i believe that |
---|
0:01:00 | uh adapting enhancement strategies to these different domains will actually |
---|
0:01:04 | improve your overall performance |
---|
0:01:06 | um regions of low signal to noise ratio are really gonna be more sensitive to |
---|
0:01:11 | uh different types of noise |
---|
0:01:13 | babble or background |
---|
0:01:14 | type fluctuations |
---|
0:01:16 | um |
---|
0:01:17 | so it would make sense to try and track obviously the signal to noise ratio |
---|
0:01:21 | but also look at that with respect to |
---|
0:01:23 | uh the types of phone class |
---|
0:01:26 | noise characteristics obviously affect |
---|
0:01:28 | both quality and intelligibility |
---|
0:01:30 | so the approach here, we'd like to kind of focus on a phoneme class selective based |
---|
0:01:35 | strategy |
---|
0:01:37 | uh that adapts |
---|
0:01:38 | itself |
---|
0:01:39 | to phone classes over time |
---|
0:01:41 | so |
---|
0:01:43 | let's kind of maybe talk a little bit about uh different approaches people have looked at for phone class |
---|
0:01:48 | based |
---|
0:01:49 | uh enhancement |
---|
0:01:50 | um you know obviously noise is gonna impact different phone classes |
---|
0:01:54 | differently |
---|
0:01:55 | um |
---|
0:01:56 | and so based on the frequency content, the articulatory structure, and the influence of noise on the phone as well as |
---|
0:02:02 | the |
---|
0:02:02 | uh stationary noise, you would expect |
---|
0:02:05 | uh quality |
---|
0:02:06 | impacted differently |
---|
0:02:07 | so |
---|
0:02:08 | going back to uh the transactions paper from mcaulay and malpass uh this one had the soft decision |
---|
0:02:13 | based noise suppression strategy |
---|
0:02:15 | across different phone classes |
---|
0:02:17 | it was a very nice |
---|
0:02:18 | effective approach |
---|
0:02:20 | um one of my former students levent arslan uh we had a paper in transactions in ninety nine |
---|
0:02:25 | that looked at a hidden markov model based strategy to kind of classify |
---|
0:02:29 | uh different phone classes |
---|
0:02:30 | and adapt uh an iterative uh auto-lsp |
---|
0:02:35 | strategy |
---|
0:02:36 | to different phone classes |
---|
0:02:37 | and found that worked |
---|
0:02:38 | well |
---|
0:02:39 | uh |
---|
0:02:41 | [name unclear], another former student, uh in our interspeech two thousand seven paper had this class constrained rover strategy |
---|
0:02:48 | and here again what we focused on was to try and work at |
---|
0:02:51 | uh extracting pieces of enhanced speech from different types of |
---|
0:02:56 | uh constrained representations |
---|
0:02:58 | to see whether we can improve |
---|
0:03:00 | the overall enhancement |
---|
0:03:01 | uh solution |
---|
0:03:02 | so |
---|
0:03:03 | this uh figure here kind of shows the uh |
---|
0:03:06 | the strategy here, i'll try to illustrate this with the |
---|
0:03:09 | pointer |
---|
0:03:10 | so |
---|
0:03:11 | the uh |
---|
0:03:13 | enhancement strategy, this was an older version from interspeech two thousand seven |
---|
0:03:18 | we had the auto-lsp constrained iterative enhancement method |
---|
0:03:22 | and in a sense we kind of have a number of different enhancement solutions here |
---|
0:03:26 | uh for the input speech you basically try a whole uh range of different |
---|
0:03:30 | approaches, and if you look on the right here |
---|
0:03:32 | you can kind of think of this as starting off with a single degraded speech waveform |
---|
0:03:38 | and what you end up with is actually a very very large collection of enhanced waveforms |
---|
0:03:43 | and you can kind of think of this as a very large brick for the lack of |
---|
0:03:46 | anything else |
---|
0:03:47 | and the time domain is going along here, and across each of these levels here |
---|
0:03:52 | and across this space here |
---|
0:03:54 | could be variations of different parameters that are controlled |
---|
0:03:59 | by the enhancement strategy and so |
---|
0:04:01 | in a sense you end up with a very large collection of enhanced waveforms |
---|
0:04:06 | then the approach is to try and see could you use a strategy like a gaussian mixture model |
---|
0:04:11 | approach to maybe go through and select |
---|
0:04:13 | which phone classes you actually have |
---|
0:04:15 | and identify which uh blocks actually would be most improved |
---|
0:04:20 | uh |
---|
0:04:21 | for that particular enhancement solution up here |
---|
0:04:24 | uh based on the phone class |
---|
0:04:25 | in so doing |
---|
0:04:27 | we use what's called a rover solution, this is something that's very common in the speech |
---|
0:04:31 | recognition community |
---|
0:04:32 | and so what's done is |
---|
0:04:34 | looking at this big block, you kind of go in and pick out you know with a particular |
---|
0:04:38 | enhancement configuration this particular piece and drop it down here |
---|
0:04:42 | pick up the next, drop it here, in a sense you're kind of looking across all the different domains |
---|
0:04:46 | and piecing them together uh |
---|
0:04:48 | hopefully coming up with a nice |
---|
0:04:50 | sequence of optimized enhanced uh |
---|
0:04:53 | blocks for each of the different phone classes or phone sequences |
---|
0:04:57 | um |
---|
0:04:58 | and hopefully the overall enhanced signal is actually better |
---|
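
The piecing-together step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `assemble_by_class`, the configuration names, and the class-to-configuration mapping are all hypothetical, and the talk does not say how the per-class choice is actually made.

```python
def assemble_by_class(candidates, segment_classes, best_config_for_class):
    """Stitch one output signal from many enhanced versions of the same input.

    candidates: dict mapping a configuration name to a list of segments,
        where each segment is a list of samples (same segmentation for all).
    segment_classes: list with one phone-class label per segment.
    best_config_for_class: dict mapping a class label to the configuration
        assumed to work best for that class (hypothetical mapping).
    """
    output = []
    for index, phone_class in enumerate(segment_classes):
        config = best_config_for_class[phone_class]
        # Take this segment from the configuration chosen for its class.
        output.extend(candidates[config][index])
    return output


# Toy data: two enhancement configurations, three segments each.
candidates = {
    "cfg_a": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    "cfg_b": [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]],
}
classes = ["sonorant", "obstruent", "sonorant"]
mapping = {"sonorant": "cfg_a", "obstruent": "cfg_b"}
stitched = assemble_by_class(candidates, classes, mapping)
```

Here the middle segment is taken from `cfg_b` and the outer two from `cfg_a`, which is the "pick this piece, drop it down here" idea in its simplest hard-decision form.
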
0:05:02 | so |
---|
0:05:03 | that |
---|
0:05:04 | just as kind of a concept, when we look at a traditional uh method like |
---|
0:05:09 | mmse or spectral subtraction and these |
---|
0:05:12 | are just |
---|
0:05:13 | uh what |
---|
0:05:13 | what people typically see |
---|
0:05:17 | uh is that uh you have maybe several classes of phones maybe this could be class one class two class |
---|
0:05:22 | three |
---|
0:05:23 | and these would be different types of phone class, one would argue that |
---|
0:05:27 | maybe a particular enhancement method like mmse |
---|
0:05:30 | if you can kind of tune it properly |
---|
0:05:32 | tries to kinda give you |
---|
0:05:34 | good sounding speech across all the classes for one configuration |
---|
0:05:38 | and in a sense what you'd like to try and do is to migrate this solution over |
---|
0:05:42 | specifically to this type of class maybe to the centroid of this space |
---|
0:05:46 | with the idea that now this has been optimized to this particular phone class and |
---|
0:05:51 | the result of course is, if you |
---|
0:05:53 | pick up each of these |
---|
0:05:54 | particular centroids |
---|
0:05:55 | you'd end up with a better overall solution than simply |
---|
0:05:58 | uh keeping the enhancement strategy constant for the whole waveform |
---|
0:06:03 | so the approach that we look for here is to kind of |
---|
0:06:06 | uh use an alternative approach to |
---|
0:06:09 | uh the generalized spectral subtraction strategy |
---|
0:06:12 | uh and that is to look at a weighted euclidean |
---|
0:06:16 | weighted euclidean |
---|
0:06:17 | distortion |
---|
0:06:18 | and we believe this might be a better uh |
---|
0:06:21 | measure than using uh |
---|
0:06:24 | uh a mean square error because we feel that this |
---|
0:06:26 | would have a little bit more perceptually based criteria incorporated into it |
---|
0:06:31 | so the idea is you have |
---|
0:06:34 | uh this vector of uh |
---|
0:06:36 | of harmonic uh coefficients here |
---|
0:06:38 | uh from an fft |
---|
0:06:40 | and what we can do is to emphasise the errors in the valleys let's say by |
---|
0:06:44 | decreasing this beta term uh that you have in the representation, so if this is less than zero we can |
---|
0:06:49 | assume for example that |
---|
0:06:51 | uh when |
---|
0:06:52 | uh when you're in the valleys that the magnitude of X would be less than |
---|
0:06:57 | one |
---|
0:06:58 | so if you allow beta to be small it'll actually |
---|
0:07:02 | cause this estimator here to increase |
---|
0:07:04 | um |
---|
0:07:05 | on the other hand if you're in the spectral peaks |
---|
0:07:08 | let's say during voiced blocks, X will be greater than one uh at a |
---|
0:07:12 | particular frequency harmonic |
---|
0:07:15 | and then for this estimator term actually uh allowing |
---|
0:07:18 | uh |
---|
0:07:19 | uh the value of beta to be greater than zero |
---|
0:07:22 | will allow this term to actually increase |
---|
0:07:24 | and so |
---|
0:07:24 | that allows us to kind of adapt in |
---|
0:07:26 | a parametric way |
---|
0:07:28 | uh the enhancement solution |
---|
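
A small numeric sketch of the weighting idea just described. It assumes the commonly used form of weighted Euclidean distortion, d = X^beta * (X - X_hat)^2, which the talk does not write out explicitly, so the exact expression on the slide may differ. The function name is hypothetical.

```python
def weighted_euclidean_distortion(x, x_hat, beta):
    """Assumed form: d = x**beta * (x - x_hat)**2, for a magnitude x > 0.

    With beta = 0 the weight is 1 and this is plain squared error.
    """
    return (x ** beta) * (x - x_hat) ** 2


# Spectral valley: magnitude below one, so a negative beta enlarges the weight.
valley_plain = weighted_euclidean_distortion(0.5, 0.4, 0.0)
valley_weighted = weighted_euclidean_distortion(0.5, 0.4, -1.0)

# Spectral peak: magnitude above one, so a positive beta enlarges the weight.
peak_plain = weighted_euclidean_distortion(2.0, 1.9, 0.0)
peak_weighted = weighted_euclidean_distortion(2.0, 1.9, 1.0)
```

In both cases the same absolute error counts roughly twice as much as under plain squared error, which is the sense in which beta lets the estimator emphasise valleys or peaks.
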
0:07:30 | so |
---|
0:07:31 | the approach that we're gonna look for is to kind of use the |
---|
0:07:35 | uh |
---|
0:07:35 | generalized spectral subtraction approach and this was introduced by |
---|
0:07:40 | uh sim tong chang and tan in their transactions paper in ninety eight |
---|
0:07:45 | uh and this is the estimator here, it basically finds the |
---|
0:07:49 | uh |
---|
0:07:50 | the best estimate uh |
---|
0:07:52 | between |
---|
0:07:53 | uh the terms X hat and X uh |
---|
0:07:56 | uh from the original degraded speech signal |
---|
0:07:59 | uh the two components that we see here, the a and |
---|
0:08:03 | b terms here, are frequency dependent weighting coefficients that need to be estimated |
---|
0:08:07 | and the alpha term here is kind of the spectral exponent that you would see |
---|
0:08:11 | uh in the terms here |
---|
0:08:13 | okay |
---|
0:08:13 | so what we'd like to do is to be able to optimize the a and b terms here |
---|
0:08:17 | and in so doing |
---|
0:08:19 | the generalized spectral subtraction approach is basically to minimize the mean square error |
---|
0:08:24 | term you see here so |
---|
0:08:25 | uh there are two solutions that uh they come up with, one |
---|
0:08:29 | referred to as the unconstrained approach |
---|
0:08:31 | basically means that the a and b terms are not equal to each other |
---|
0:08:34 | and the constrained approach which basically means that |
---|
0:08:37 | the two terms are in fact |
---|
0:08:39 | equal to each other |
---|
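
As a rough sketch of the estimator shape being described, the generalized spectral subtraction rule is usually written as X_hat = (a * |Y|^alpha - b * E[|N|^alpha])^(1/alpha). That form is an assumption here, since the slide equation is not in the transcript. The coefficients `a` and `b` are simply passed in; their optimal closed forms are not reproduced, and the zero floor is an added safeguard rather than something stated in the talk.

```python
def gss_estimate(noisy_mag, noise_alpha_mean, a, b, alpha, floor=0.0):
    """Assumed generalized spectral subtraction rule for one frequency bin.

    noisy_mag: magnitude of the degraded speech, |Y|.
    noise_alpha_mean: estimate of E[|N|**alpha] from the noise reference.
    a, b: frequency dependent weighting coefficients (supplied by the caller).
    alpha: spectral exponent (1 for magnitude, 2 for power subtraction).
    floor: lower bound applied before the root, to avoid negative values.
    """
    difference = a * noisy_mag ** alpha - b * noise_alpha_mean
    return max(difference, floor) ** (1.0 / alpha)


# Constrained case (a equals b) with alpha = 2 is power spectral subtraction.
power_case = gss_estimate(5.0, 9.0, a=1.0, b=1.0, alpha=2.0)

# Same rule with alpha = 1 is magnitude subtraction.
magnitude_case = gss_estimate(5.0, 3.0, a=1.0, b=1.0, alpha=1.0)

# Unconstrained case: a and b are allowed to differ.
unconstrained_case = gss_estimate(5.0, 9.0, a=1.0, b=0.5, alpha=2.0)

# When the noise estimate exceeds the observation, the floor takes over.
floored_case = gss_estimate(1.0, 9.0, a=1.0, b=1.0, alpha=2.0)
```
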
0:08:40 | so how do we approach ours, well what we're going to do is to |
---|
0:08:45 | uh work at minimizing uh |
---|
0:08:48 | optimising the a and b terms subject to the weighted euclidean distortion |
---|
0:08:53 | in so doing |
---|
0:08:54 | we end up with these particular solutions for the a and b terms |
---|
0:08:58 | uh we can then take these estimates of |
---|
0:09:00 | uh |
---|
0:09:01 | a and b |
---|
0:09:02 | and substitute them back into the generalized spectral subtraction approach |
---|
0:09:06 | and form new parametric estimators that we feel |
---|
0:09:10 | offer some greater flexibility for enhancement |
---|
0:09:13 | uh just as a side note, the minimum mean square error uh optimized coefficients are really just a special |
---|
0:09:19 | case of this weighted euclidean distortion approach |
---|
0:09:21 | when you allow beta to equal zero it actually |
---|
0:09:24 | falls back to the |
---|
0:09:26 | previous solution |
---|
0:09:28 | so this is kind of a |
---|
0:09:31 | a busy plot but i will try to |
---|
0:09:33 | highlight |
---|
0:09:34 | the pieces |
---|
0:09:35 | here we're first looking at uh fixing alpha and allowing the beta term to decrease |
---|
0:09:40 | and on this side we're allowing alpha to increase, keeping beta fixed |
---|
0:09:44 | basically there are four quadrants here, one is |
---|
0:09:46 | the speech region |
---|
0:09:48 | you see this up here |
---|
0:09:49 | uh obviously this tends to be the case where you have high speech information and so you'd like |
---|
0:09:55 | to |
---|
0:09:56 | obviously try to suppress some of the noise but you don't really want to touch or damage the speech signal |
---|
0:10:00 | as much |
---|
0:10:01 | um |
---|
0:10:02 | the second region here, Q two, is actually the |
---|
0:10:06 | unlikely region, it's a spot you |
---|
0:10:08 | wouldn't expect to be operating in |
---|
0:10:10 | Q three is a noise only region and in this part here you really would like to actually have |
---|
0:10:14 | greater suppression and if you look at the beta based uh constrained and unconstrained solutions |
---|
0:10:19 | we actually have a greater suppression |
---|
0:10:21 | gain on this side so that's actually desirable to have |
---|
0:10:24 | and quadrant four, this is actually the case where you typically see um |
---|
0:10:28 | uh |
---|
0:10:29 | side harmonics that are popping up |
---|
0:10:31 | and this is actually the most dangerous area because |
---|
0:10:35 | in this part here you really would like to have suppression but |
---|
0:10:38 | you really would like to ensure that you don't have uh musical tone artifacts that might be |
---|
0:10:43 | popping up |
---|
0:10:43 | so |
---|
0:10:44 | this region is the spot that you'd like to kind of ensure |
---|
0:10:47 | you will have good performance |
---|
0:10:49 | so there are quite a few different enhancement methods that we're gonna be comparing here, i'll try to highlight |
---|
0:10:54 | those in not this slide but the next slide |
---|
0:10:56 | we will be going through a rover type solution and this is |
---|
0:11:00 | using what's called a mix max |
---|
0:11:02 | uh solution and this is actually coming from |
---|
0:11:06 | uh the transactions paper from uh nadas |
---|
0:11:09 | uh david |
---|
0:11:10 | uh |
---|
0:11:11 | david |
---|
0:11:12 | nahamoo and michael picheny back in eighty nine for speech recognition |
---|
0:11:16 | so our approach here, basically we assume we have degraded speech um |
---|
0:11:20 | we're then going to have three estimators, one |
---|
0:11:22 | that we believe is a good estimator for sonorants, one a good estimator for obstruents |
---|
0:11:28 | and another one which we believe would be good for silence |
---|
0:11:31 | uh if we have high energy we assume it's a sonorant and we're gonna kind of move forward with |
---|
0:11:35 | that |
---|
0:11:36 | if it's an obstruent, it may or may not be noise and so we apply |
---|
0:11:41 | a voice activity detector here |
---|
0:11:42 | if it is in fact uh |
---|
0:11:45 | noise then what we'd like to do is to kind of move down and update our noise reference characteristics |
---|
0:11:50 | here |
---|
0:11:50 | uh if it is in fact speech then we're gonna just use this uh in our model |
---|
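
The decision flow just described can be sketched as below. Everything specific here is an assumption for illustration: the energy threshold, the use of a boolean voice activity flag, the recursive-averaging update of the noise reference, and the function name `route_frame` are not taken from the talk.

```python
def route_frame(frame_power, vad_is_speech, noise_ref, energy_threshold,
                smoothing=0.9):
    """Return (label, updated_noise_ref) for one frame.

    High-energy frames are treated as sonorant. Low-energy frames are passed
    to a voice activity decision: speech is treated as obstruent, anything
    else is treated as noise and used to update the noise reference.
    """
    if frame_power >= energy_threshold:
        return "sonorant", noise_ref
    if vad_is_speech:
        return "obstruent", noise_ref
    # Noise-only frame: recursive averaging of the noise reference
    # (an assumed update rule, chosen only because it is simple).
    updated = smoothing * noise_ref + (1.0 - smoothing) * frame_power
    return "silence", updated


loud = route_frame(10.0, True, noise_ref=1.0, energy_threshold=5.0)
quiet_speech = route_frame(2.0, True, noise_ref=1.0, energy_threshold=5.0)
quiet_noise = route_frame(2.0, False, noise_ref=1.0, energy_threshold=5.0,
                          smoothing=0.5)
```

In the talk the final class decision is made by the gaussian mixture models on mfcc features; this sketch only covers the energy and voice activity gating that precedes it.
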
0:11:55 | so |
---|
0:11:56 | uh we pull off the mfcc coefficients |
---|
0:11:58 | these are used primarily simply for the |
---|
0:12:00 | gaussian mixture models here |
---|
0:12:02 | these are basically to try and classify whether we're sitting in obstruent, sonorant or silence blocks |
---|
0:12:08 | once we have this knowledge we feed this into the mix max type |
---|
0:12:11 | uh solution and what this does is it sets maximum likelihood |
---|
0:12:15 | uh weights that we can then use to weight |
---|
0:12:18 | uh the solutions from the sonorant, obstruent |
---|
0:12:20 | and |
---|
0:12:21 | noise based estimators that we see here |
---|
0:12:23 | hopefully coming up with |
---|
0:12:25 | an integrated |
---|
0:12:26 | solution that will sound better than |
---|
0:12:28 | any of the individual uh solutions |
---|
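
A minimal sketch of the soft weighting step, assuming the weights are class posteriors obtained by normalising per-class likelihoods and that the combined output is a weighted sum of the three estimator outputs. This is not the mix max model itself, whose details are not given in the transcript; the function names and numbers are hypothetical.

```python
import math


def class_weights(log_likelihoods):
    """Turn per-class log-likelihoods into weights that sum to one."""
    peak = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating for numerical stability.
    scaled = {c: math.exp(v - peak) for c, v in log_likelihoods.items()}
    total = sum(scaled.values())
    return {c: s / total for c, s in scaled.items()}


def soft_combine(estimates, weights):
    """Weighted sum of class-specific estimates, bin by bin."""
    length = len(next(iter(estimates.values())))
    return [
        sum(weights[c] * estimates[c][k] for c in estimates)
        for k in range(length)
    ]


# Toy frame: the sonorant model is far more likely than the other two.
frame_log_likelihoods = {"sonorant": -1.0, "obstruent": -6.0, "silence": -9.0}
frame_estimates = {
    "sonorant": [4.0, 2.0],
    "obstruent": [3.0, 1.0],
    "silence": [0.5, 0.2],
}
weights = class_weights(frame_log_likelihoods)
combined = soft_combine(frame_estimates, weights)
```

Because the weights are soft, a frame that is ambiguous between two classes receives a blend of the two estimators rather than a hard switch.
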
0:12:31 | so the categories, as i said there are three broad phone class types here, sonorants obstruents and |
---|
0:12:37 | silence |
---|
0:12:38 | we group what we believe to be the |
---|
0:12:39 | the fricatives, affricates and stops |
---|
0:12:41 | as the obstruents |
---|
0:12:43 | um |
---|
0:12:44 | again we're doing this in kind of an unsupervised manner uh |
---|
0:12:47 | over time so what we believe to be the stops may actually |
---|
0:12:52 | in fact |
---|
0:12:53 | move into the silence |
---|
0:12:54 | um again the uh parametric beta estimators that we're using |
---|
0:12:59 | are gonna be tuned to each of the |
---|
0:13:01 | broad phone classes |
---|
0:13:03 | uh for sonorants and obstruents |
---|
0:13:06 | now the outputs from these estimators are converted to mfccs and then the decision weights here are kind of used |
---|
0:13:12 | uh to make a soft combined uh weight for each of the composite utterances |
---|
0:13:17 | uh similar to the rover solution that we had |
---|
0:13:19 | back in interspeech oh seven |
---|
0:13:21 | finally the noisy speech can be modelled using this uh mix max type |
---|
0:13:25 | uh model |
---|
0:13:27 | uh |
---|
0:13:27 | this also incorporates |
---|
0:13:28 | classification for the silence and the pauses |
---|
0:13:32 | um in this mix max |
---|
0:13:33 | model uh the gmms uh we need to have two, one for the |
---|
0:13:38 | sonorants one for the obstruents |
---|
0:13:40 | uh so we have a set number of mixture |
---|
0:13:42 | components that are used to estimate these |
---|
0:13:45 | for the silence we're using right now just one mixture, of course if you have multiple noise types you |
---|
0:13:49 | can |
---|
0:13:50 | use more than |
---|
0:13:50 | one mixture to cover that |
---|
0:13:52 | um |
---|
0:13:53 | in the mix max um model uh as i pointed out nadas uh david nahamoo and michael |
---|
0:13:58 | picheny |
---|
0:13:59 | had this uh idea for uh modeling noise characteristics |
---|
0:14:04 | uh for speech recognition in eighty nine, we're using it here so as to track noise structure |
---|
0:14:10 | next we look at the enhancement, the experimental setup here, uh we use results from thirty |
---|
0:14:16 | two individual sentences from timit |
---|
0:14:19 | the metrics we used were |
---|
0:14:21 | uh segmental signal to noise ratio and itakura-saito distortion |
---|
0:14:25 | the results um i'm gonna show here are just the itakura-saito, oh sorry, the segmental snr |
---|
0:14:30 | the paper has all the results from the |
---|
0:14:32 | itakura-saito |
---|
0:14:33 | as well |
---|
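
For reference, segmental signal to noise ratio is usually computed frame by frame and then averaged. The sketch below follows that common definition; the non-overlapping frames, the clamping range of -10 to 35 dB, and the skipping of all-zero frames are conventional choices assumed here, not details given in the talk.

```python
import math


def segmental_snr(clean, processed, frame_len, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean and a processed signal."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        if signal_energy == 0.0:
            continue  # skip frames with no reference energy
        if error_energy == 0.0:
            snr = hi  # perfect reconstruction: use the upper clamp
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo), hi))
    return sum(values) / len(values) if values else float("nan")


clean = [1.0, 1.0, 1.0, 1.0]
noisy = [0.5, 0.5, 0.5, 0.5]
enhanced = [0.9, 0.9, 0.9, 0.9]

# The improvement is the difference between enhanced and noisy scores.
improvement = (segmental_snr(clean, enhanced, frame_len=2)
               - segmental_snr(clean, noisy, frame_len=2))
```

A positive `improvement` corresponds to the "increase in segmental signal to noise ratio" reported in the results.
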
0:14:34 | uh the gmms trained we used a three in tokens with sixteen mixtures |
---|
0:14:38 | and for the silence model |
---|
0:14:39 | uh just a single mixture |
---|
0:14:41 | and for the noise types we have two types uh a flat communications channel noise that uh we had from |
---|
0:14:47 | an A T and T voice channel |
---|
0:14:49 | um |
---|
0:14:50 | and a large crowd noise so this is multiple people speaking but not babble it's kind of a broader |
---|
0:14:55 | noise type |
---|
0:14:57 | and uh there are quite a few different enhancement strategies, the standard uh ephraim and malah |
---|
0:15:03 | uh mmse |
---|
0:15:04 | uh the joint map scheme from |
---|
0:15:07 | patrick wolfe and simon godsill |
---|
0:15:09 | uh from their paper in uh |
---|
0:15:12 | two thousand |
---|
0:15:13 | nine i believe |
---|
0:15:15 | um |
---|
0:15:15 | sim's paper on the generalized spectral subtraction, the unconstrained approach where the a and b terms don't |
---|
0:15:21 | have to be equal to each other and the constrained scheme where they do have to be equal to each |
---|
0:15:25 | other |
---|
0:15:26 | the parametric approaches |
---|
0:15:27 | for |
---|
0:15:28 | uh the weighted euclidean uh distortion based approach |
---|
0:15:32 | uh for our icassp paper last year we had a chi-square prior for the |
---|
0:15:37 | uh for the amplitudes in this scheme and that was reported last year |
---|
0:15:41 | and we also had |
---|
0:15:43 | a chi-square |
---|
0:15:44 | prior for the um |
---|
0:15:45 | the J map solutions, so this we have |
---|
0:15:47 | as of last year |
---|
0:15:49 | uh for what we're doing this year with the rover approach, so this has the rover based uh |
---|
0:15:54 | incorporation of the weighted euclidean distortion |
---|
0:15:57 | chi-square |
---|
0:15:58 | priors |
---|
0:15:59 | uh and same for the J map type solution |
---|
0:16:02 | and then we take the beta unconstrained and beta constrained approaches here and also feed |
---|
0:16:06 | them into a rover solution so |
---|
0:16:08 | in a sense we have |
---|
0:16:09 | a set of different enhancement methods uh |
---|
0:16:12 | we really wanna |
---|
0:16:13 | benchmark a baseline against the parametric approaches |
---|
0:16:18 | so uh this uh table shows the uh segmental signal to noise ratio increase, so |
---|
0:16:25 | any positive value here shows |
---|
0:16:28 | an |
---|
0:16:28 | improvement in segmental signal to noise ratio |
---|
0:16:31 | so uh the G U term here, this is basically the generalized spectral subtraction approach which is the baseline scheme |
---|
0:16:37 | from sim's |
---|
0:16:38 | paper |
---|
0:16:40 | and you can see the uh |
---|
0:16:41 | improvement here on the sonorants is quite good |
---|
0:16:44 | and the improvement on obstruents and silence is not as good and the overall is kind of |
---|
0:16:49 | right |
---|
0:16:50 | but |
---|
0:16:50 | like to see |
---|
0:16:51 | now each of these three are actually optimized for sonorants obstruents |
---|
0:16:56 | and uh the noise types so |
---|
0:16:58 | what we do is we search across all the possible configurations |
---|
0:17:02 | uh for the terms here, we find the |
---|
0:17:05 | best configuration |
---|
0:17:06 | uh for the sonorants |
---|
0:17:08 | and that's the best improvement that we |
---|
0:17:09 | get |
---|
0:17:10 | um |
---|
0:17:11 | at the same time for the obstruents you see |
---|
0:17:14 | uh this simple is actually quite a |
---|
0:17:16 | is there |
---|
0:17:17 | and silence |
---|
0:17:17 | is not so bad uh |
---|
0:17:20 | like it to be |
---|
0:17:21 | um but if you look across the diagonal here you see for the sonorants, the obstruents |
---|
0:17:25 | and the noise, when we optimize this we actually get |
---|
0:17:29 | a nice improvement across here better than what we would have gotten |
---|
0:17:32 | with uh sim's approach uh if we kept this constant |
---|
0:17:36 | across |
---|
0:17:37 | phones |
---|
0:17:38 | now the goal then is to try and figure out how to kind of take the best from each of |
---|
0:17:41 | these |
---|
0:17:41 | and put them together |
---|
0:17:43 | so this approach here is the rover based uh solution and this does not use the exact |
---|
0:17:48 | optimum solutions here it actually goes and finds what it thinks is the best approach |
---|
0:17:52 | based on that phoneme classifier so |
---|
0:17:55 | this is kind of what you would expect to see performance wise |
---|
0:17:58 | if it's free running not knowing what the best performance is |
---|
0:18:01 | and you can see |
---|
0:18:02 | the improvement in segmental signal to noise ratio is quite nice |
---|
0:18:06 | uh both for sonorants obstruents and silence |
---|
0:18:08 | and not too bad in the overall |
---|
0:18:12 | track of time here so i'm just |
---|
0:18:14 | going quickly here |
---|
0:18:15 | these show segmental signal to noise ratio increases for flat communications channel noise |
---|
0:18:19 | across all the different uh |
---|
0:18:21 | uh noise types |
---|
0:18:23 | uh or noise levels i'm sorry |
---|
0:18:25 | and the main conclusion to kind of see from here is that |
---|
0:18:27 | these uh approaches down here are the non-rover approaches and these are the rover solutions |
---|
0:18:33 | and combining them in a nice automatic way uh |
---|
0:18:36 | uh allows you to get better performance for the flat communications channel noise |
---|
0:18:41 | and likewise for the large crowd noise you can see the performance here is quite nice |
---|
0:18:46 | um |
---|
0:18:47 | for the obstruents, for the sonorants here |
---|
0:18:49 | as well, and when you combine them they're actually much much better than |
---|
0:18:53 | the individual |
---|
0:18:55 | uh so |
---|
0:18:56 | if you're kind of looking, out of these maybe sixteen or so different |
---|
0:19:00 | enhancement strategies |
---|
0:19:01 | what are the best ones, these indicate the first and second best |
---|
0:19:05 | enhancement strategy |
---|
0:19:06 | you can pretty much see across all of our evaluations here that the rover based solutions work |
---|
0:19:11 | uh quite well |
---|
0:19:13 | uh the beta or uh the parametric beta scheme |
---|
0:19:17 | uh for the generalized spectral subtraction was also a |
---|
0:19:20 | uh good candidate there and the J map uh version was also |
---|
0:19:25 | a successful one |
---|
0:19:26 | and |
---|
0:19:28 | so in conclusion we've considered uh the parametric uh |
---|
0:19:32 | generalized uh |
---|
0:19:34 | spectral subtraction approach here |
---|
0:19:36 | uh |
---|
0:19:37 | these parametric estimators can be tuned for the different phone classes |
---|
0:19:41 | uh any one may not perform as well across all the phone classes |
---|
0:19:45 | uh incorporating a rover paradigm allows us to pick off some of the better |
---|
0:19:49 | segments |
---|
0:19:51 | and put them together for an overall enhancement uh approach |
---|
0:19:54 | and uh we looked at these estimators across uh individual um |
---|
0:19:58 | uh |
---|
0:19:59 | uh |
---|
0:20:01 | we |
---|
0:20:01 | compared them against the individual estimators uh |
---|
0:20:05 | without having a rover solution and found that their combinations improve performance for flat communications channel noise and large crowd noise |
---|
0:20:12 | over different signal |
---|
0:20:13 | to noise ratios |
---|
0:20:16 | john film thank you very much |
---|
0:20:22 | so uh any questions |
---|
0:20:27 | are the alphas and the betas constant in each group |
---|
0:20:30 | over all the frequencies |
---|
0:20:32 | uh they are constant, what happens is that uh they can be different |
---|
0:20:38 | for each of the classes, sonorants uh obstruents |
---|
0:20:42 | uh silence they can be different for those |
---|
0:20:45 | so when we look at our prior um |
---|
0:20:49 | rover solution, the prior rover solution, we actually had many many more classes, here we're only kind of looking |
---|
0:20:55 | at three |
---|
0:20:56 | so we allow kind of some flexibility, you can generalise it to more classes than |
---|
0:21:00 | three |
---|
0:21:01 | and how robust is it with respect to |
---|
0:21:04 | um |
---|
0:21:05 | misclassification |
---|
0:21:06 | yeah that's a good question so |
---|
0:21:08 | um we are running a test where we're intentionally putting in |
---|
0:21:12 | five and ten percent errors |
---|
0:21:14 | you're less likely to have an error between a sonorant |
---|
0:21:17 | and an obstruent but you're more likely to have |
---|
0:21:19 | an error between an obstruent |
---|
0:21:21 | and |
---|
0:21:21 | silence |
---|
0:21:22 | so that was the issue when i pointed out the stops, the stops sometimes, if they're |
---|
0:21:27 | the leading or trailing stops |
---|
0:21:28 | they tend to get pulled and go into the silence side or the other side so you have too |
---|
0:21:32 | much suppression |
---|
0:21:35 | further comments |
---|
0:21:37 | so thank you once more |
---|