0:00:17 | My name is Hynek Bořil, and I will be talking about |
---|
0:00:21 | oh |
---|
0:00:22 | Lombard effect in the |
---|
0:00:24 | UT-Scope database, which |
---|
0:00:25 | captures large-vocabulary content, |
---|
0:00:28 | and |
---|
0:00:29 | I will be talking about |
---|
0:00:30 | how the Lombard effect |
---|
0:00:33 | affects speech parameters |
---|
0:00:35 | in continuous speech and how it affects ASR. |
---|
0:00:39 | The presentation will have three parts. |
---|
0:00:42 | In the first part I will introduce the UT-Scope database |
---|
0:00:45 | and present the results of the analysis of the speech parameters in |
---|
0:00:49 | the large-vocabulary content. |
---|
0:00:51 | In the second part I will propose a |
---|
0:00:54 | modified version of RASTA filtering, which is very popular in |
---|
0:00:58 | ASR, and I will also |
---|
0:01:01 | introduce a kind of |
---|
0:01:02 | combination of this modified RASTA filter with |
---|
0:01:05 | a normalization we proposed at |
---|
0:01:08 | ICASSP 2009; |
---|
0:01:11 | it's called QCN. And finally |
---|
0:01:13 | I will present an evaluation of these |
---|
0:01:17 | in ASR, side by side with other cepstral normalizations. |
---|
0:01:22 | So first, what is Lombard effect? |
---|
0:01:24 | It refers to the phenomenon when people |
---|
0:01:27 | speak in noisy conditions and they try to maintain |
---|
0:01:31 | intelligible communication, |
---|
0:01:33 | so they increase their vocal effort and they do a lot of other things |
---|
0:01:38 | so that people understand them. |
---|
0:01:40 | uh |
---|
0:01:42 | Lombard effect is reflected in a number of parameters, like vocal effort, pitch, |
---|
0:01:47 | formant frequencies shift in their locations, |
---|
0:01:50 | spectral slope changes, and there are other variations we can observe. |
---|
0:01:55 | This affects ASR a lot because the |
---|
0:01:58 | acoustic models we are usually using in ASR are typically trained on neutral speech, |
---|
0:02:03 | so all of these variations in speech parameters |
---|
0:02:07 | cause some kind of mismatch between the acoustic models and the incoming features. |
---|
0:02:12 | The previous studies |
---|
0:02:14 | that looked at |
---|
0:02:15 | Lombard effect in the context of ASR usually focused on |
---|
0:02:19 | small-vocabulary tasks, |
---|
0:02:21 | so |
---|
0:02:22 | this is kind of the contribution of this study: |
---|
0:02:24 | we look at how the |
---|
0:02:26 | Lombard effect affects large-vocabulary ASR. |
---|
0:02:29 | It's kind of a medium-vocabulary task because it's |
---|
0:02:32 | TIMIT-like speech, so I mean |
---|
0:02:35 | it's not that large a vocabulary, |
---|
0:02:37 | but |
---|
0:02:39 | So first I would like to |
---|
0:02:41 | introduce the UT-Scope database. |
---|
0:02:44 | The database |
---|
0:02:45 | covers |
---|
0:02:47 | speech under cognitive and physical stress, emotion, and Lombard effect. |
---|
0:02:51 | We will be just looking at the Lombard portion of the data. |
---|
0:02:55 | It contains fifty-eight subjects; |
---|
0:02:57 | of those, thirty-one are native speakers of US English, |
---|
0:03:01 | twenty-five females and six males, |
---|
0:03:04 | and we are using just the native speakers in this study, so that we eliminate the |
---|
0:03:09 | effect of |
---|
0:03:10 | foreign accent. |
---|
0:03:13 | The database contains, |
---|
0:03:15 | for each subject, |
---|
0:03:16 | neutral speech |
---|
0:03:18 | and speech produced in simulated noisy conditions. |
---|
0:03:23 | In the latter case the subjects |
---|
0:03:25 | are exposed to noise produced through |
---|
0:03:28 | headphones, |
---|
0:03:29 | so that you can still collect |
---|
0:03:31 | relatively clean speech with a high SNR in the |
---|
0:03:35 | microphone channel. |
---|
0:03:38 | We use three types of noise to induce the Lombard effect. |
---|
0:03:41 | The first |
---|
0:03:42 | was car noise |
---|
0:03:43 | that was recorded |
---|
0:03:45 | while driving on a highway at sixty-five |
---|
0:03:48 | miles per hour, |
---|
0:03:49 | and we have large crowd noise and pink noise. |
---|
0:03:52 | We produced the noises to the subjects at three levels: |
---|
0:03:56 | in the case of car and large crowd noise it was seventy, eighty and ninety dB |
---|
0:04:00 | SPL; |
---|
0:04:01 | in the case of pink noise it was lower, sixty-five to eighty-five, |
---|
0:04:06 | because |
---|
0:04:07 | the subjects kind of complained that the |
---|
0:04:09 | noise was disturbing them at those original levels. |
---|
0:04:13 | The speech was recorded in a sound booth |
---|
0:04:16 | to |
---|
0:04:17 | also |
---|
0:04:18 | ensure a high SNR, |
---|
0:04:20 | with three microphone channels: throat microphone, close-talk and far-field microphone. |
---|
0:04:24 | In this study we are looking at the close-talk microphone because |
---|
0:04:27 | it provides |
---|
0:04:28 | a high SNR |
---|
0:04:29 | and, |
---|
0:04:30 | unlike the throat microphone, |
---|
0:04:33 | it's more broadband. |
---|
0:04:35 | So the content of |
---|
0:04:36 | the sessions |
---|
0:04:37 | for each speaker: |
---|
0:04:39 | in the neutral conditions, where they didn't hear any noise, |
---|
0:04:43 | they would produce a hundred |
---|
0:04:45 | TIMIT-like sentences, which they read; then |
---|
0:04:47 | in the noisy conditions they would read, for each scenario, some sentences |
---|
0:04:52 | at |
---|
0:04:54 | three levels of noise, |
---|
0:04:55 | and also read digit strings, |
---|
0:04:58 | and there was also spontaneous speech, where they would be |
---|
0:05:01 | describing the content of a picture. |
---|
0:05:04 | For the study we are using just the TIMIT-like sentences, for several reasons. |
---|
0:05:08 | We don't use the digit strings because |
---|
0:05:10 | we want to do large-vocabulary |
---|
0:05:12 | recognition and |
---|
0:05:13 | maybe, in the beginning, to use language modeling, |
---|
0:05:16 | so the digit strings |
---|
0:05:18 | would not fit that. |
---|
0:05:19 | And we don't use the spontaneous speech because |
---|
0:05:22 | it was kind of difficult to |
---|
0:05:24 | make the subjects |
---|
0:05:25 | talk naturally, so |
---|
0:05:27 | the speech would be kind of abrupt, and they would be laughing, or there would be long pauses. |
---|
0:05:31 | It would be kind of hard to deal with |
---|
0:05:33 | this type of speech at this stage of the research, so |
---|
0:05:36 | we are just not using it for now. |
---|
0:05:39 | So in the speech production analysis part |
---|
0:05:44 | we will be analyzing SNR. |
---|
0:05:47 | second whoosh |
---|
0:05:48 | no sure |
---|
0:05:49 | We do this because it kind of relates to the vocal intensity: |
---|
0:05:53 | since the |
---|
0:05:54 | surrounding background noise |
---|
0:05:56 | can be considered kind of |
---|
0:05:58 | constant in the sound booth, |
---|
0:06:00 | the changes in the vocal intensity |
---|
0:06:03 | are directly reflected in the changes in the SNR. |
---|
0:06:06 | So we really don't need to know the actual level or |
---|
0:06:09 | how the amplitude of |
---|
0:06:11 | the signal actually relates to |
---|
0:06:14 | the intensity, because we kind of adjusted the microphone gain |
---|
0:06:18 | during the recording, so that would be a problem. |
---|
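The gain argument above can be checked numerically. This is a minimal sketch, assuming SNR is taken as the ratio of mean speech power to mean noise power; the function name `snr_db` and all sample values are hypothetical and are not from the talk.

```python
import math

def snr_db(speech, noise):
    """SNR in dB from the mean power of a speech segment and a noise segment."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)

# Hypothetical samples, for illustration only.
speech = [0.5, -0.4, 0.3, -0.2]
noise = [0.01, -0.02, 0.015, -0.01]

gain = 3.0  # an arbitrary microphone gain applied to the whole recording
snr_before = snr_db(speech, noise)
snr_after = snr_db([gain * s for s in speech], [gain * n for n in noise])

# Speaking louder (speech amplitude doubled, noise floor unchanged).
snr_louder = snr_db([2.0 * s for s in speech], noise)
print(round(snr_before, 2), round(snr_after, 2), round(snr_louder, 2))
```

A gain applied to the whole channel cancels in the ratio, while a change in the speech level alone shifts the SNR by the same number of dB, which is why the SNR can stand in for vocal intensity when the background noise is constant.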
0:06:21 | So we also |
---|
0:06:22 | analyze |
---|
0:06:23 | F0, vowel formant frequencies and duration, |
---|
0:06:27 | and then we'll look at cepstral distributions, which is a little bit far from |
---|
0:06:32 | direct |
---|
0:06:33 | or primary speech production parameters, but |
---|
0:06:35 | it's important for the ASR later. |
---|
0:06:38 | We used WaveSurfer and some other tools to extract these parameters. |
---|
0:06:43 | so |
---|
0:06:43 | The first figure here is SNR. |
---|
0:06:47 | The continuous line is for |
---|
0:06:50 | neutral speech, where there was no noise produced, |
---|
0:06:53 | so you can see in this case the mean SNR is |
---|
0:06:57 | the lowest compared to all other conditions. |
---|
0:07:00 | uh |
---|
0:07:00 | This figure is just |
---|
0:07:02 | showing |
---|
0:07:03 | the case for highway noise, so we have |
---|
0:07:06 | it produced at seventy, eighty and ninety dB. |
---|
0:07:08 | We can see that with |
---|
0:07:10 | increasing level of noise the SNR is increasing; that basically means that |
---|
0:07:15 | vocal intensity was increased |
---|
0:07:16 | in the subjects. |
---|
0:07:18 | It's kind of |
---|
0:07:19 | intuitive, and that was reported by many previous studies on Lombard effect. |
---|
0:07:23 | So if we look at the |
---|
0:07:24 | so-called Lombard function, it should basically be |
---|
0:07:28 | the relation between the noise level and the |
---|
0:07:31 | speech intensity; |
---|
0:07:33 | its slope |
---|
0:07:33 | would be, |
---|
0:07:35 | well, in |
---|
0:07:37 | dB per dB. |
---|
0:07:38 | So in our case, |
---|
0:07:39 | if we fit regression lines, |
---|
0:07:41 | we would be observing slopes |
---|
0:07:43 | ranging from zero to zero point three, |
---|
0:07:46 | zero or |
---|
0:07:47 | close to it |
---|
0:07:47 | mainly for pink noise, |
---|
0:07:49 | where the subjects reacted kind of randomly, |
---|
0:07:52 | and for car and crowd noise |
---|
0:07:54 | it was much more consistent, |
---|
0:07:55 | and the zero point three dB per dB slope is kind of typical, |
---|
0:07:59 | as seen |
---|
0:08:00 | in previous studies. |
---|
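The slope of the Lombard function described here can be estimated with an ordinary least-squares line fit of speech level against noise level. A minimal sketch follows; the function name `lombard_slope` and the level values are hypothetical illustrations, not measurements from the talk.

```python
from statistics import mean

def lombard_slope(noise_db, speech_db):
    """Least-squares slope (dB per dB) of speech level versus noise level."""
    mx, my = mean(noise_db), mean(speech_db)
    sxy = sum((x - mx) * (y - my) for x, y in zip(noise_db, speech_db))
    sxx = sum((x - mx) ** 2 for x in noise_db)
    return sxy / sxx

# Hypothetical mean levels, for illustration only (not measured values).
noise_db = [70.0, 80.0, 90.0]
speech_db = [30.0, 33.0, 36.0]
slope = lombard_slope(noise_db, speech_db)
print(slope)
```

With these made-up numbers the speech level rises by 3 dB for every 10 dB of added noise, so the slope comes out at about 0.3 dB per dB.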
0:08:02 | The next thing is fundamental frequency. |
---|
0:08:04 | I'm not showing the distributions |
---|
0:08:07 | this time; |
---|
0:08:08 | I'm rather focusing on this: since we have |
---|
0:08:11 | three levels of noise, that gives us a kind of chance to |
---|
0:08:14 | look at the correlation between the |
---|
0:08:18 | uh |
---|
0:08:18 | level of the |
---|
0:08:19 | noise that |
---|
0:08:20 | the subjects are listening to |
---|
0:08:22 | and the changes in the mean F0. So you can see |
---|
0:08:26 | in the table there are |
---|
0:08:27 | two |
---|
0:08:28 | rows, one for females and one for males. |
---|
0:08:31 | The first is the slope of the regression line, |
---|
0:08:34 | then the correlation coefficient and the mean square |
---|
0:08:37 | error. |
---|
0:08:38 | So you can see, especially for highway and crowd noise, |
---|
0:08:42 | the correlation coefficient is really high, it's very close to one. |
---|
0:08:46 | Well, it's partly because we use just the mean values of all the recordings in that type of, |
---|
0:08:51 | in that level of noise, |
---|
0:08:54 | but you can also see that the mean square errors are very low, |
---|
0:08:57 | so there is a very strong linear relationship between the presentation level |
---|
0:09:03 | and the mean F0, actually, |
---|
0:09:05 | with F0 in Hertz. |
---|
0:09:06 | You could see in some previous studies that there would be |
---|
0:09:09 | a linear relationship when the |
---|
0:09:12 | F0 would be in log scale, it would be in semitones, but here actually for us |
---|
0:09:16 | it's |
---|
0:09:18 | a linear scale. |
---|
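The three quantities named for the table (regression slope, correlation coefficient, mean square error) can be computed from per-level mean F0 values as sketched below. The helper names and the F0 numbers are hypothetical; the semitone conversion is the standard 12·log2 ratio, included only to show what a log-scale variant of the same fit would use.

```python
import math
from statistics import mean

def regression_summary(x, y):
    """Return (slope, intercept, Pearson r, mean squared error) of a line fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    mse = mean([(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)])
    return slope, intercept, r, mse

def hz_to_semitones(f_hz, ref_hz):
    """Express a frequency in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)

# Hypothetical per-level mean F0 values in Hz, for illustration only.
levels_db = [70.0, 80.0, 90.0]
mean_f0_hz = [200.0, 210.0, 225.0]
slope, intercept, r, mse = regression_summary(levels_db, mean_f0_hz)
print(slope, round(r, 4), round(mse, 3))
```

Because only three mean values enter the fit, a correlation coefficient close to one is easy to obtain, which is the caveat raised in the talk; the mean square error gives a second check on how well the line fits.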
0:09:20 | When we are looking at the |
---|
0:09:24 | formants, we are looking at the F1, |
---|
0:09:27 | F2 space of |
---|
0:09:28 | vowels. |
---|
0:09:30 | The continuous line refers to the neutral speech |
---|
0:09:33 | and the other ones are for highway noise at seventy to ninety dB. |
---|
0:09:38 | We estimated the phone boundaries using forced alignment, |
---|
0:09:41 | so it's not perfectly accurate, |
---|
0:09:44 | but |
---|
0:09:45 | the error should be kind of consistent across the recordings that are processed, so |
---|
0:09:49 | it gives some kind of idea of what is happening there. |
---|
0:09:52 | The |
---|
0:09:54 | error bars are actually the standard deviation intervals. |
---|
0:09:58 | So you can see there's some kind of |
---|
0:10:00 | very consistent shift in the |
---|
0:10:02 | vowel space here |
---|
0:10:04 | with the level of noise. |
---|
0:10:06 | Then we're looking at the vowel duration. |
---|
0:10:09 | Again we used forced alignment |
---|
0:10:10 | to estimate the boundaries of the vowels. |
---|
0:10:13 | Some previous studies reported that there would be some time contraction or |
---|
0:10:18 | expansion for different phone classes. |
---|
0:10:22 | Something similar you see here: for some vowels there would be some slight reduction |
---|
0:10:26 | with the increasing level of noise, but for most of them the |
---|
0:10:30 | duration seemed to be prolonged. |
---|
0:10:31 | Unfortunately, given the amount of data here, |
---|
0:10:34 | the |
---|
0:10:35 | confidence intervals are quite wide, so |
---|
0:10:38 | although we can see kind of consistent trends here, |
---|
0:10:41 | the changes are not statistically significant, so we can't make |
---|
0:10:45 | any definite conclusions out of this. |
---|
0:10:48 | And finally we are looking at |
---|
0:10:50 | cepstral distributions, |
---|
0:10:52 | uh |
---|
0:10:53 | which give us kind of an idea of how the |
---|
0:10:55 | acoustic models |
---|
0:10:57 | will be affected, what kind of mismatch you can expect there. |
---|
0:11:00 | So here I'm also putting the |
---|
0:11:02 | solid line, which is for the TIMIT training set that we were |
---|
0:11:07 | using there for training the |
---|
0:11:09 | models; |
---|
0:11:10 | the other ones are for the UT-Scope conditions, |
---|
0:11:13 | and you can see there's a |
---|
0:11:14 | mismatch. If you look at C0, which kind of represents the energy, |
---|
0:11:19 | and C1, which reflects kind of the spectral tilt, |
---|
0:11:22 | there are big differences |
---|
0:11:24 | in the |
---|
0:11:25 | distributions, |
---|
0:11:26 | so |
---|
0:11:27 | we can expect this will affect ASR in a negative way. |
---|
0:11:31 | oh so |
---|
0:11:32 | I would like to move on and describe the |
---|
0:11:36 | modified RASTA filter we are proposing. |
---|
0:11:38 | So RASTA is a very popular |
---|
0:11:41 | oh |
---|
0:11:42 | normalization method |
---|
0:11:44 | which |
---|
0:11:45 | is used either on log |
---|
0:11:46 | energies, or it can be used in the cepstral domain, which is basically the same thing. |
---|
0:11:51 | It's a band-pass filter, and |
---|
0:11:53 | it basically suppresses |
---|
0:11:55 | very slow, |
---|
0:11:57 | slowly varying signal components and also very fast varying |
---|
0:12:02 | signal components, |
---|
0:12:03 | which are believed to be kind of unrelated to |
---|
0:12:06 | speech, |
---|
0:12:07 | and it has been shown to |
---|
0:12:09 | increase robustness to noise, |
---|
0:12:11 | channel mismatch, |
---|
0:12:12 | and so in a a a a a variation |
---|
0:12:16 | but i sign speaker I |
---|
0:12:18 | uh |
---|
0:12:19 | But one of the slight drawbacks of the original RASTA filter is |
---|
0:12:23 | that |
---|
0:12:24 | it's a are very zero |
---|
0:12:25 | a a kind of a or there because we we want to have |
---|
0:12:29 | and spells |
---|
0:12:29 | It also introduces some kind of transient distortion |
---|
0:12:34 | in the time domain, because if there are some |
---|
0:12:37 | abrupt changes in the |
---|
0:12:39 | original signal, |
---|
0:12:41 | it takes some time |
---|
0:12:43 | for the filter to |
---|
0:12:44 | settle down. |
---|
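The settling behaviour can be illustrated with the RASTA filter as it is commonly given in the literature, H(z) = 0.1 (2 + z⁻¹ − z⁻³ − 2z⁻⁴) / (1 − 0.98 z⁻¹). The talk does not state the coefficients, so this textbook form, the pole value (some implementations use 0.94) and the step-shaped test track are all assumptions.

```python
def rasta_filter(x, pole=0.98):
    """Apply the classic RASTA band-pass filter to one feature trajectory.

    Assumed textbook difference equation (not taken from the talk):
    y[n] = pole*y[n-1] + 0.2*x[n] + 0.1*x[n-1] - 0.1*x[n-3] - 0.2*x[n-4]
    """
    num = [0.2, 0.1, 0.0, -0.1, -0.2]
    y = []
    for n in range(len(x)):
        # FIR part: samples before the start of the track are treated as zero.
        acc = sum(b * x[n - k] for k, b in enumerate(num) if n - k >= 0)
        # Recursive part: one pole close to the unit circle.
        if n > 0:
            acc += pole * y[n - 1]
        y.append(acc)
    return y

# Hypothetical C0-like track with an abrupt change: 50 frames at 0, then 1.
step = [0.0] * 50 + [1.0] * 200
out = rasta_filter(step)
peak = max(out)
# Number of frames after the step until the response falls below 5% of its peak.
frames_to_settle = next(i for i, v in enumerate(out[50:]) if abs(v) < 0.05 * peak)
print(round(peak, 4), frames_to_settle)
```

With the pole at 0.98 the response to a single step decays by only 2% per frame, so the filter output stays disturbed for well over a hundred frames after the change.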
0:12:45 | So we tried to |
---|
0:12:47 | adjust that; we tried to improve it a little bit. |
---|
0:12:51 | so |
---|
0:12:52 | basically you can |
---|
0:12:54 | replace the RASTA filter |
---|
0:12:56 | by two separate blocks. |
---|
0:12:57 | The first one would be |
---|
0:12:59 | cepstral mean normalization, let's say, |
---|
0:13:01 | and that will help us get rid of the DC component and |
---|
0:13:04 | much of the slowly varying components; it of course depends on the length of the window |
---|
0:13:09 | of the |
---|
0:13:10 | segment or of the window, but the |
---|
0:13:13 | DC component will definitely be gone, |
---|
0:13:15 | and maybe that's just fine. |
---|
0:13:17 | And then |
---|
0:13:18 | the |
---|
0:13:19 | second one could be a low-pass filter |
---|
0:13:22 | that will be suppressing the |
---|
0:13:23 | fast |
---|
0:13:24 | changes in the signal. |
---|
0:13:26 | This way the low-pass filter can be of a very low order |
---|
0:13:29 | and can be kind of nice and smooth, as I will show on |
---|
0:13:32 | the next slide. |
---|
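The two-block scheme described above can be sketched as a mean subtraction followed by a short low-pass filter. The talk does not give the low-pass coefficients, so the 5-tap binomial FIR used here is a hypothetical stand-in, and the function names are illustrative only.

```python
def mean_normalize(x):
    """Block 1: subtract the utterance-level mean (removes the DC component)."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def low_pass(x, taps=(1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)):
    """Block 2: smooth with a short symmetric FIR filter.

    The binomial taps are an assumed placeholder for the low-pass in the talk.
    Track edges are padded by repeating the first and last value.
    """
    half = len(taps) // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(t * padded[n + k] for k, t in enumerate(taps))
            for n in range(len(x))]

def mean_norm_low_pass(x):
    """Mean normalization followed by low-pass filtering."""
    return low_pass(mean_normalize(x))

# Hypothetical C0-like track with an abrupt change: 50 frames at 0, then 1.
step = [0.0] * 50 + [1.0] * 200
out = mean_norm_low_pass(step)
print([round(v, 4) for v in out[46:54]])
```

In this sketch the abrupt change is only smoothed over the few frames covered by the filter taps, and the output is flat again immediately afterwards, in contrast to the long decay produced by a recursive band-pass filter with a pole near one.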
0:13:34 | ah |
---|
0:13:37 | Also, this kind of scheme gives us |
---|
0:13:40 | the chance to replace the |
---|
0:13:42 | DC subtraction |
---|
0:13:44 | uh |
---|
0:13:45 | by some more sophisticated |
---|
0:13:47 | distribution normalization that does |
---|
0:13:49 | not necessarily |
---|
0:13:51 | normalize distributions to their means, like |
---|
0:13:53 | um |
---|
0:13:54 | cepstral mean normalization. |
---|
0:13:56 | so |
---|
0:13:56 | In this figure we can see the original RASTA band-pass filter |
---|
0:14:00 | as a solid line, |
---|
0:14:02 | and also the newly proposed filter as the dashed line, |
---|
0:14:04 | which is just low-pass. |
---|
0:14:06 | So you see it kind of eliminates the residual |
---|
0:14:10 | cycle |
---|
0:14:11 | at the higher frequencies that we can see in the original RASTA. |
---|
0:14:15 | And here's an example. |
---|
0:14:17 | The |
---|
0:14:19 | first figure, the top one, |
---|
0:14:22 | would be |
---|
0:14:23 | the original C0 from an utterance, |
---|
0:14:25 | some kind of example, |
---|
0:14:26 | and the |
---|
0:14:27 | middle one would be |
---|
0:14:29 | the RASTA filter applied to the C0 track. |
---|
0:14:33 | You see there are some kind of very strong transients |
---|
0:14:36 | at some stages, |
---|
0:14:37 | emphasized by the dashed line, and |
---|
0:14:39 | the bottom one is when we combine |
---|
0:14:42 | some |
---|
0:14:43 | normalization, in this case |
---|
0:14:45 | QCN, with the newly proposed low-pass filter. |
---|
0:14:48 | You see all of the transient effects are gone, |
---|
0:14:52 | which is quite nice. |
---|
0:14:54 | so now |
---|
0:14:56 | we combine this newly proposed low-pass |
---|
0:14:59 | filter |
---|
0:15:00 | with |
---|
0:15:00 | our compensation method called QCN, |
---|
0:15:03 | quantile-based cepstral |
---|
0:15:05 | dynamics |
---|
0:15:06 | normalization. |
---|
0:15:07 | uh |
---|
0:15:09 | It |
---|
0:15:10 | is kind of similar to cepstral mean and variance normalization, but |
---|
0:15:13 | we observed that if you have a noisy signal |
---|
0:15:16 | or if you have Lombard effect |
---|
0:15:18 | or whatever, |
---|
0:15:20 | the skewness of the distributions tends to change. |
---|
0:15:23 | If you have |
---|
0:15:23 | distributions of a different skewness, |
---|
0:15:26 | then aligning them by their mean |
---|
0:15:29 | may not be very optimal, because the dynamic ranges, as you would |
---|
0:15:32 | define them by, let's say, ninety percent of the samples, |
---|
0:15:36 | can be badly aligned. |
---|
0:15:37 | So what we do instead: |
---|
0:15:39 | we pick some low and high quantiles and estimate them from the |
---|
0:15:43 | histograms, so let's say |
---|
0:15:46 | five and ninety-five percent, |
---|
0:15:48 | so we know this |
---|
0:15:49 | interval that bounds |
---|
0:15:50 | ninety percent |
---|
0:15:51 | of the samples, |
---|
0:15:53 | and we align these intervals |
---|
0:15:55 | instead of mean and variance. |
---|
0:15:56 | We found and have shown in previous studies that |
---|
0:16:00 | it helps a lot, |
---|
0:16:01 | especially in Lombard effect and |
---|
0:16:03 | noise, |
---|
0:16:05 | so we propose combining this, instead of CMN, |
---|
0:16:08 | with the low-pass RASTA. |
---|
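One way to read the quantile-based normalization described above is sketched below: for each cepstral track, estimate a low and a high quantile, move the midpoint of that interval to zero and scale by the interval width. The exact formula, the interpolating quantile estimator and the function names are assumptions made for illustration; the talk only states that the quantile intervals are aligned instead of mean and variance.

```python
def quantile(sorted_vals, p):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def qcn(track, low=0.05):
    """Align the [low, 1 - low] quantile interval of one cepstral track.

    Assumed form: subtract the interval midpoint, divide by the interval width.
    """
    s = sorted(track)
    q_low = quantile(s, low)
    q_high = quantile(s, 1.0 - low)
    center = 0.5 * (q_low + q_high)
    width = q_high - q_low
    return [(c - center) / width for c in track]

# Hypothetical track, and the same track with a different offset and scale.
track = [float(v) for v in range(101)]
shifted = [3.0 * c + 7.0 for c in track]
a = qcn(track)
b = qcn(shifted)
print(round(quantile(sorted(a), 0.05), 3), round(quantile(sorted(a), 0.95), 3))
```

After this normalization both versions of the track coincide, and the chosen quantiles always land at the same two values regardless of how skewed the distribution between them is. The default `low=0.05` mirrors the five and ninety-five percent example from the talk.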
0:16:10 | so finally |
---|
0:16:13 | I will present the evaluation. So |
---|
0:16:15 | the system: |
---|
0:16:16 | it was a |
---|
0:16:17 | triphone HMM |
---|
0:16:18 | system i'm i'm was the rules |
---|
0:16:20 | mister store to mixtures |
---|
0:16:22 | and we were training the models on clean TIMIT. |
---|
0:16:26 | We used the SRI Language Modeling Toolkit for language modeling, |
---|
0:16:31 | and |
---|
0:16:32 | because there's a mismatch, |
---|
0:16:33 | a channel mismatch and microphone mismatch between TIMIT and |
---|
0:16:37 | our data, |
---|
0:16:38 | we chose several sessions and used them for |
---|
0:16:42 | acoustic model adaptation, so we used MLLR and MAP, |
---|
0:16:46 | and did not use these adaptation sessions, of course, in the evaluation later on. |
---|
0:16:51 | so |
---|
0:16:51 | In the evaluations we had the neutral |
---|
0:16:55 | and Lombard speech, |
---|
0:16:57 | both as clean signals with high SNR, |
---|
0:16:59 | and then we also mixed those recordings with the, |
---|
0:17:03 | with the car noise |
---|
0:17:04 | to see how robust the methods would be |
---|
0:17:07 | to Lombard effect and noise. |
---|
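Mixing a clean recording with noise at a chosen SNR is typically done by scaling the noise before adding it. A minimal sketch follows; the talk does not state which SNRs were used, so the 10 dB target, the function name `mix_at_snr` and the sample values are all hypothetical.

```python
import math

def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech power / noise power matches the target SNR,
    then add it to `speech`. Both inputs must have the same length."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [scale * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise

# Hypothetical samples and target SNR, for illustration only.
speech = [0.5, -0.4, 0.3, -0.2, 0.1, -0.3]
noise = [0.2, -0.1, 0.15, -0.25, 0.05, 0.1]
mixed, scaled_noise = mix_at_snr(speech, noise, 10.0)

# Verify the SNR that was actually achieved.
p_s = sum(s * s for s in speech) / len(speech)
p_n = sum(n * n for n in scaled_noise) / len(scaled_noise)
achieved_snr_db = 10.0 * math.log10(p_s / p_n)
print(round(achieved_snr_db, 6))
```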
0:17:10 | So the baseline performance |
---|
0:17:12 | uh |
---|
0:17:13 | on the neutral test set, |
---|
0:17:15 | i C C and |
---|
0:17:16 | and i to C D and |
---|
0:17:17 | but was like a person's what are rate and the P |
---|
0:17:21 | a similar so than we just the other of are are much |
---|
0:17:25 | uh |
---|
0:17:26 | We didn't use language modeling |
---|
0:17:28 | after this, because we wanted to just see |
---|
0:17:30 | how the acoustic models are affected |
---|
0:17:33 | by |
---|
0:17:33 | Lombard effect and the noise, |
---|
0:17:36 | and we didn't want to have a really strong language model that |
---|
0:17:38 | would hide a little bit the, |
---|
0:17:40 | I mean, the benefits of the individual normalizations for ASR. |
---|
0:17:44 | So this is just a baseline evaluation. |
---|
0:17:48 | and the C C V and system you C |
---|
0:17:50 | for neutral speech, some |
---|
0:17:52 | baseline performance, and |
---|
0:17:54 | for each noise type, |
---|
0:17:57 | as we are increasing the |
---|
0:17:59 | noise level in the headphones, |
---|
0:18:02 | the Lombard effect gets |
---|
0:18:03 | stronger and also the ASR |
---|
0:18:05 | word error rate |
---|
0:18:07 | grows. |
---|
0:18:09 | Just to note that the recordings are clean, so in all cases here |
---|
0:18:12 | they have high SNR. |
---|
0:18:14 | So then we are comparing |
---|
0:18:16 | several normalization methods: |
---|
0:18:18 | cepstral mean normalization, variance normalization, |
---|
0:18:21 | cepstral gain normalization, RASTA filtering, |
---|
0:18:24 | Gaussianization, |
---|
0:18:26 | histogram equalization, where we took the TIMIT training data |
---|
0:18:29 | distributions as the reference point, and then we compare it to QCN and QCN with the low-pass RASTA. |
---|
0:18:34 | And these are the results. |
---|
0:18:37 | So the table on the left-hand side |
---|
0:18:40 | uh |
---|
0:18:41 | shows the overall results across all conditions in the clean recordings, so |
---|
0:18:46 | the set of neutral and Lombard ones where no noise was added, |
---|
0:18:50 | high SNR. |
---|
0:18:51 | So you see |
---|
0:18:53 | RASTA actually |
---|
0:18:55 | doesn't work very well here. |
---|
0:18:57 | It's still better to use it than nothing, but |
---|
0:19:00 | but much better in this space but in any case is it can be |
---|
0:19:03 | i |
---|
0:19:04 | And the best-performing normalizations here would be |
---|
0:19:09 | QCN and cepstral gain normalization, and |
---|
0:19:12 | also Gaussianization and histogram equalization. |
---|
0:19:15 | The numbers behind QCN, |
---|
0:19:17 | uh that |
---|
0:19:18 | just shows the setting for the choice of |
---|
0:19:20 | the quantiles we use: if it's nine, |
---|
0:19:22 | we use the nine percent |
---|
0:19:24 | quantile |
---|
0:19:25 | and the ninety-one percent, and in QCN4 we used |
---|
0:19:28 | four percent and ninety-six percent. |
---|
0:19:30 | So for different tasks and databases |
---|
0:19:32 | it actually helps to tune this |
---|
0:19:34 | ah |
---|
0:19:35 | choice of the quantiles. |
---|
0:19:38 | On the right side you see, |
---|
0:19:40 | we just picked the best-performing normalizations |
---|
0:19:43 | and the baseline one |
---|
0:19:45 | and compared them on the noisy |
---|
0:19:47 | recordings where the car noise was mixed in, |
---|
0:19:50 | and there you see |
---|
0:19:52 | the order, |
---|
0:19:54 | I mean the ranking of the normalizations, unfortunately completely changes, so |
---|
0:19:59 | there isn't any normalization that would work best everywhere, which is kind of disappointing, but |
---|
0:20:04 | yeah, well. |
---|
0:20:06 | But what is nice |
---|
0:20:06 | is that if you use the newly proposed low-pass RASTA filter, |
---|
0:20:10 | it consistently improves the |
---|
0:20:13 | performance of the QCN normalization |
---|
0:20:15 | for both clean recordings and noisy recordings. |
---|
0:20:18 | And now we submitted a paper to Interspeech |
---|
0:20:22 | where |
---|
0:20:23 | we are showing that |
---|
0:20:24 | using QCN and the new RASTA filter, |
---|
0:20:27 | it always outperforms RASTA for PLP |
---|
0:20:31 | or MFCC, even if you use it in TRAP-based schemes and so on. |
---|
0:20:35 | so |
---|
0:20:36 | yeah |
---|
0:20:37 | it seems kind of promising; it's very simple. |
---|
0:20:40 | So that's basically it. I could just show you the conclusions, but I'm not going to do that. |
---|
0:20:45 | so |
---|
0:20:46 | thank you for your attention. |
---|
0:20:52 | Time for just one quick question while the other speaker is setting up, yeah. |
---|
0:20:57 | huh |
---|
0:20:57 | i |
---|
0:21:07 | i |
---|
0:21:18 | right |
---|