0:00:13 | Actually, um, I'm here to present our co-authored paper on behalf of the first author, from the University of Science and Technology of China. |
---|
0:00:23 | Actually, this work started when the first author visited us as a research intern. |
---|
0:00:30 | We implemented the harmonic plus noise model. In the very beginning we wanted it to be used for speech analysis, especially for speech synthesis. |
---|
0:00:43 | After she went back to school, we used that harmonic plus noise model to implement a new feature, but applied it this time to speaker verification. |
---|
0:00:58 | And we got a promising speaker verification result using this new set of features. |
---|
0:01:08 | So this is basically the whole story of this work. |
---|
0:01:14 | Today I will first introduce our motivation and the so-called SSER feature, which stands for spectral subband energy ratio feature. |
---|
0:01:28 | Then I will briefly introduce the harmonic plus noise analysis of speech, how we calculate the spectral subband energy ratio feature, and how we model the SSER feature. |
---|
0:01:44 | Finally, I will present our evaluation results and conclusions. |
---|
0:01:50 | It is probably a well-known problem that for today's speaker identification and verification tasks, we usually still use features borrowed from automatic speech recognition. |
---|
0:02:06 | The problem is that those features are actually supposed to normalize out the speaker information. |
---|
0:02:15 | So the motivation is quite straightforward: we want to find some new features that are complementary to current features like MFCC, so that they can carry the speaker characteristics and therefore improve speaker verification performance. |
---|
0:02:44 | This is the motivation of this work. |
---|
0:02:49 | There are several steps to extract our proposed SSER features. |
---|
0:02:58 | The first step is that we apply harmonic plus noise analysis of speech, and then we calculate the subband energy ratios; I will introduce the details later. |
---|
0:03:13 | Basically, in each subband you calculate the energy of the harmonic part versus the energy of the noise part. |
---|
0:03:23 | Then you take this new feature and plug it into the current speaker verification system, which is a conventional GMM-UBM system, and use it as a complementary feature to the MFCC feature. |
---|
0:03:46 | So I will briefly introduce the harmonic plus noise speech analysis here. |
---|
0:03:54 | This method was proposed by Professor Yannis Stylianou; you can find the reference paper here. |
---|
0:04:04 | Basically, for each input utterance, we first do F0 extraction, or pitch extraction, to get the F0 estimate, and of course you also get the voiced/unvoiced label. |
---|
0:04:24 | We discard the unvoiced frames and only use the voiced frames for further analysis. |
---|
0:04:34 | Then we do pitch-synchronous windowing on the input utterance, so you get a number of frames to represent the input utterance. |
---|
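The framing step just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes pitch marks (analysis instants in samples), an F0 value per mark, and a voiced/unvoiced flag per mark are already available from some pitch tracker, and the function name `pitch_synchronous_frames` is hypothetical. Each voiced mark gets a Hamming-windowed frame two pitch periods long (the window length given in the HNM setup of this talk); unvoiced marks are skipped.

```python
import numpy as np


def pitch_synchronous_frames(signal, fs, marks, f0_hz, voiced):
    """Return (centre, f0, windowed_frame) for every usable voiced pitch mark.

    Hypothetical helper. Inputs are assumed to come from a pitch tracker:
      marks  : analysis instants in samples
      f0_hz  : F0 estimate at each mark, in Hz
      voiced : voiced/unvoiced decision at each mark
    """
    signal = np.asarray(signal, dtype=float)
    frames = []
    for centre, f0, is_voiced in zip(marks, f0_hz, voiced):
        if not is_voiced or f0 <= 0:
            continue  # unvoiced frames are discarded
        period = int(round(fs / f0))  # one pitch period in samples
        start, stop = centre - period, centre + period
        if start < 0 or stop > len(signal):
            continue  # skip frames that would run past the signal edges
        window = np.hamming(stop - start)  # two-pitch-period Hamming window
        frames.append((centre, f0, signal[start:stop] * window))
    return frames
```

For a 16 kHz signal, a voiced mark with F0 = 100 Hz yields a 320-sample frame and one with F0 = 200 Hz yields a 160-sample frame.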
0:04:48 | For each given frame, or short-time speech segment, we do HNM modeling; HNM stands for harmonic plus noise model. |
---|
0:05:02 | The basic idea of HNM analysis is to decompose the input speech signal into the harmonic part, which is purely periodic, plus the noise part. |
---|
0:05:16 | We can use several methods to represent the noise part. In this work we use the residual, basically the input signal minus the harmonic part, as the noise. |
---|
0:05:30 | Here are some basic setups of the HNM analysis. |
---|
0:05:35 | The speech signal is as you can see on the slide. We use a Hamming window of two pitch periods for each frame to chop the input speech. |
---|
0:05:52 | Another important thing we need to define for each HNM analysis is the maximum voiced frequency; we fix that frequency to 6 kHz. |
---|
0:06:10 | As I mentioned before, the noise part is defined as the residual signal. |
---|
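A single-frame decomposition along these lines can be sketched as a weighted least-squares fit of harmonics of F0 up to the maximum voiced frequency, with the residual taken as the noise part. This is an assumption-laden sketch, not the authors' implementation: the function name `hnm_decompose` is hypothetical, the input is assumed to be the raw (unwindowed) two-period segment, and the Hamming window is used as the least-squares weight, which is a common way HNM analysis is formulated.

```python
import numpy as np


def hnm_decompose(segment, fs, f0, max_voiced_hz=6000.0):
    """Split one frame into (harmonic, noise) parts; noise = segment - harmonic.

    Hypothetical sketch: fits cos/sin terms at k * f0 for every harmonic
    below max_voiced_hz by weighted least squares, using a Hamming window
    over the whole (assumed two-pitch-period) segment as the weight.
    """
    segment = np.asarray(segment, dtype=float)
    n = np.arange(len(segment)) - len(segment) // 2  # time axis centred on the frame
    n_harm = int(max_voiced_hz // f0)                # harmonics below the maximum voiced frequency
    k = np.arange(1, n_harm + 1)
    phase = 2.0 * np.pi * np.outer(n, k) * f0 / fs   # shape (N, n_harm)
    basis = np.hstack([np.cos(phase), np.sin(phase)])
    w = np.hamming(len(segment))                      # least-squares weights
    coef, *_ = np.linalg.lstsq(basis * w[:, None], segment * w, rcond=None)
    harmonic = basis @ coef                           # purely periodic part
    noise = segment - harmonic                        # residual used as the noise part
    return harmonic, noise
```

With a 16 kHz signal and F0 = 200 Hz, a two-period segment has 160 samples and the fit has 60 unknowns (30 harmonics below 6 kHz), so the system stays overdetermined.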
0:06:21 | Here is an example of the same vowel pronounced by two different speakers. |
---|
0:06:32 | The red curve is the harmonic spectrum of a particular input frame, and the green curve is the spectrum of the noise part. |
---|
0:06:49 | For this frequency subband, as you can see, for this speaker the ratio between the harmonic part and the noise part is almost at one; basically, it means the energy of the harmonic part is similar to the energy of the noise part. |
---|
0:07:13 | For the other speaker, you can see more energy is distributed to the harmonic part than to the noise part. |
---|
0:07:21 | So we hope these characteristics can differentiate different speakers. |
---|
0:07:33 | When we observe this, there are two problems we need to define. The first is the bandwidth of each subband. |
---|
0:07:46 | In this case we define the bandwidth as the average of the minimum possible F0 and the maximum possible F0 for a given speaker. |
---|
0:07:58 | Actually, those two numbers are gender dependent: we defined one set of values for female speakers and another set of values for male speakers. |
---|
0:08:13 | The second problem is the center frequency of each subband. This is quite straightforward: for HNM analysis we use integer multiples of F0, so that the subbands together cover the whole frequency range. |
---|
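One way to write down this subband layout is shown below. It assumes the subbands are contiguous, have the width B just described, and are centred on integer multiples of B; reading the talk's "multiples of F0" as multiples of this nominal, gender-dependent F0 value is an assumption. The upper limit f_top of the analysed frequency range is a hypothetical symbol; the talk does not state it.

```latex
B = \frac{F_0^{\min} + F_0^{\max}}{2}, \qquad
f_k = k\,B, \qquad
\mathcal{B}_k = \Big[\big(k - \tfrac{1}{2}\big)B,\ \big(k + \tfrac{1}{2}\big)B\Big), \qquad
k = 1, \dots, K, \qquad
K = \Big\lfloor \frac{f_{\mathrm{top}}}{B} - \frac{1}{2} \Big\rfloor
```

Under this layout a smaller B (a lower F0 range, as for male speakers) gives more subbands over the same frequency range, which is consistent with the male feature vector having more dimensions than the female one.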
0:08:40 | After this, with the start and end frequency of each subband, we can calculate the subband energy of the harmonic part and the subband energy of the noise part. |
---|
0:09:00 | Then you calculate the energy ratio between the two and convert the value into dB. |
---|
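The per-frame feature computation can be sketched as follows: for subband k, SSER_k = 10 log10(E_h,k / E_n,k), where E_h,k and E_n,k are the harmonic and noise energies inside that subband. The sketch assumes the harmonic and noise components of a frame are available as time-domain arrays (for example from an HNM decomposition) and uses contiguous subbands of width `bandwidth_hz` centred on its integer multiples. The function name, `n_fft`, and the small `eps` floor are illustrative choices, not values from the talk.

```python
import numpy as np


def sser(harmonic, noise, fs, bandwidth_hz, n_bands, n_fft=1024, eps=1e-12):
    """Spectral subband energy ratio (dB) of one frame: one value per subband.

    Hypothetical sketch: subband k (1-based) spans
    [(k - 0.5) * bandwidth_hz, (k + 0.5) * bandwidth_hz).
    """
    if (n_bands + 0.5) * bandwidth_hz > fs / 2:
        raise ValueError("subbands extend beyond the Nyquist frequency")
    h_pow = np.abs(np.fft.rfft(harmonic, n_fft)) ** 2  # harmonic power spectrum
    n_pow = np.abs(np.fft.rfft(noise, n_fft)) ** 2     # noise power spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    feats = np.empty(n_bands)
    for k in range(1, n_bands + 1):
        lo, hi = (k - 0.5) * bandwidth_hz, (k + 0.5) * bandwidth_hz
        in_band = (freqs >= lo) & (freqs < hi)
        e_h = h_pow[in_band].sum()  # harmonic energy in subband k
        e_n = n_pow[in_band].sum()  # noise energy in subband k
        # eps keeps the log finite when a subband has (almost) no energy
        feats[k - 1] = 10.0 * np.log10((e_h + eps) / (e_n + eps))
    return feats
```

A subband where the harmonic and noise energies are equal gives 0 dB, which corresponds to the "ratio almost at one" case in the two-speaker example.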
0:09:09 | So after this, for each frame you get a feature vector whose dimension is gender dependent. |
---|
0:09:22 | In our experiments, we have a thirty-three-dimensional feature for female speakers and a forty-five-dimensional feature for male speakers, because male speakers usually have a lower F0, so their subbands are narrower and more of them fit into the frequency range. |
---|
0:09:49 | After the feature has been calculated, we need to model it. |
---|
0:09:56 | The first thing we wanted to check is whether the distribution of the SSER features is Gaussian, so that we can use a GMM to model it. |
---|
0:10:08 | We plotted the distributions to see whether we can model them that way, and it seems that using a Gaussian distribution to model this feature is quite reasonable; it looks like a Gaussian. |
---|
0:10:27 | So we use the conventional GMM-UBM system to do speaker verification. |
---|
0:10:34 | We use the conventional MFCC feature as the baseline, and we implemented the SSER feature based system. |
---|
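A GMM-UBM back end is commonly built by MAP-adapting a universal background model (UBM) to each enrolled speaker and scoring a test utterance by its average per-frame log-likelihood ratio against the UBM. The sketch below assumes diagonal covariances, means-only adaptation and a relevance factor of 16; that is the classic recipe, not something stated in this talk. The UBM parameters are assumed to have been trained with EM beforehand and are simply passed in; all function names are hypothetical.

```python
import numpy as np


def _component_logliks(x, weights, means, variances):
    """log(w_m * N(x_t; mu_m, diag(var_m))) for every frame and component, shape (T, M)."""
    diff = x[:, None, :] - means[None, :, :]                             # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (M,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, M)
    return log_exp + log_norm[None, :] + np.log(weights)[None, :]


def _logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to enrolment frames x (T, D); weights and variances are kept."""
    comp = _component_logliks(x, weights, means, variances)
    post = np.exp(comp - _logsumexp_rows(comp)[:, None])   # responsibilities (T, M)
    n_m = post.sum(axis=0)                                  # soft frame counts (M,)
    first = post.T @ x / np.maximum(n_m, 1e-10)[:, None]    # per-component data means (M, D)
    alpha = (n_m / (n_m + relevance))[:, None]              # more data -> move further from UBM
    return alpha * first + (1.0 - alpha) * means


def llr_score(x, weights, spk_means, ubm_means, variances):
    """Average per-frame log-likelihood ratio of the speaker model against the UBM."""
    spk = _logsumexp_rows(_component_logliks(x, weights, spk_means, variances))
    ubm = _logsumexp_rows(_component_logliks(x, weights, ubm_means, variances))
    return float(np.mean(spk - ubm))
```

The same back end can be run on MFCC frames or on SSER frames; only the feature dimension D changes.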
0:10:44 | We use a Mandarin database which is widely used in China for speech recognition, speech synthesis, and even for speaker-related tasks. |
---|
0:11:07 | We measure the EER as the performance metric, and no score normalization was used. |
---|
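The EER is the operating point at which the false-rejection rate on target trials equals the false-acceptance rate on non-target trials. A simple threshold sweep over the observed scores approximates it as sketched below; this is a plain illustration (quadratic in the number of scores), not the evaluation tool used in the paper, and the function name is hypothetical.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate([tgt, non])):
        frr = np.mean(tgt < t)   # target trials rejected at threshold t
        far = np.mean(non >= t)  # non-target trials accepted at threshold t
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer
```

Perfectly separated scores give an EER of 0, while fully overlapping score sets push it toward 0.5.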
0:11:21 | Here are some statistics of the training and test corpus. We have more than one hundred speakers altogether. |
---|
0:11:32 | We use [unclear] seconds of training speech and [unclear] seconds of test speech for the speaker verification task, and this is a reading-style corpus. |
---|
0:11:47 | Specifically, we have [unclear] male speakers plus [unclear] female speakers, and this number of unique test sentences, which we arranged to get about six thousand male trials plus seven thousand female trials. |
---|
0:12:19 | Okay, let's see the results. |
---|
0:12:23 | We can first see the EER of the MFCC baseline. As you can observe, as usual, the female speakers seem to be a little bit more difficult to handle. |
---|
0:12:43 | Using the SSER features alone, the performance is actually worse than with the MFCC features; you can see those numbers. |
---|
0:12:56 | But if we combine the two systems, we get a reasonable performance improvement, especially for the female speakers. |
---|
0:13:10 | If we combine the two systems, the performance for the female speakers actually becomes better than for the male speakers. |
---|
0:13:24 | So this is a quite interesting and surprisingly good performance improvement. |
---|
0:13:35 | So, to conclude: this paper is quite straightforward. We proposed a new feature named SSER for speaker verification. |
---|
0:13:49 | It can characterize the interaction between vocal tract movements and the glottal flow, and it seems to be quite able to capture the speaker characteristics. |
---|
0:14:04 | This feature is complementary to MFCC, and it reduces the EER when used along with the MFCC baseline system. |
---|
0:14:18 | For future work, we want to do more experiments to see whether it performs well, for example in noisy environments and after post-processing techniques. |
---|
0:14:33 | Okay, thank you very much. |
---|
0:14:40 | Yes, a question? Yeah. Yeah. |
---|
0:14:44 | Hopefully I can answer, because I'm not quite familiar with the speaker verification task. This work was basically done at the time when she went back to school, and the intern actually implemented the core part of this work herself. |
---|
0:15:07 | So hopefully I can answer your question if it focuses more on the HNM part. |
---|
0:15:19 | So your SSER feature only calculates the ratio of the harmonic and the noise parts. So where does the noise come from? |
---|
0:15:28 | Ah, the noise is a so-called noise. It is different from the additive noise, or environmental noise, in speech recognition; this is different. |
---|
0:15:40 | Basically, for the HNM analysis part, for each given input speech frame, you decompose the input speech signal into two different parts. |
---|
0:16:06 | The first part is called the harmonic part, which is purely periodic, and the remaining residual you can define as noise. |
---|
0:16:16 | So this noise is different from the noise in, for example, speech recognition. |
---|
0:16:23 | Oh, so in your system you are using clean recorded signals, yes? So you extract the noise, as you call it; you define everything that is not periodic as noise. Okay. So if there was some noise in the original signal, would this noise part be robust to it? |
---|
0:16:48 | That is a problem we want to look at. |
---|
0:16:53 | If the input signal is noisy, several things could be affected by the additive noise. For example, the HNM analysis still relies pretty much on an accurate estimate of F0. |
---|
0:17:14 | Yeah. But if you have very strong noise, this part could be affected. |
---|
0:17:19 | And then, depending on the type of noise, it could affect the harmonic estimation, and it could also affect the noise estimation. |
---|
0:17:34 | Yeah, but that is actually the first thing we want to investigate as future work. |
---|
0:17:43 | Okay. Also, I'd like to ask: what is the method used to fuse the MFCC and SSER systems? |
---|
0:17:50 | That is score fusion, because for the MFCC system you use all the input frames, while for the HNM-based SSER system the unvoiced frames are discarded. |
---|
0:18:08 | So we compute frame-averaged scores, like likelihood ratio scores, and then combine them. |
---|
0:18:23 | And what are the fusion coefficients of the individual features? |
---|
0:18:30 | I don't have an answer to this question; maybe we need to check with the first author. |
---|
0:18:38 | Okay, yeah. I don't know whether there is any weight, or whether it is critical to tune those weights. |
---|
0:18:45 | Yeah. Okay. |
---|
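The score fusion described in this exchange can be sketched as follows: each system's per-frame log-likelihood ratios are averaged over the frames that system actually used (all frames for MFCC, voiced frames only for SSER), so the two utterance-level scores are on comparable scales, and the two averages are then combined linearly. The `weight` parameter and the function names are hypothetical; the presenter did not know whether or how the scores were weighted.

```python
import numpy as np


def utterance_score(frame_llrs):
    """Average one system's per-frame log-likelihood ratios over its own frames."""
    frame_llrs = np.asarray(frame_llrs, dtype=float)
    if frame_llrs.size == 0:
        # e.g. an utterance with no voiced frames has no SSER score
        raise ValueError("no frames were scored")
    return float(frame_llrs.mean())


def fuse_scores(mfcc_frame_llrs, sser_frame_llrs, weight=0.5):
    """Linear score fusion; `weight` is a hypothetical tuning parameter."""
    s_mfcc = utterance_score(mfcc_frame_llrs)  # all input frames
    s_sser = utterance_score(sser_frame_llrs)  # voiced frames only
    return (1.0 - weight) * s_mfcc + weight * s_sser
```

Averaging before combining is what lets the two systems be fused even though they score different numbers of frames for the same utterance.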
0:18:47 | Another question: do you do anything with the residual? Do you use it somehow as a feature? |
---|
0:18:56 | No, we don't use it. We only calculate the energy ratio between the harmonic part and the residual. Yeah. |
---|
0:19:06 | Another question: adding these features to MFCC improves the results, which means that these two feature sets are uncorrelated. Did you try to analyze why, or does it just work better for a subset of the speakers? |
---|
0:19:31 | Possibly, maybe; I'm not so sure about this part. |
---|
0:19:38 | And we didn't try that. The first thing is that we have discarded the unvoiced frames, so basically you cannot use something like PCA to, for example, combine MFCC along with the SSER feature and then map them to a reduced dimension to plug into the new system. We haven't tried that because it is difficult to handle the unvoiced frames. |
---|
0:20:14 | In conventional harmonic plus noise analysis, for unvoiced frames you do not have an estimate of the harmonic part, so basically we cannot calculate the subband energy ratio for the unvoiced frames. |
---|
0:22:22 | Okay, thanks. So let's thank the speaker again. |
---|