0:00:07 | Hello. Let me start with the research directions we are working on at the University of Eastern Finland. |
---|
0:00:12 | A lot of our work is concentrated on new features for speaker recognition. |
---|
0:00:23 | There are several lines of work in our laboratory; the one I am working on explores the area of new features for speaker recognition. |
---|
0:00:37 | The current work is the application of a particular kind of features, weighted linear prediction (WLP) features, to speaker recognition. |
---|
0:00:48 | This work is done jointly with a group at Helsinki University of Technology, which nowadays belongs to Aalto University. |
---|
0:00:59 | Let me just say that I am not pretending to know exactly what happens inside weighted linear prediction; I am only presenting my understanding of what it is. |
---|
0:01:15 | A colleague from the Helsinki group is here and can help me if I cannot describe something to you. |
---|
0:01:24 | So, the concept. Some users of speaker recognition technology want to use it in a quiet office environment. |
---|
0:01:38 | But other users want to use it in any environment: with high-energy noise, in the street, with traffic noise, whatever. They are not in an office or another controlled environment with clean speech recordings. |
---|
0:01:56 | So we are interested in how much the performance of our speaker recognition systems degrades in the presence of this type of additive noise. |
---|
0:02:08 | To describe our focus: here is a typical speaker recognition system, with its different phases and modules. Our focus in this study is to see how feature extraction affects speaker recognition performance. |
---|
0:02:32 | Typically, our feature extraction works as follows: we window the signal into frames, estimate the spectrum, compute the MFCCs, apply RASTA filtering, append delta and double-delta coefficients, drop frames according to their energy, and apply cepstral mean and variance normalisation. |
---|
0:02:52 | This is a typical setup, giving the 36-dimensional feature vector used in our experiments; it is our baseline. |
---|
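The post-processing end of this baseline can be sketched in Python with NumPy. This is a minimal illustration, not the code used in the talk: it assumes the windowing, FFT, MFCC and RASTA stages have already produced 12 static coefficients per frame (12 static + 12 delta + 12 double-delta would give the 36 dimensions mentioned, but that split is an assumption), a ±2-frame delta regression window, and a hypothetical energy rule that keeps frames within 30 dB of the loudest one.

```python
import numpy as np

def deltas(feats, K=2):
    """Regression deltas over +-K frames; edge frames are repeated for padding."""
    T = feats.shape[0]
    padded = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, K + 1):
        out += k * (padded[K + k : K + k + T] - padded[K - k : K - k + T])
    return out / denom

def energy_vad(frame_energies_db, margin_db=30.0):
    """Hypothetical rule: keep frames within margin_db of the most energetic frame."""
    return frame_energies_db > frame_energies_db.max() - margin_db

def cmvn(feats, eps=1e-10):
    """Cepstral mean and variance normalisation per dimension."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)

def postprocess(static_ceps, frame_energies_db):
    """Append deltas and double-deltas, drop low-energy frames, then apply CMVN."""
    d1 = deltas(static_ceps)
    d2 = deltas(d1)
    full = np.hstack([static_ceps, d1, d2])
    keep = energy_vad(frame_energies_db)
    return cmvn(full[keep])

rng = np.random.default_rng(0)
static = rng.standard_normal((200, 12))    # stand-in for MFCC + RASTA output
energy_db = rng.uniform(-60.0, 0.0, 200)   # stand-in per-frame log energies
features = postprocess(static, energy_db)
print(features.shape)                      # second dimension is 36
```

Deltas are computed before frame dropping, following the order given in the talk, so that they still see the original temporal context.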
0:03:00 | Now the question is this: we always use the FFT to estimate the spectrum, but is it really the best way to do it? |
---|
0:03:11 | Another question: is it really that robust in additive noise conditions, compared with linear prediction (LP)? |
---|
0:03:20 | It is well known that the spectrum can also be estimated by linear prediction; LP and the FFT are simply alternative models, alternative ways to estimate the spectrum. |
---|
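To make the FFT-versus-LP comparison concrete, here is a minimal sketch of the two spectrum estimates for one frame. The Hamming window, the LP order p = 20 and the 512-point FFT are illustrative assumptions, not the settings used in the study; the LP coefficients come from the autocorrelation method, solved here with a plain linear solve where Levinson-Durbin would normally be used.

```python
import numpy as np

def fft_power_db(frame, nfft=512):
    """Periodogram of a Hamming-windowed frame, in dB (the usual FFT route)."""
    spec = np.fft.rfft(frame * np.hamming(len(frame)), nfft)
    return 10.0 * np.log10(np.abs(spec) ** 2 + 1e-12)

def lp_coeffs(x, p):
    """Autocorrelation-method LP: solve R a = r for s[n] ~ sum_k a_k s[n-k].
    Levinson-Durbin solves the same Toeplitz system more efficiently."""
    N = len(x)
    r = np.array([np.dot(x[: N - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])  # prediction-error energy
    return a, err

def lp_power_db(frame, p=20, nfft=512):
    """All-pole envelope err / |A(e^jw)|^2, with A(z) = 1 - sum_k a_k z^-k, in dB."""
    a, err = lp_coeffs(frame * np.hamming(len(frame)), p)
    A = np.fft.rfft(np.concatenate(([1.0], -a)), nfft)
    return 10.0 * np.log10(err / np.abs(A) ** 2 + 1e-12)

rng = np.random.default_rng(0)
frame = rng.standard_normal(240)  # stand-in for a 30 ms frame at 8 kHz
print(fft_power_db(frame).shape, lp_power_db(frame).shape)  # (257,) (257,)
```

The FFT route keeps the harmonic fine structure of a voiced frame, while the LP route gives a smooth all-pole envelope; which one holds up better under additive noise is the open question raised here.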
0:03:34 | Nobody says that LP is better than the FFT for speaker recognition, or even for any other speech processing application. |
---|
0:03:47 | Now we are comparing the performance of FFT and LP, and we are also introducing WLP. |
---|
0:03:57 | WLP is targeted to put more stress on the regions of speech that have more energy. |
---|
0:04:10 | The way we do it is to weight the energy of the prediction error by a weighting function. |
---|
0:04:16 | The weighting function comes from the short-time energy of the signal: we compute the energy of the M samples immediately before the current sample and use it as the weight when we predict that sample from the previous samples. |
---|
0:04:49 | In this way, we can set the derivatives of the weighted error energy with respect to the predictor coefficients to zero, arrive at the normal equations, and solve them for the predictor coefficients. |
---|
0:05:05 | The history of this goes back to perhaps 1975, and weighted linear prediction was then revived in 1993. |
---|
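The weighted least-squares idea described above can be sketched as follows, under stated assumptions: the error criterion is E = Σ_n w_n (s[n] - Σ_k a_k s[n-k])², the frame is treated as zero outside its N samples with n running over 0, ..., N+p-1, and w_n is the energy of the M samples preceding n. The helper names and the defaults p = 20, M = 20 are hypothetical placeholders, not the study's settings. Setting ∂E/∂a_j = 0 gives the weighted normal equations solved in `weighted_lp`; with all weights equal to one they reduce to the autocorrelation method of conventional LP.

```python
import numpy as np

def ste_weights(x, p, M):
    """w_n = energy of the M samples preceding n, for n = 0 .. N+p-1 (zeros outside the frame)."""
    N = len(x)
    padded = np.concatenate((np.zeros(M), x, np.zeros(p)))
    # padded[n : n + M] holds x[n-M] .. x[n-1]
    return np.array([np.sum(padded[n : n + M] ** 2) for n in range(N + p)])

def weighted_lp(x, w, p):
    """Solve sum_k a_k sum_n w_n s[n-k] s[n-j] = sum_n w_n s[n] s[n-j], j = 1..p."""
    N = len(x)
    padded = np.concatenate((np.zeros(p), x, np.zeros(p)))
    n = np.arange(N + p) + p                                        # position of s[n] in `padded`
    X = np.stack([padded[n - k] for k in range(1, p + 1)], axis=1)  # X[:, k-1] = s[n-k]
    y = padded[n]                                                   # s[n]
    C = X.T @ (w[:, None] * X)                                      # weighted covariance matrix
    b = X.T @ (w * y)
    return np.linalg.solve(C, b)

def wlp_coeffs(x, p=20, M=20):
    """Weighted LP with short-time-energy weights (no stability guarantee)."""
    return weighted_lp(x, ste_weights(x, p, M), p)

rng = np.random.default_rng(0)
frame = rng.standard_normal(240) * np.hamming(240)
a_wlp = wlp_coeffs(frame)
print(a_wlp.shape)
```

Nothing in this formulation enforces stability of the resulting all-pole filter, which is the issue the stabilised variant addresses.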
0:05:16 | But why do we choose the short-time energy (STE) as the weighting function of WLP? |
---|
0:05:26 | It can be argued that regions of speech with high energy are less contaminated by additive noise. |
---|
0:05:34 | It is a kind of assumption, but with it we can hope to get a better estimate of the spectrum from the regions of speech that are less corrupted by noise, and these are the regions with higher short-time energy. |
---|
0:05:53 | The regions of speech with higher short-time energy also correspond to the regions where the glottis is closed and the vocal tract is disconnected from the rest of the speech production system. |
---|
0:06:14 | In this case we have a standing wave inside the vocal tract, so if we want to compute the formants of the speech signal, we get more prominent formant estimates. |
---|
0:06:31 | Now, what is the problem with WLP? With conventional LP (the autocorrelation method), the normal equations are guaranteed to lead to a stable filter when we estimate the predictor coefficients. |
---|
0:06:44 | The problem with WLP is that it is not guaranteed to lead to a stable filter, and this is a problem in speech coding, for example. |
---|
0:06:52 | What we can do, instead of using the weighting function as it is, is to decompose it into partial weights and apply them, in this way, to the predictor of the current sample. |
---|
0:07:10 | This leads to equations that are derived in the paper by Magi et al. |
---|
0:07:20 | They describe the behaviour of the partial weights in such a way that the final predictor coefficients lead to a stable filter. This is stabilised weighted linear prediction (SWLP). |
---|
0:07:36 | Well, I still do not completely understand what is happening here, but it is described in that paper; for more details, please refer to it. |
---|
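The stability issue can be checked numerically: the all-pole filter 1/A(z) is stable when every root of A(z) lies strictly inside the unit circle. The sketch below is only such a check, not an implementation of SWLP (the partial-weight construction is given in the paper by Magi et al.), and the function names are hypothetical. It confirms that autocorrelation-method LP yields stable filters on random frames; the same `is_stable` test could be applied to coefficients from the weighted normal equations.

```python
import numpy as np

def is_stable(a):
    """True if 1/A(z), A(z) = 1 - sum_k a_k z^-k, has all poles strictly inside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # z^p A(z), descending powers of z
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))

def autocorrelation_lp(x, p):
    """Conventional LP via the autocorrelation normal equations."""
    N = len(x)
    r = np.array([np.dot(x[: N - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 240)) * np.hamming(240)
# The autocorrelation method guarantees stability, so this should print True.
print(all(is_stable(autocorrelation_lp(f, 12)) for f in frames))
```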
0:07:50 | Here I am showing a voiced frame from NIST 2002 and its spectrum estimates, together with the same frame contaminated with factory noise at 0 dB SNR. |
---|
0:08:09 | It is fairly obvious that when we estimate the spectrum of the noisy signal there are some problems, mainly caused by the noise. How much the noise affects the estimate depends on the SNR level and on the type of noise added to the sample. |
---|
0:08:32 | To give more intuition about what 0 dB factory noise means, I have here the speech file from which this frame was taken. |
---|
0:08:58 | That was a clean sample from the NIST 2002 test set. |
---|
0:09:12 | And this is the same piece, contaminated with 0 dB additive factory noise. |
---|
0:09:20 | It shows what 0 dB SNR really means. |
---|
0:09:30 | Coming to some results: here are the spectrum estimation methods we are considering, evaluated on the NIST 2002 corpus for speaker detection, with factory noise at 0 dB SNR. |
---|
0:09:49 | Here we can see that the methods mainly fall into three groups: the FFT-based method, LP, and the weighted LP group, that is, WLP itself and SWLP. |
---|
0:10:02 | I should mention that NIST 2002 is a database collected mainly with mobile handsets, and it already includes convolutional noise and some additive noise, although we added much more noise ourselves. |
---|
0:10:25 | We can see that there really is some difference between the performance of these features in an additive noise environment. |
---|
0:10:36 | We also tried one very famous speech enhancement method, spectral subtraction, added as a block in our feature extraction, to see what a simple speech enhancement method really does for a speaker recognition system in an additive noise environment. |
---|
0:10:57 | Looking at the results, yes, there is a good improvement from speech enhancement, that is, from spectral subtraction. |
---|
0:11:10 | But although these results differ quite a lot, I should say that our noise is stationary, and in the real world that is not really the case. |
---|
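The talk does not give the details of the enhancement block, so the following is a generic sketch of classic power spectral subtraction under these assumptions: the noise power spectrum is estimated from the first 10 frames (assumed to contain no speech), the over-subtraction factor alpha = 1, the spectral floor beta = 0.01 and the 25 ms / 10 ms framing at 8 kHz are hypothetical, and the enhanced power spectrum is passed on to the filterbank instead of being resynthesised.

```python
import numpy as np

def spectral_subtraction(x, frame_len=200, hop=80, noise_frames=10, alpha=1.0, beta=0.01):
    """Power spectral subtraction on Hann-windowed frames.
    Returns the enhanced power spectrum of each frame (one row per frame)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    frames = np.stack([x[i * hop : i * hop + frame_len] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    noise_psd = power[:noise_frames].mean(axis=0)  # assumes the leading frames are noise only
    # Subtract the noise estimate, then floor to a fraction of the noisy power.
    return np.maximum(power - alpha * noise_psd, beta * power)

rng = np.random.default_rng(0)
noisy = rng.standard_normal(8000)   # one second of stand-in noise at 8 kHz
enhanced = spectral_subtraction(noisy)
print(enhanced.shape)               # (98, 101)
```

Because the noise estimate is fixed once from the leading frames, this kind of method suits stationary noise, such as the noise used in these experiments, better than the non-stationary noise of real recordings.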
0:11:26 | Coming to some more recent data: we wanted to see whether these results from NIST 2002 generalise to NIST 2008 and maybe NIST 2010, because we were one of the sites in the I4U submission to the NIST 2010 SRE. |
---|
0:11:42 | This was our base system; I mean, the contribution of our University of Eastern Finland was trying some new features for speaker recognition. |
---|
0:11:54 | Let me look at the results. |
---|
0:12:02 | The system here is a GMM-SVM system with NAP, and the evaluation condition is core condition 8. If you ask me why this condition was selected, it was the condition I had been working on for speaker recognition, so I selected it for this presentation. |
---|
0:12:23 | But if we look at the other core test conditions, they also have the same interpretation. |
---|
0:12:32 | Looking at the results on NIST 2008, the SWLP-based results improve the DET curve over the whole area. If I read the results correctly, both min DCF and equal error rate improve with SWLP compared with MFCC. |
---|
0:12:55 | Here the red curves are for males, the blue ones for females, and the green one for all trials, male and female. |
---|
0:13:07 | Coming to the results on NIST 2010, the effect of using SWLP is, in some sense, to rotate the DET curve: min DCF gets better, but the equal error rate gets a bit worse. |
---|
0:13:22 | If you ask why this happens, I have no idea right now. We just applied SWLP in this system and tried to see its effect; interpreting why it happens needs more study. |
---|
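The "rotation" of the DET curve is easier to picture once one recalls that EER and min DCF look at different operating points. A minimal sketch of both measures follows; the threshold sweep is a simple approximation, the function name is hypothetical, and the defaults are the NIST SRE 2008 cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01). NIST SRE 2010 moved to C_miss = 1, C_fa = 1, P_target = 0.001, which places the min DCF operating point at a much lower false-alarm rate, so a curve that changes slope can improve one measure while worsening the other.

```python
import numpy as np

def eer_and_min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Approximate EER and normalised min DCF by sweeping the decision threshold.
    A trial is accepted when its score is >= the threshold."""
    tar = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.concatenate((np.unique(np.concatenate((tar, non))), [np.inf]))
    p_miss = np.searchsorted(tar, thresholds, side="left") / len(tar)      # targets rejected
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / len(non)  # non-targets accepted
    i = np.argmin(np.abs(p_miss - p_fa))
    eer = (p_miss[i] + p_fa[i]) / 2.0
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    # Normalise by the cost of the better trivial system (accept all or reject all).
    min_dcf = dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))
    return eer, min_dcf

rng = np.random.default_rng(0)
tar = rng.normal(2.0, 1.0, 1000)
non = rng.normal(0.0, 1.0, 10000)
print(eer_and_min_dcf(tar, non))                                        # NIST SRE 2008 costs
print(eer_and_min_dcf(tar, non, p_target=0.001, c_miss=1.0, c_fa=1.0))  # NIST SRE 2010 costs
```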
0:13:41 | Well, I think that was the point I wanted to make. Thank you. |
---|
0:13:57 | Okay, questions? |
---|
0:14:14 | Just a quick question about the signal-to-noise ratio you used on the NIST data. |
---|
0:14:21 | You also played an audio example, and you mentioned that performance was reported at 0 dB. |
---|
0:14:32 | Yes. |
---|
0:14:34 | My question is how you measured the signal-to-noise ratio. Maybe it's just me, and other people may not agree, but I thought it was mostly signal, so maybe that is not 0 dB; maybe it is ten dB or higher. |
---|
0:14:59 | I thought the audio you played, which you called 0 dB, sounded as if the signal was much stronger than in a 0 dB situation. |
---|
0:15:12 | Well, because I suspected that somebody would ask how I did that, I have exactly the MATLAB code that we used. |
---|
0:15:18 | I can explain it here: we measure the energy of every frame of the speech signal and average them, and we do the same over the noise. |
---|
0:15:32 | From the desired SNR we compute the gain, and then we add the signal and the scaled noise together, so that we have the desired average SNR. |
---|
0:15:45 | So you are measuring the signal-to-noise ratio from the frame energies? |
---|
0:15:57 | Yes: framing the signal, measuring the energy of the frames, averaging them over the signal, and finding the relative gain between the noise and the signal. |
---|
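The mixing procedure described in this answer can be sketched as follows: with P_s and P_n the mean frame energies of speech and noise, the noise gain g is chosen so that SNR_dB = 10 log10(P_s / (g² P_n)). The 25 ms frames with a 10 ms hop at 8 kHz (200 and 80 samples) and the function names are assumptions, since the talk does not state them. Because every frame, pauses included, enters the speech average, the SNR within active speech frames can be higher than the nominal value; the talk does not say whether silent frames were excluded.

```python
import numpy as np

def mean_frame_energy(x, frame_len=200, hop=80):
    """Average energy over all (possibly overlapping) frames of x."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.mean([np.sum(x[s : s + frame_len] ** 2) for s in starts])

def add_noise_at_snr(speech, noise, snr_db, frame_len=200, hop=80):
    """Scale `noise` so that 10*log10(P_speech / P_scaled_noise) equals snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    ps = mean_frame_energy(speech, frame_len, hop)
    pn = mean_frame_energy(noise, frame_len, hop)
    gain = np.sqrt(ps / (pn * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)        # stand-in for a speech signal
factory = 3.0 * rng.standard_normal(20000) # stand-in for a noise recording
noisy, g = add_noise_at_snr(speech, factory, 0.0)
print(g)
```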
0:16:24 | I don't see any difference in these spectra. |
---|
0:16:29 | What difference do you expect to see? |
---|
0:16:31 | Well, since it's noisy, I expected the spectrum to be flat, filled in with noise. |
---|
0:16:41 | This depends on the noise; the noise we use here is factory noise. I had this type of behaviour and just selected one frame. |
---|
0:16:53 | The effect of the noise is not the same for all frames, so maybe you're right. But I think that as we increase the noise level, the spectrum becomes flatter and flatter and we lose information in the spectrum. This is just a typical example to show how it works. |
---|
0:17:28 | Are there other questions? |
---|
0:17:32 | Let me give one more interpretation. We used SWLP in conjunction with MFCC and other features in the I4U submission. |
---|
0:17:42 | We evaluated our system with just this feature, and then the scores of this subsystem were used by the other I4U sites, fused with their different types of subsystems. |
---|
0:18:10 | One of the assumptions of your model is that the higher-energy observations in the signal are the most reliable, right? |
---|
0:18:21 | That's right, because as the energy of the noise increases, the low-energy regions can be completely coloured by the noise. |
---|
0:18:30 | Okay. The other side of the coin is a situation where you get distortions because you are overdriving the channel, for example. It may be the case that the signal is actually clipped, and then the lower-energy observations might be the more reliable ones. |
---|
0:18:58 | Well, in this case you're right. We don't know exactly what will happen if the signal is distorted by the channel or by the recording device. |
---|
0:19:13 | If you ask me what will happen in that case, I think the LP spectrum would be affected in the same way. |
---|
0:19:25 | Thank you very much. |
---|