What I am presenting here is one of the research directions we are working on at the University of Eastern Finland. A lot of our effort is concentrated on new features for speaker recognition; let me say we have a line of work in our laboratory on exploring the area of new features for speaker recognition. The current work is the application of a particular kind of features, namely weighted linear prediction features, to speaker recognition, and this work is done jointly with a group at Helsinki University of Technology, which nowadays is called Aalto University. Let me say right away that I am not pretending to know everything that is happening inside weighted linear prediction; I am presenting my own understanding of what weighted linear prediction is, and colleagues from the other group are here to help me if I cannot describe something properly.

The concept is this. Some customers, or let me say users, of speaker recognition technology want to use it in an office-like environment, but there are other users who want to use it in any environment: in high-energy noise, in the street, with factory noise, whatever, rather than in an office environment with controlled, good-quality speech recordings. So we are interested in how much our speaker recognition systems degrade in performance under this kind of additive noise.

Just to describe where our focus is: a typical speaker recognition system has different phases and different modules, but our focus in this study is to see how feature extraction affects speaker recognition performance. Typically the feature extraction is that we take windowed frames, do FFT spectrum estimation, compute the MFCCs, apply RASTA filtering, append delta and double-delta coefficients, drop frames according to energy, and apply cepstral mean and variance normalisation. This gives the typical thirty-six-dimensional feature vector that we use in our experiments.

Now the question is this: we are all the time using the FFT to estimate the spectrum, but is it really the best way to do it, and is it really that robust in additive noise conditions? Going to LP, it is well known that the spectrum can also be estimated by linear prediction; it is simply an alternative model, an alternative way to estimate the spectrum. Nobody has shown that LP is better than the FFT for speaker recognition, or the other way around, or for any other speech processing application. So we are now trying to see what the performance of FFT and LP is, and on top of that we are introducing WLP.

WLP is simply targeted at putting more stress on the regions of speech that have, let me say, more energy. The way we do this is by weighting the energy of the prediction error with a weighting function, and where the weighting function comes from is that we compute it as the short-time energy of the signal immediately before the current sample, something like M samples before the current sample. Putting this weighting function into the prediction error, we can again set the derivatives of that weighted error with respect to the predictor coefficients to zero, solve the resulting normal equations, and find the coefficients of the predictor.
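Just so the idea is concrete, here is the criterion as I understand it; this is only my sketch, and the exact summation ranges and the stabilised variant are in the paper of Magi and colleagues.

```latex
% Weighted linear prediction, as I understand it (details in the paper of Magi et al.):
% prediction error of sample x_n with a predictor of order p
e_n = x_n - \sum_{k=1}^{p} a_k \, x_{n-k}
% WLP minimises the weighted error energy, with the weight W_n taken as the
% short-time energy of the M samples immediately before the current sample
E = \sum_{n} W_n \, e_n^2 ,
\qquad
W_n = \sum_{i=1}^{M} x_{n-i}^2
% setting \partial E / \partial a_i = 0 gives the weighted normal equations
\sum_{k=1}^{p} a_k \sum_{n} W_n \, x_{n-k} \, x_{n-i}
  = \sum_{n} W_n \, x_n \, x_{n-i}, \qquad i = 1,\dots,p
```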
This idea is old; if I remember right it goes back to around nineteen seventy-five, and it was taken up again in nineteen ninety-three as weighted linear prediction.

But let me say why we choose the short-time energy as the weighting function of WLP. It can be argued that the regions of speech that have high energy are less contaminated by additive noise. It is, let me say, a hypothesis, but it is a reasonable one: we can get a better estimate of the spectrum from the regions of speech that are less corrupted by noise, and those regions of speech have higher short-time energy. The regions of speech with higher short-time energy also correspond to the closed phase of the glottal cycle: when the glottis is closed, the subglottal system is disconnected from the rest of the speech production system, and in this case we have a standing wave inside the vocal tract, so if we want to compute the formants of the speech signal we get more prominent formant estimates.

Well, now, what is the problem here? With conventional LP the normal equations are guaranteed to lead to a stable filter when we compute the coefficients of the predictor, but the problem with WLP is that it is not guaranteed to lead to a stable filter, and this is a problem for speech coding, for example. What we can do instead is, rather than using a single weighting function, decompose it into partial weights and apply them to the delayed samples used in estimating the current sample. In this way we arrive at the equations derived in the paper of Magi and colleagues, which describe how the partial weights must behave so that the final predictor coefficients lead to a stable filter. I am still not completely sure what is happening here, but it is described in that paper, and for the formal details please refer to it.

Well, here I am taking a frame, a voiced frame from NIST two thousand two, and the spectrum estimates of it, and then the same frame contaminated with factory noise at zero dB SNR. It is, let me say, obvious that when we do the spectrum estimation of the noisy signal there are problems, and they are mainly caused by the noise; how much it affects the estimate depends on the SNR level and on the noise that is added at each sample. To give you a little more intuition of what zero dB factory noise means, I have here a speech file, just a piece of the speech file we took this frame from. [A clean sample from the NIST 2002 test set is played.] And here is the same piece with zero dB additive factory noise. [The noisy sample is played.] Well, it shows what zero dB SNR really means.
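If you want to play with this kind of figure yourself, here is a minimal sketch of how it could be produced; this is only my own illustration in Python, not our actual code, and the predictor order and the length of the short-time energy window are arbitrary choices.

```python
import numpy as np
from scipy.signal import freqz

def ste_weights(x, M=16):
    """Short-time energy of the M samples preceding each position (the WLP weight)."""
    c = np.concatenate(([0.0], np.cumsum(x ** 2)))
    W = np.array([c[n] - c[max(0, n - M)] for n in range(len(x))])
    return W + 1e-12                        # keep the weights strictly positive

def allpole_spectrum(x, p=20, weights=None, nfft=512):
    """All-pole spectrum of frame x; with weights it solves the WLP normal equations."""
    n = np.arange(p, len(x))                # predict x[n] from the p previous samples
    w = np.ones(len(n)) if weights is None else weights[n]
    X = np.stack([x[n - k] for k in range(1, p + 1)], axis=1)
    R = (X * w[:, None]).T @ X              # weighted correlation matrix
    r = (X * w[:, None]).T @ x[n]
    a = np.linalg.solve(R, r)               # predictor coefficients
    _, H = freqz([1.0], np.concatenate(([1.0], -a)), worN=nfft)
    return 20 * np.log10(np.abs(H) + 1e-12)

# usage (frame = one windowed voiced frame, clean or contaminated with noise):
#   lp_db  = allpole_spectrum(frame)                                  # conventional LP
#   wlp_db = allpole_spectrum(frame, weights=ste_weights(frame))      # WLP
```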
Now, connecting this to some results for the spectrum estimation methods we are considering: using the NIST two thousand two corpus, we ran, let me say, a speaker detection evaluation with factory noise at zero dB SNR. We can see that the methods mainly group into the FFT-based one, then, let me say, the weighted LP group, and PLP and plain LP. I should mention that NIST two thousand two is a database collected mainly over mobile handsets, so it already comes with convolutional noise and some additive noise, although I think we add quite a lot of noise ourselves on top of that. Still, we can see that there really is some difference between the performance of these features in an additive noise environment.

We then tried, let me say, one very famous speech enhancement method, added as a block in our feature extraction, to see what a simple speech enhancement method does for a speaker recognition system in an additive noise environment. Looking at the results, yes, there is some good improvement from having speech enhancement, spectral subtraction in our case. But although these results look quite different, I should say that our noise is more or less stationary, and in real work that is not really the case.

Coming to some more recent data, we wanted to see whether these NIST two thousand two results generalise to NIST two thousand eight and maybe NIST two thousand ten, because we were one of the sites participating in the NIST two thousand ten SRE, and this was our base system; I mean, the contribution of the University of Eastern Finland was to try some new features for speaker recognition. Looking at the results, let me just group the systems somehow: here are the curves where we compare MFCC and SWLP, and the condition is the eight-conversation, ten-second one. If you ask me why this condition was selected for the presentation, it is because it was the core test I was working on and, let me say, its metrics looked somehow nice, so I selected it here; but if we look at the other core tests they have the same interpretation.

Comparing the MFCC results with the SWLP-based ones, the SWLP-based results improve the DET curve in that whole region; if I read the results correctly, I mean in terms of minimum DCF, SWLP is improving compared to MFCC. Here the red curves are for males, the blue ones are for females, and the green one is, let me say, for all trials, male and female. Coming to the results, the effect of using SWLP seems to be to somehow rotate the DET curve in some sense, because the minimum DCF gets a bit better but the equal error rate gets a bit worse. If you ask why this happens, I have no idea right now; we just applied SWLP and saw this effect, but coming to an interpretation of why it happens needs more study.

Well, I think this was the point that I wanted to make. Thank you.
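Just so it is clear what I mean by these two numbers, here is a minimal sketch, my own illustration, of how the equal error rate and the minimum DCF can be computed from target and non-target scores; the cost parameters here are the classic NIST-style values and are only an assumption, not necessarily the ones used in the evaluation.

```python
import numpy as np

def eer_and_min_dcf(target_scores, nontarget_scores,
                    c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Approximate EER and minimum detection cost from two sets of detection scores."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores)]          # sweep the threshold over sorted scores
    # at each threshold: misses = targets below it, false alarms = non-targets above it
    p_miss = np.concatenate(([0.0], np.cumsum(labels))) / len(target_scores)
    p_fa = 1.0 - np.concatenate(([0.0], np.cumsum(1.0 - labels))) / len(nontarget_scores)
    eer = p_miss[np.argmin(np.abs(p_miss - p_fa))]   # where the two error rates cross
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    return eer, dcf.min()

# usage: eer, min_dcf = eer_and_min_dcf(np.array(tar_scores), np.array(non_scores))
```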
Okay, do we have questions? We have a question over here.

A quick question about the signal-to-noise ratio you used, the zero dB factory noise; you mentioned that performance was reported at zero dB. My question is: how did you measure, or mix, the signal-to-noise ratio? Because, and maybe it is just me, maybe other people will not agree, to me the noise sounded stronger than the signal, so that example did not sound like zero dB; it sounded more like minus ten, like the noise was stronger than in a zero dB situation.

Well, I suspected that somebody would ask this, and I cannot tell you exactly what the MATLAB code does, but I can give my interpretation: we measure the energy of every frame of the speech signal and average it over the signal, do the same over the noise, and from that we find the relative gain; then we scale and add the signal and the noise together so that we get the target average SNR.

So you are measuring the signal-to-noise ratio frame by frame, rather than over the whole signal?

No. We frame the signal, measure the energy of the frames, average over the whole signal, and, okay, find the relative gain between the noise and the signal.

I don't see any difference in these spectra.

What kind of difference did you expect to see?

Well, with that much noise I expected the spectrum to be flatter, filled in by the noise.

This depends on the noise; what we used here is just factory noise, and we just selected one frame; the effect of the noise is not the same for all frames. Maybe you are right about this particular example, but I think that by increasing the noise level the spectrum does become flatter and flatter and we lose the information in the spectrum; this was just a typical example to show how it works.

Other questions? Maybe one more, for interpretation: did you use this SWLP in conjunction with MFCC and the other features, and was your submission a fusion?

Yes; the way we evaluated our system was to build a subsystem on each feature separately, score each subsystem, and then fuse the scores, so the submission used this kind of score-level fusion over the different types of features.

One of the assumptions of your model is that the more energetic observations in the signal are the most reliable ones, right?

That's right, because as the energy of the noise increases, the low-energy regions are the ones that really get coloured by the noise.

Okay, but the other side of this is: if you have a situation where you are getting distortions because of the channel, for example convolutional distortion, it may be the case that the signal is actually distorted exactly where it is strong, and then maybe another indicator besides the high-energy observations would be the better choice?

Well, in this case you are right; we do not know exactly what will happen if the signal is distorted by the channel or by the recording device, and these formant estimates are somehow tied to that assumption about the signal. But if you ask me what will happen if the signal is distorted in that way, I will say that I think the LP spectrum will be affected in much the same way as the FFT spectrum, so we would share the same fate, really.

Thank you very much.
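As a footnote to the question about how the noise was mixed in, here is a minimal sketch of the procedure as I described it; this is my own reconstruction rather than the actual MATLAB code, and the frame length is an arbitrary choice.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db=0.0, frame_len=400):
    """Scale `noise` so that the average per-frame SNR against `speech` equals snr_db."""
    noise = np.resize(noise, len(speech))            # loop or crop the noise to length

    def mean_frame_energy(x):
        n_frames = len(x) // frame_len
        frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
        return np.mean(np.sum(frames ** 2, axis=1))

    # relative gain between the average frame energies of speech and noise
    ratio = mean_frame_energy(speech) / mean_frame_energy(noise)
    gain = np.sqrt(ratio / (10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# usage: noisy = add_noise_at_snr(clean_speech, factory_noise, snr_db=0.0)
```

Whether pauses are included in the averaging obviously changes what the nominal figure means for the active speech, which may be part of why a nominal zero dB can sound different from what one expects.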