
she had your dark suit in greasy wash water

So, session number eight, this one on features for speaker recognition: we've got five papers that will be presented in the session. There is a bit of time before we have to leave for this evening's event, so we can actually allow a little bit of discussion afterwards.

So the first talk is "Feature extraction using two-dimensional autoregressive models for speaker recognition" from the Johns Hopkins group, who will be presenting the paper.


Thank you. I know that we have some time constraints, and also that I am mostly just sitting here watching, but the idea, last but not least, is that I want to use this talk also as a starting point for some discussion about features in general for speaker recognition, because I think we started that yesterday, and I was surprised to realise that we still have some open issues, even with things like the mean. So I have a few slides at the beginning which are perhaps more general than the particular paper, and then I will get back to the paper itself.

I always like it if you ask questions during the presentation, so please ask me immediately, don't feel shy. Because if we don't discuss during the slides, the risk is that everybody just sits here and you don't know what I am talking about. So please keep asking questions.

So the business is the following. We have speech, and speech carries several information streams: there is the speaker, there is the message, and there is the environment, and all of this information sits in the one signal. All of these are legitimate things to be after, but if you are after the speaker, then the message and the environment are the disturbing part, and if you are after the message, then the speaker and the environment are the sources of variability you would like to be invariant to. So one and the same piece of signal carries all of this information at once.

Then the recognition itself has two parts: the analysis, the features, and the classifier. The analysis is the part that we fix beforehand; it is based on what we learned in school, or on whatever we got from previous experience with the data. Then there is the classifier, and the classifier is typically trained. Nowadays the distinction between analysis and classification is somehow getting blurred, because we have started to train the feature extraction as well, but, as I said, this is essentially the scheme we had before. In any case, the outcome of this whole process should be the identity of the speaker.

So the goal of this process is somehow to suppress the unwanted sources of information and to stress the information about the speaker. You would like an analysis which somehow suppresses the message and the influence of the environment and so on, and enhances the information about who is speaking. But of course, as we have also learned over the years of speech research, it is very often better to reuse as much as possible something which is already around, because that is what you have, or what you can easily get, and so on.

And that is how we ended up in speaker recognition doing it the same way as in speech recognition. We all know how to process speech: you take the signal, you apply some frequency analysis, so you get a sequence of vectors, each of them describing the signal in different frequency sub-bands. You typically ignore the phase, and you try to shape the analysis somehow "like hearing", in quotes, because hearing is to some extent the first thing that processes the signal, and some of its properties might be useful. So the spectral analysis is auditory-like.

Then come some modifications, depending on the school of thought: the PLP people apply different modifications than the MFCC people, and so on. Then in most cases we take a cosine transform; before that there is some compression, typically a logarithm, and the cosine transform approximately decorrelates the features, and you get the cepstrum. And the cepstrum is what we are all using, both in speech and in speaker recognition, and all of this is inherited: as speaker recognition people you borrowed the representation from the speech recognition people, the speech recognition people borrowed it from the speech coding people, and so on, so basically everybody has been standing on the shoulders of giants.
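A minimal sketch of that classical pipeline (windowed power spectrum, auditory-style filterbank, log compression, cosine transform), with crude linearly spaced triangular filters standing in for a proper mel or Bark filterbank; the filter count and number of coefficients are illustrative assumptions, not the settings used in the talk.

```python
# A sketch of the classical cepstral pipeline: windowed power spectrum,
# auditory-style filterbank, log compression, cosine transform. The
# linearly spaced triangular filters are a crude stand-in for a mel/Bark
# filterbank; n_filters and n_ceps are illustrative choices only.
import numpy as np
from scipy.fft import dct

def cepstra(frame, n_filters=20, n_ceps=13):
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    edges = np.linspace(0, len(spectrum) - 1, n_filters + 2).astype(int)
    fbank = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        weights = np.concatenate([np.linspace(0, 1, mid - lo, endpoint=False),
                                  np.linspace(1, 0, hi - mid)])
        fbank[i] = np.dot(spectrum[lo:hi], weights)          # band energies
    log_energies = np.log(fbank + 1e-10)                     # compression
    return dct(log_energies, type=2, norm='ortho')[:n_ceps]  # the cepstrum
```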

Now, as I mentioned briefly, a word about the sources of variability. At the time, the data mostly varied across channels: the same speaker is recorded over different channels, different handsets, in different conditions, and so on. This is information you cannot design for in advance; the channel changes, noise comes in, and basically the whole acoustic space the speaker is exposed to changes. So this is variability which, if you are after the speaker, you would rather not see.

And of course the speaker recognition techniques, the universal background models, joint factor analysis and so on, deal with these nuisances, in some cases embarrassingly well, almost as if the problem did not exist. So it is not clear how much is really left to be done at the signal level. Now let's see how much this machinery can still gain from the data we have these days.


So here, this is, I think, yes, exactly: you know, this is shown as a spectrum, so it is not the cepstrum as such; that is because we kept the picture as close as possible to the usual setup.

It was suggested in the break that it might be worthwhile looking back into this basic analysis, because we now have much more data and very fancy processing techniques, and one may want to know how much variability they can really absorb; exactly, how much variability.


And the range of techniques which could in principle be useful for recognizing the speaker is actually much bigger than that. Maybe the standard front end is even a bit misleading, because we mostly use it since speech recognition depends on it. One could ask about other properties of the signal, the phase for example, and so on; but at the same time that would mean working on different sources and methods than the ones applied in speech recognition, things perhaps more specific to speaker recognition, and that would be another story.

The results I will talk about are based on deriving the spectrum in a somewhat different way. Normally you window the signal, some tens of milliseconds at a time, and after some preprocessing you fit an autoregressive model, as in linear prediction, and from it you get a log spectrum, an all-pole spectral envelope; the sequence of these envelopes is then a function of time.

You can also do it differently, and that is what we are presenting here. You take a much longer stretch of the signal and do exactly the same thing, but on its cosine transform: you derive the autoregressive model from the cosine-transform coefficients that belong to one particular frequency band, and by doing this band by band you end up with a time-frequency representation. The difference is that the all-pole model now describes how the energy in a band evolves over time, rather than the spectral shape of one short frame.
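A rough sketch of frequency-domain linear prediction as just described: take the cosine transform of a long segment, keep the coefficients of one sub-band, and fit an all-pole model to them by the autocorrelation method, so that the model approximates the temporal envelope of that band. The band edges, model order and resolution below are illustrative assumptions, not the paper's settings.

```python
# A rough sketch of frequency-domain linear prediction (FDLP): DCT of a
# long segment, keep the coefficients of one frequency sub-band, fit an
# all-pole model to them (autocorrelation method); the resulting "spectrum"
# approximates the band's temporal power envelope. Band edges, model order
# and resolution are illustrative assumptions.
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(signal, band=(0.1, 0.2), order=40, n_points=512):
    coeffs = dct(signal, type=2, norm='ortho')       # to the frequency domain
    lo, hi = (int(f * len(coeffs)) for f in band)    # one sub-band of DCT bins
    x = coeffs[lo:hi]
    # Yule-Walker / autocorrelation method of linear prediction on the sub-band
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])                # AR coefficients
    gain = r[0] - np.dot(a, r[1:])                   # prediction error power
    # evaluate the all-pole response; its axis maps to time within the segment
    w = np.linspace(0, np.pi, n_points)
    denom = np.abs(1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a) ** 2
    return gain / denom
```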

And this is perhaps closer to the way hearing works, because for both speech and speaker what seems to matter is which frequency components are present and how they evolve in time, and this is a way to get a system that captures exactly that. I cannot really prove this to you here, but if you just look at the picture you might believe me.

So this is what we call frequency domain linear prediction, FDLP; a couple of my students have been working on it over the years. It is still linear prediction, and the name sounds like perceptual linear prediction, so it can be confusing, but I think it does keep quite a bit of the perceptual motivation.

So here is an example. You have a signal, you fit the all-pole model of the temporal envelope in each band, and you can also look at what is left after you take the envelope out, the residual or carrier. You do this in a number of different frequency bands, so the time-domain signal is divided into bands, and in each band you get an envelope and a carrier. You can then resynthesize speech from the envelopes only, and you can also resynthesize speech from the carriers only.
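For illustration, one simple way to split a single frequency band into an envelope and a "what is left" carrier, using the analytic-signal (Hilbert) envelope as a stand-in for the FDLP all-pole envelope; the filter design and band edges are assumptions made for the sketch.

```python
# Illustrative split of one frequency band into a slowly varying envelope
# and a residual "carrier", using the analytic-signal (Hilbert) envelope
# as a simple stand-in for the FDLP all-pole envelope. Band edges assumed.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def envelope_and_carrier(signal, sr, band=(300.0, 800.0)):
    sos = butter(4, band, btype='bandpass', fs=sr, output='sos')
    sub = sosfiltfilt(sos, signal)            # one sub-band signal
    env = np.abs(hilbert(sub))                # its temporal envelope
    carrier = sub / np.maximum(env, 1e-8)     # what is left after the envelope
    return env, carrier                       # sub is approximately env * carrier
```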

So if you listen to the signal resynthesized from the envelopes only, and then to the signal resynthesized from the carriers only, you can hear what each component carries: the envelopes carry most of the message. The bottom line here is that you can judge in this way what should, and what perhaps should not, be thrown away for speaker recognition.

The appeal of this is that, in a way, you have decomposed the signal into components: one part carries most of the message-like information, and the other carries something else. A side benefit is that you also get some robustness, because you have a representation in which a problem in one band, say some high-energy disturbance, stays localized in that band and can be seen.

As I mentioned before, if the signal goes through a linear channel, each frequency band essentially just gets scaled: the sub-band signal is multiplied, or divided, by a constant, and that constant is different at different frequencies, depending on the frequency response of the channel. In the all-pole model of a sub-band this scaling shows up as the gain of the model, and that gain is what we essentially just ignore in this scheme. So by throwing the gain away you become largely insensitive to the channel, and to some extent the representation also gets more robust in the presence of additive noise, although additive noise is of course a messier problem.
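A minimal sketch of that gain-normalization idea, under the assumption that the sub-band envelopes are available as a (bands, time) array, for instance from the FDLP sketch above: dividing each band's envelope by its own average is one simple way of "ignoring the gain".

```python
# Minimal sketch of the gain-normalization point: a (roughly) time-invariant
# channel scales each sub-band envelope by a fixed gain, so dividing every
# band's envelope by its own average, i.e. ignoring the model gain, removes
# much of the channel effect. `envelopes` is assumed to be a (bands, time)
# array of sub-band temporal envelopes.
import numpy as np

def gain_normalize(envelopes, eps=1e-10):
    gains = envelopes.mean(axis=1, keepdims=True)   # per-band gain estimate
    return envelopes / (gains + eps)                # channel gain divided out
```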

So basically, from this representation we derive the features. Speech and speaker information live in somewhat different frequency ranges and change at somewhat different rates, and one can try to exploit that, but in the end we also want to be able to use the standard speaker recognition techniques, which our colleagues know how to run, with all the back-ends and so on. So we turn the representation into something small and familiar: at each time instant we take the spectral vector across the bands, convert it to cepstra, and do this frame by frame over the whole time-frequency plane.
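A sketch of that last conversion step, again assuming a (bands, time) array of sub-band envelopes: each time slice is log-compressed across bands and cosine-transformed into a cepstral vector that a conventional back-end can consume; the number of coefficients is an illustrative choice.

```python
# Sketch of turning the (bands x time) envelope representation into
# frame-level cepstral vectors for a conventional speaker-recognition
# back-end: log-compress each time slice across bands and apply a
# cosine transform. n_ceps is an illustrative choice.
import numpy as np
from scipy.fft import dct

def tf_to_cepstra(envelopes, n_ceps=13, eps=1e-10):
    log_slices = np.log(envelopes.T + eps)                    # (time, bands)
    return dct(log_slices, type=2, norm='ortho', axis=1)[:, :n_ceps]
```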

Here the gain has already been removed. Note also that the time scales are quite different: the envelope changes much more slowly, while the carrier is responsible for the very fast fluctuations. So we compare our system with the mainstream baseline, and we also look at the fusion of the two, in terms of performance.



I was hoping somebody would bring this up. I think the distinction is exactly the one expressed here, features versus classifier. But at the same time, a classifier for speech recognition uses all the knowledge in the data; it can in effect tell you that different kinds of speech sounds are handled by different parts of the model, and so on and so on. It is interesting that speaker recognition does not really take advantage of that, as somebody was pointing out.


No, every utterance here is about a sentence long, so we just take the whole utterance. If we had a lot of speech, it could be chopped into segments of, say, one or five seconds, and then, depending on the length of the segment, we choose the order of the model, roughly so many poles per second.

As for what happens to the mean of that segment of the signal: if you take the cosine transform of the signal, the mean is just the first, DC coefficient; the model does not really need it, so we simply leave it out.
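A quick numerical check of the point about the mean, as I read that answer: for the orthonormal DCT-II the zeroth coefficient is just a scaled signal mean, so dropping it amounts to mean removal.

```python
# Quick check of the point about the mean: for the orthonormal DCT-II the
# zeroth coefficient is just a scaled signal mean (the DC term), so leaving
# it out is equivalent to removing the mean as far as the model is concerned.
import numpy as np
from scipy.fft import dct

x = np.random.randn(1000) + 3.0                        # signal with a DC offset
c = dct(x, type=2, norm='ortho')
assert np.isclose(c[0], np.sqrt(len(x)) * x.mean())    # the mean lives in c[0]
```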

I didn't say exactly that. What I said was that the part which might be interesting for the speaker is the residual: this process decomposes the signal into components, the all-pole envelope, which is the one used here, and the carrier, which is the one that gets thrown away. What struck me when I listened to it was that the carrier sounds as if the message information is largely gone, but it still keeps some information about the speaker; I don't think anybody would mistake it for the original, of course.

We actually tried it: we took the carrier component, used it as if it were a speech signal, and ran it through our phoneme recognizer, and it got, what was it, fifty-five or fifty-four percent, about fifty percent accuracy. So the same machinery can still do something with it with respect to recognizing phonemes.

somebody i you know

i mean happening at the top the loss and all four formants are gone

and everything is one

and it is

it's a bit

i way that you don't

the

the only assumption is not in use it is useful since out

oh also somebody speaker

Oh, of course, I see; yes, that might well be right.

Of course; and, you know, in all these cases you get asked whether you fused it all together. I once tried to submit a paper, as a matter of fact it was on speaker recognition, which was called "Towards decreasing error rates", and one of the reviewers complained that if you fuse this and that, you cannot tell what is doing what, and the paper was rejected. So I have mixed feelings here, because if you are working on something new and you use it on its own, it is very likely that your performance degrades, that the error rates go up rather than down; that was the whole point of the fusion in that paper.

But that was fifteen years ago, and now that we have these huge systems everybody works on fusion: if you just go after a different source of information, you are very likely to see an improvement after the fusion. That is also why, when you do research on new things, you should go for the fusion: you are very unlikely to increase the error rates. Otherwise you work on something, it does not work on its own, and you put it in a drawer; this way it works, and you can present it at the conference.

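A minimal sketch of the score-level fusion being advocated, with two hypothetical systems and a placeholder weight that would normally be tuned on a development set.

```python
# A minimal sketch of score-level fusion of two systems (say, a baseline
# cepstral system and a new-feature system). Both system names and the
# weight are placeholders; the weight would normally be tuned on a
# development set.
import numpy as np

def fuse_scores(scores_baseline, scores_new, w=0.5):
    s1 = np.asarray(scores_baseline, dtype=float)
    s2 = np.asarray(scores_new, dtype=float)
    # normalize each system's scores so the weight is comparable across systems
    s1 = (s1 - s1.mean()) / (s1.std() + 1e-10)
    s2 = (s2 - s2.mean()) / (s2.std() + 1e-10)
    return w * s1 + (1.0 - w) * s2
```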