... and we will have five papers in this session. The first one is "Variance-Spectra Based Normalization for I-vector Standard and Probabilistic Linear Discriminant Analysis". So, please go ahead and present the paper.

So, as was just mentioned, this is a collaborative work, and the work was started some time ago. So I want to start with some analysis of what we did before, and also try to improve on the work that was done previously.

So, coming back to i-vectors, I will start with a brief description of our system, which is based on classical i-vectors. I will mostly talk about the post-processing of the i-vectors, between the i-vector extraction and the PLDA. This is the part of the system where we try to improve the discriminancy, usually by using LDA approaches, and also to compensate for the session variability, and one way to do it is to use the length normalization. There are plenty of ways to do this, but I will focus on these two. And as the discriminancy is related to the variance of the data, we will look at the between- and within-class variability.

We start with the description of the system. The system is just a classical UBM/i-vector system, and everything is gender-dependent from the beginning to the end. We extract 60-dimensional MFCC features, with frame selection based on a speech recognizer. The UBM training is very classical, using a large amount of telephone data from NIST SRE 2004, 2005 and 2006.

For the second part, the i-vector extractor is also gender-dependent, and we use only telephone data from these NIST SRE sets plus Switchboard, which I think is quite state of the art. Here is just a rough idea of the number of sessions.

And for what I would call the normalization and classification training, which includes the G-PLDA training, the LDA training and everything you will see in the following, we used gender-dependent subsets of the same sets of data, NIST SRE 2004, 2005, 2006 and Switchboard, and only those, because of the number of sessions. And we restrict the development set to segments for which the nominal duration is higher than 180 seconds.

So now let us look at some tools that can be useful when we talk about variability. First, I would just like to recall discriminancy and covariances. We commonly use the covariance matrices: the total covariance, the between-class covariance and the within-class covariance. But it is very common in speaker verification to use the scatter matrices instead of the between- and within-class covariance matrices.

The definitions are roughly similar, and one or the other is used for different applications. The main difference is that the scatter matrices do not take into account the number of sessions per speaker, so the weight of a speaker does not depend on its number of sessions. Since both are commonly used, we just ran a few experiments to see which one is more efficient in our system.
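
Just to make the two options concrete, here is a minimal sketch of the two kinds of statistics, assuming a development matrix X of i-vectors with one integer speaker label per row; the function name and the exact weighting conventions are my own illustration of the distinction I just described, not code from the paper.

```python
import numpy as np

def class_statistics(X, labels, per_speaker=False):
    """Between/within-class matrices of the dev i-vectors X (n x d).

    per_speaker=False : covariance-style, every session has the same weight.
    per_speaker=True  : scatter-style, every speaker has the same weight,
                        whatever its number of sessions.
    """
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    d = X.shape[1]
    B = np.zeros((d, d))
    W = np.zeros((d, d))
    speakers = np.unique(labels)
    for s in speakers:
        Xs = X[labels == s]
        dev = (Xs.mean(axis=0) - mu)[:, None]
        resid = Xs - Xs.mean(axis=0)
        if per_speaker:
            B += dev @ dev.T
            W += resid.T @ resid / len(Xs)
        else:
            B += len(Xs) * (dev @ dev.T)
            W += resid.T @ resid
    n = len(speakers) if per_speaker else len(X)
    return B / n, W / n
```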

So, talking about classification, what we are interested in is to maximize the between-speaker variability and to reduce the within-speaker variability. One way to look at this is to look at the covariances, and the tool we use for that is the variance spectrum, which is very simple to compute. On this graph you can see three plots, which correspond to the total variance, the between-class variance and the within-class variance, that is, the speaker and the session variability.

So what we do is: we compute the between-class covariance matrix B, then we rotate all the data of the development set into the eigenvector basis of B, we compute the covariance matrices in this basis, and then we just plot the diagonal of the matrix. You can see that the variability in the first dimensions is higher, for the speaker and also for the session.
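
As a minimal sketch of how such a spectrum can be produced, reusing the class_statistics helper from above (again my own illustration, not the paper's code):

```python
import numpy as np

def variance_spectra(X, labels):
    """Project the dev i-vectors onto the eigenvector basis of the
    between-class covariance B, then keep the diagonal of the total,
    between-class and within-class covariances in that basis."""
    B, W = class_statistics(X, labels)
    T = np.cov(X, rowvar=False, bias=True)      # total covariance
    _, V = np.linalg.eigh(B)
    V = V[:, ::-1]                              # leading eigenvectors first
    return {name: np.diag(V.T @ M @ V)          # one spectrum per matrix
            for name, M in (("total", T), ("between", B), ("within", W))}
```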

Now, one way to maximize this ratio is to use the very common LDA, which is just maximizing the Rayleigh coefficient. This Rayleigh coefficient can be defined using the within- and between-class covariance matrices, or using the scatter matrices. In this work, LDA is used to reduce the i-vector dimensionality from 600 down to a lower dimension, and this is kept constant for all the experiments we have.
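
For reference, the Rayleigh coefficient I am referring to is the usual one, written here with the between- and within-class covariance matrices B and W; the scatter-matrix variant simply swaps in the scatter matrices:

```latex
J(v) \;=\; \frac{v^{\top} B\, v}{v^{\top} W\, v},
\qquad
A_{\mathrm{LDA}} \;=\; \operatorname*{arg\,max}_{A}\;
\frac{\det\!\left(A^{\top} B A\right)}{\det\!\left(A^{\top} W A\right)}.
```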

And, to go back to the system description, we tried two scorings. The first one is based on the two-covariance model that was introduced a couple of years ago. The second one is based on PLDA using the Gaussian assumption. The configuration we used has a full-rank eigenchannel matrix, whereas the heavy-tailed version was using a diagonal covariance term; the number of speaker factors is kept equal to the LDA dimension, to be consistent with the LDA, and the number of channel factors is 600, because it is a way to compensate for the missing diagonal term.
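
As a side note on the first scoring: in the two-covariance model the speaker means follow N(mu, B) and the sessions follow N(speaker mean, W), so the verification log-likelihood ratio only needs three Gaussian evaluations. A minimal sketch of the score, as my own illustration of that model rather than the code we used:

```python
import numpy as np
from scipy.stats import multivariate_normal

def two_cov_llr(w1, w2, mu, B, W):
    """Two-covariance model LLR for a pair of i-vectors.

    Same-speaker hypothesis: the stacked pair [w1, w2] is Gaussian with
    block covariance [[B+W, B], [B, B+W]].
    Different-speaker hypothesis: w1 and w2 are independent N(mu, B+W)."""
    joint_cov = np.block([[B + W, B], [B, B + W]])
    log_same = multivariate_normal.logpdf(np.concatenate([w1, w2]),
                                          np.concatenate([mu, mu]), joint_cov)
    log_diff = (multivariate_normal.logpdf(w1, mu, B + W) +
                multivariate_normal.logpdf(w2, mu, B + W))
    return log_same - log_diff
```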

So the problem with all of this, the two-covariance model and G-PLDA, is that everything is based on the Gaussian assumption. And for those working on this, since the work on heavy-tailed PLDA it is well known in the community that the i-vectors do not follow a Gaussian distribution, but something a bit more heavy-tailed, like a Student's t distribution.

So what we do is that we try to take these i-vectors and make their distribution more Gaussian. One way to do this was proposed at about the same time by two teams, by Garcia-Romero and by us, and the idea is to normalize the magnitude of the i-vectors: using the formula on the slide, we center the data and then we just normalize them to unit norm. Using this method, the distribution becomes a bit more Gaussian.
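
The formula in question is just the centering followed by the unit-norm scaling, with m the mean of the development set:

```latex
w' \;=\; \frac{w - m}{\lVert w - m \rVert}.
```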

And we can see that the effect is very efficient: just using this with the two-covariance model, we can see a gain in both equal error rate and minimum DCF on the NIST 2010 extended data, with this simple i-vector representation.

Everything until now is very common. So, going back to the tool I introduced previously, we would like to show the effect of the length normalization on the variance spectra. And as you can clearly see, the spectra are exactly the same except for the range of the values, because we are normalizing the magnitude: we can see that the values are smaller, but it does not change the shape much.

Now, in the initial papers this normalization was introduced with whitening, so it has to be done after whitening of the data. There are several steps in this algorithm: the whitening is done using the total covariance matrix of the development i-vectors, and then we apply the length normalization. At the same time, we introduced the Eigen Factor Radial method, which is just this whitening plus length normalization, but done iteratively. And the interest of this method is that it converges very fast.
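
A minimal sketch of that iterative whitening plus length normalization, as I described it; the helper name and the use of an eigendecomposition for the whitening transform are my own choices, and in deployment the (mean, whitening) pair learned at each iteration would of course be stored and applied to the enrolment and test i-vectors as well:

```python
import numpy as np

def iterative_whiten_length_norm(X, n_iter=2):
    """Iteratively (1) center and whiten the dev i-vectors with their total
    covariance, then (2) project every i-vector onto the unit sphere."""
    X = X.copy()
    for _ in range(n_iter):
        m = X.mean(axis=0)
        T = np.cov(X - m, rowvar=False, bias=True)
        vals, V = np.linalg.eigh(T)
        whitening = V @ np.diag(vals ** -0.5) @ V.T     # T^{-1/2}
        X = (X - m) @ whitening.T
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # length normalization
    return X
```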

And this iterative process has some properties that we can use further. The properties are that the mean of the development set converges to zero very fast, and the total covariance matrix becomes the identity matrix. And following from this, all the eigenvectors of the between-class covariance matrix become also eigenvectors of the within-class covariance matrix. And using all these properties together, it happens that the eigenvectors of the between- and within-class covariance matrices are now solutions of the LDA optimization.
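
The reasoning step behind this is short: once the total covariance is the identity, the within-class covariance is determined by the between-class one, so the two matrices share the same eigenvectors and, for unit-norm directions, the Rayleigh coefficient

```latex
T = B + W = I \;\Rightarrow\; W = I - B
\;\Rightarrow\;
\frac{v^{\top} B\, v}{v^{\top} W\, v}
= \frac{v^{\top} B\, v}{1 - v^{\top} B\, v}
\qquad (\lVert v \rVert = 1)
```

is an increasing function of v^T B v, hence it is maximized by the leading eigenvectors of B.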

That means that, after all this, the additional improvement brought by the LDA is null. That was one of the conclusions of our first paper.

And we can see the effect of this normalization on the variance spectra. So here is the spectrum before any treatment of the i-vectors, and here after one iteration, which is exactly what Garcia-Romero proposed. We can see that the total covariance spectrum becomes flat; after two iterations it is even better, and after three iterations it is almost perfect, at least for the human eye.

So you can see that the big advantage of this process is that the first dimensions of the data no longer contain the major portion of the variability, and in particular the major portion of the session variability. So actually, after this treatment, the i-vectors become optimal with respect to the Rayleigh coefficient optimization, which means there is nothing left for the LDA to optimize.

To illustrate this, here are some results using the LDA and then the two-covariance model for scoring. The baseline is just the length normalization, and when I say length normalization without any whitening, it is just the magnitude normalization. You can see that using the Eigen Factor Radial algorithm does not improve the performance after one iteration if we use the scatter matrices to compute the LDA. But in the case where we compute the LDA using the between- and within-class covariance matrices, we can see that, for the females at least, it improves the performance. And after two iterations, the conclusion is the same: using the scatter matrices seems not optimal, so it is better to use the between- and within-class covariance matrices, in their initial definition.

After this result, we tried to apply the same process before the PLDA, which is maybe more robust than the two-covariance model. So this is the baseline using only length normalization, and when we apply two iterations of the Eigen Factor Radial algorithm, which was optimal in the previous case, we see that this process is not adapted to the PLDA: the performance stays the same, or might even be worse.

So we extended this work, still looking at the covariances, but considering that after the length normalization everything is on a sphere: we have a spherical surface, and the Gaussian modelling does not really like this. In particular, it is very difficult to estimate the within-class covariance matrix, because when you look at each speaker, from one side of this sphere to the other, the within-speaker variabilities are very different, and if we just take the average of these to estimate the development-set within-class covariance matrix, then it does not make sense anymore, because the resulting matrix fits some speakers but obviously not the others.

So what we propose in this paper is to keep the i-vectors on the surface, because it is now commonly admitted that it is really useful to use the length normalization for the session compensation, but to act on the principal directions that drive the decision boundaries. That means we want the within-class covariance matrix to become diagonal, and even better if it is just the identity matrix times a constant. So we decided to apply exactly the same algorithm as previously, an iterative process using the same steps, except that we replace the total covariance matrix by the within-class covariance matrix.
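
A minimal sketch of that variant, mirroring the earlier one and reusing the class_statistics helper; again this is my own illustration, and the only change is which matrix drives the whitening step, with the length normalization afterwards putting the i-vectors back on the sphere:

```python
import numpy as np

def spherical_nuisance_norm(X, labels, n_iter=2):
    """Like the iterative whitening above, but the rotation and scaling are
    driven by the within-class covariance, so the nuisance becomes isotropic
    while the i-vectors stay on the unit sphere after re-normalization."""
    X = X.copy()
    for _ in range(n_iter):
        m = X.mean(axis=0)
        _, W = class_statistics(X, labels)           # within-class covariance
        vals, V = np.linalg.eigh(W)
        transform = V @ np.diag(vals ** -0.5) @ V.T  # W^{-1/2}
        X = (X - m) @ transform.T
        X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X
```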

And by doing this, we can see on the spectra of the same development set that one iteration makes the within-class spectrum, so the session variability, become almost flat very fast: it is almost equally spread over the dimensions. After two iterations, from the human point of view it still looks exactly the same, but it does improve the performance, and after one more iteration we can see that it is completely flat. And what the effect is when we use this for scoring, that is what I am going to show in a few minutes.

But before that, I just want to point out that this process can also be used to initialize the PLDA matrices. Actually, most of us are using a PCA in order to initialize the PLDA matrices, because it provides the first principal directions of the space, so it is a very good starting point. But what we propose here is to use this process: we rotate all the i-vectors into the eigenvector basis of B, and then we initialize the speaker factor matrix using the first eigenvectors. Then, for the eigenchannel matrix, we use the Cholesky decomposition of the within-class covariance matrix. And actually, if you do not want to use an eigenchannel matrix, you can just initialize the residual covariance using the same matrix; it works as well, I think.
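
A minimal sketch of that initialization, under my reading of it and reusing the class_statistics helper; r is the number of speaker factors, i.e. the LDA rank, and the helper names are mine:

```python
import numpy as np

def init_plda(X, labels, r):
    """Deterministic PLDA initialization from the normalized dev i-vectors:
    the speaker factor matrix from the leading eigenvectors of the
    between-class covariance B, the eigenchannel matrix from the Cholesky
    factor of the within-class covariance W."""
    B, W = class_statistics(X, labels)
    vals, V = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    F = V[:, order[:r]]                  # speaker factor matrix (d x r)
    G = np.linalg.cholesky(W)            # full-rank eigenchannel matrix
    mu = X.mean(axis=0)
    return mu, F, G
```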

So here are some results; the baseline, as before, is just the i-vectors plus the length normalization process. And I just want to mention that, since for a random initialization of the PLDA the performance can vary depending on the initialization point, we performed several experiments with different initializations and then we averaged the results. So you can see the baseline that I previously presented, and also the Eigen Factor Radial method, which is not efficient in this case, and you can see that using the spherical normalization, which is how we call this nuisance normalization, the performance improves in this case.

Now, with the initialization process that I just described, we can see that the performance is even better. But I just want to point out that the fact that the performance is the best here actually comes from the fact that, in this case, the performance obtained when using this initialization is just the lower bound of what we obtained by using random initializations. So that means it is maybe not always better, but it guarantees a certain level of performance.

So, to conclude this presentation, I just want to come back to the fact that we used this tool, the variance spectra, which is very well known but maybe only a few of you use it. We used this tool to analyze the behaviour of the system, and it can actually also be used, right after obtaining the i-vectors, as a very good indicator of the quality of the i-vector extractor, because just by looking at the spectra you can have a rough idea of the performance you will get in the end. And a colleague is doing some experiments on this at the moment, and he will present this in his thesis, I think, very soon.

So this tool has proven to be useful for analysis purposes. Coming back to our previous paper, we showed that iterating the whitening plus length normalization process improves the performance slightly; it is not a huge improvement, but since it converges so fast, why not do it twice or three times. We also showed that the covariance matrices, at least in our case, perform better than the scatter matrices.

Then, from this talk, just remember that the spherical nuisance normalization improves the performance in the case of PLDA scoring. And also, as I mentioned before, when you use this type of process to initialize the PLDA matrices, you do not need to perform as many EM iterations: for the case I presented, we obtained the best performance using one hundred EM iterations with random initialization, whereas using this process we just need about ten iterations. So the PLDA training is less demanding, and it is a way to reduce the training time.

So now, if you have any questions.

[inaudible audience question]

Yeah, actually, to be honest, I do not really like the length normalization, because it is a purely empirical process which just happens to work right now, and I think we need to find a way to address this issue by finding something more consistent.

[inaudible exchange]