Thank you for the introduction. The work I am going to present is joint work with my student.
So, to put it into the right context, the work we are going to present is centered on the use of i-vectors with PLDA.
What we intend to do in this paper is to reduce the computation involved in i-vector extraction, so we call it rapid computation of i-vectors.
"'kay" for going to detail is let me as bank of a slight
to we send the background and so as the motivations of the work
So, the i-vector extraction process can be seen as a compression process, where we compress across time and across the supervector space. The output, which is a low and fixed dimensional vector that we call the i-vector, captures not only the speaker information but also the characteristics of the recording devices and the microphones used, the transmission channel characteristics, including the encoding schemes used to transmit the speech signals, as well as the acoustic environments.
So, to put it into a mathematical form, this is the i-vector model. The i-vector is the MAP point estimate of the latent variable. If you see here, we have a single latent variable which is tied across Gaussians and tied across frames, and this tying is what gives us the compression, compressing across time and across the supervector space.
So, we assume that we know the alignment of frames to Gaussians. In the actual implementation, the frame alignment to the Gaussians could be evaluated with the GMM posteriors, or, most of the time, only a single posterior is used per frame.
So now, if we look at this latent variable, there is the assumption that the prior of this latent variable is a standard Gaussian distribution with zero mean and unit variance.
So, given the observation sequence, we can estimate the posterior, which is another Gaussian, with mean phi and covariance L inverse. This phi, which gives us the i-vector, is the posterior mean of the latent variable x. One can see that the i-vector is determined by the posterior covariance, the total variability matrix T, Sigma, which is the covariance matrix of the UBM, and F, which is the centralized first-order statistics.
And L inverse, which is the posterior covariance, is in turn determined by the zeroth-order statistics. So one point to note is that, in order to compute or extract the i-vector, we have to compute the posterior covariance, because it is part of the equation.
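For reference, the equations being described take the standard form (my transcription of the usual i-vector formulas, with N_c denoting the zeroth-order statistics of component c):

    L^{-1} = \Big( I + \sum_{c=1}^{C} N_c \, T_c^\top \Sigma_c^{-1} T_c \Big)^{-1} ,
    \qquad
    \phi = L^{-1} \, T^\top \Sigma^{-1} F .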
Okay. Note that in this paper we use what we call the whitened statistics, where what we want to do is to absorb the Sigma inverse into T and F. That gives the simplified equations here, without Sigma appearing explicitly.
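As a minimal sketch of the baseline extraction just described, assuming the statistics have already been whitened by the UBM covariances (the shapes and names below are mine):

    import numpy as np

    def extract_ivector(T, N, F):
        """Baseline i-vector extraction with whitened statistics.

        T : (C, D, M) total variability matrix, one D-by-M block per Gaussian
        N : (C,)      zeroth-order (occupancy) statistics
        F : (C, D)    centralized, whitened first-order statistics
        """
        C, D, M = T.shape
        # Posterior precision L = I + sum_c N_c * T_c^T T_c
        L = np.eye(M)
        for c in range(C):
            L += N[c] * T[c].T @ T[c]
        # The i-vector is the posterior mean L^{-1} T^T F
        TtF = sum(T[c].T @ F[c] for c in range(C))
        return np.linalg.solve(L, TtF)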
Okay, so now, we have only one objective in this paper, which is to reduce the computational complexity of i-vector extraction while keeping the memory requirement low, and hopefully with little or no degradation in performance. Okay, so why is this important? It is important because implementations of very fast extraction of i-vectors could be deployed on handheld devices, or in large-scale cloud-based applications where a single server may have to serve requests from hundreds or thousands of clients at the same time.
Okay, and also, recently we have seen an increasing number of Gaussians being used in these systems; for example, in a paper that is going to be presented in the coming sessions, the number of Gaussians is on the order of one thousand or even close to ten thousand, so direct computation would be demanding in those scenarios.
Okay, and another observation is that the emphasis here is on the rapid computation of i-vectors, rather than on fast estimation of the T matrix, because the T matrix is estimated once and usually offline, where we can use a huge amount of computational resources if needed.
Okay, so here is the problem statement: the computational bottleneck of i-vector extraction lies in the fact that estimating the posterior mean requires us to first estimate the posterior covariance.
There are a couple of existing solutions to this problem, including the eigen-decomposition method, approaches where the posterior covariance is approximated with a factored subspace, and the use of sparse coding to simplify the posterior covariance estimation.
So in this paper, what we propose is to compute the posterior mean directly while avoiding the posterior covariance. We do this with two ingredients: the first one is what we call an informative prior, which I am going to show later, and the second is the uniform occupancy assumption. With the combination of these two, we can do a fast extraction of i-vectors without the need to estimate the posterior covariance.
Okay, so in the conventional formulation of i-vector extraction, we assume a standard Gaussian prior. Now, if we consider the general form with a Gaussian prior with mean mu_p and covariance Sigma_p, then the i-vector extraction is given by this equation, where we have two additional terms here, determined by the covariance of the prior and by this mean mu_p. So now, if we consider the case where this mean is zero and this covariance is the identity matrix, then this term will disappear and this one reduces to the identity matrix, so we get back the standard form.
So in this paper we propose to use this form for the informative prior, where the mean is zero but the covariance is given by this: T is the total variability matrix, and we take the inverse of the inner product of the total variability matrix with itself as the prior covariance.
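In equation form, here is my reconstruction from the description, again with whitened statistics so that Sigma is absorbed into T and F. With a Gaussian prior N(\mu_p, \Sigma_p) on the latent variable, the posterior mean becomes

    \phi = \Big( \Sigma_p^{-1} + \sum_{c} N_c \, T_c^\top T_c \Big)^{-1} \Big( T^\top F + \Sigma_p^{-1} \mu_p \Big) ,

which reduces to the standard form when \mu_p = 0 and \Sigma_p = I. The proposed informative prior keeps \mu_p = 0 but sets \Sigma_p = (T^\top T)^{-1}, so the extraction becomes

    \phi = \big( T^\top (I + N) \, T \big)^{-1} T^\top F ,

where N is the block-diagonal matrix of the occupancy counts.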
Okay, so now, in the i-vector extraction formula we have the additional term contributed by the prior, right? If we plug the proposed prior into the i-vector extraction formula, we get this. We can always ensure that T transpose T here has an inverse, because T is always full rank given the amount of training data. Then we can take this T transpose T out in front, invert it again, and we get this.
And then we use this matrix inversion identity, which I copied from the Matrix Cookbook. Okay, so the idea is, if you have matrices P and Q, we can rewrite the product by putting this term in front, right? So if you look at this formula, it is the same as this one, right? So we can say this is the P and this is the Q, then we can bring this forward and simplify these. Now, if you work out this formula, from linear algebra this is a projection matrix, right? A projection matrix can be factorized in this form, U1 times U1 transpose, where U1 is an orthonormal matrix, meaning that each column of this U1 is orthogonal to every other column and has unit norm, and U1 spans the same subspace as the T matrix. Okay, and this orthonormalizing property is actually introduced through the prior, and that is why we call the prior we use the subspace orthonormalizing prior.
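A small check of the property being described, with a random matrix standing in for the real T: the matrix T (T^T T)^{-1} T^T that appears once the prior is plugged in is the orthogonal projection onto the span of T, which equals U1 U1^T for any orthonormal basis U1 of that subspace.

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((600, 40))        # random tall, full-rank stand-in for the T matrix

    # Projection induced by the subspace orthonormalizing prior
    P = T @ np.linalg.solve(T.T @ T, T.T)

    # Any orthonormal basis of span(T), here from a thin QR decomposition
    U1, _ = np.linalg.qr(T)

    print(np.allclose(P, U1 @ U1.T))          # True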
Okay, so with this we are avoiding the estimation of the posterior covariance; we can directly estimate the posterior mean. But the thing is that if you use this formula directly, it is going to incur more computation, because we are dealing with T times T transpose, which is a very big matrix. So that is the reason why we have to introduce another assumption, which we call the uniform occupancy assumption, to speed up the computation.
Okay, so to do so, we first perform a singular value decomposition of T into U, S and V, where S is the matrix of singular values and U is a square orthogonal matrix of this size.
Okay, so one thing to note is that U1, which is the U1 in the previous slide, spans the same subspace as T, and U2 is orthogonal to U1. We then use this property to simplify the formula. So we can express this term into this form, because this is equal to this, right, and then this can be expressed into this form because of that property. Then we can multiply N into this, so we have I plus N here.
Next, let us say I plus N is equal to A, and then apply the matrix inversion lemma; in this form, this is what we get. And we apply again the matrix inversion identity that we used before: here we have the P and Q, and now we can put this P in front, and then we have Q and P. The point is that we want to express this term on the left in terms of A inverse and in terms of U2, which is orthogonal to U1 and therefore orthogonal to T.
So here comes the uniform occupancy assumption. I plus N, where N itself is a diagonal matrix, so if you look into the individual elements of this matrix, what we get here is N_c divided by one plus N_c. The uniform occupancy assumption says that, for all the Gaussian components, the occupancy count divided by one plus the occupancy count is the same. Here we do not need to know what the appropriate value of this ratio is; what we assume is that the same value applies to all the components.
by doing so we have this
into this fall
and if you multiply this if you this is the i-vector extractor on this so
if you multiply this t
in two
we did you to then this to move we can sell
so we end up with this formula for i-vector extraction this is very fast because
a week and pre-computed systems
and this is thus
this is a diagonal matrix right so taking the inverse is
is very simple
right
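Putting the pieces together, in my reading of the derivation the fast extractor amounts to one precomputed projection applied to diagonally rescaled first-order statistics (again under the whitened-statistics convention; the shapes and names below are mine):

    import numpy as np

    def precompute_fast_extractor(T):
        # Offline, once: the (M x C*D) matrix (T^T T)^{-1} T^T, the pseudo-inverse of T.
        return np.linalg.solve(T.T @ T, T.T)

    def extract_ivector_fast(T_pinv, N, F, D):
        """Fast extraction, roughly w = (T^T T)^{-1} T^T (I + N)^{-1} F.

        T_pinv : (M, C*D) precomputed projection
        N      : (C,)     zeroth-order (occupancy) statistics
        F      : (C*D,)   whitened, centralized first-order statistics, stacked
        D      : feature dimension, used to expand N to the supervector size
        """
        a = np.repeat(1.0 + N, D)        # diagonal of (I + N); inverting it is trivial
        return T_pinv @ (F / a)          # one diagonal scaling plus one matrix-vector product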
Okay, now let us look at the computational complexity. We have a comparison of four different algorithms. First we have the baseline i-vector extraction, which is the standard form; there we have to compute the inner products of the total variability matrix, T_c transpose T_c, for all the C components, so this is of the order of C F M squared, and the M cubed term is due to the matrix inversion. In terms of memory cost, we have to store the T matrix, which is C F M.
Okay, so for the fast baseline, we can actually precompute T_c transpose T_c and store it; the storage cost for this is C M squared, but we reduce the per-utterance computational cost from this to this. Okay, and for our proposed method using the informative prior without the uniform occupancy assumption, the computational complexity and memory cost are about the same as for the fast baseline, because we can precompute these terms and store them.
As for the proposed fast method, the computational complexity reduces to this term, and we can precompute these terms and store them in memory. So in terms of computational complexity, the proposed fast method is about twelve times faster than the fast baseline,
and on the order of a hundred times faster than the standard baseline.
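As a rough sanity check of the quoted speed-up, here is my own back-of-the-envelope count, using the rank-400 T matrix and 57-dimensional features from the experiments below and assuming a 512-component UBM; the cost terms are my reading of the orders mentioned above:

    C, F, M = 512, 57, 400                         # UBM size assumed; feature dim and T rank from the experiments

    fast_baseline = C * M**2 + M**3 + C * F * M    # accumulate the precision matrix, invert it, project T^T F
    proposed_fast = C * F * M                      # diagonal scaling plus a single precomputed projection

    print(fast_baseline / proposed_fast)           # ~13.5, in the same ballpark as the quoted twelve times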
Okay, so one thing there is no time to present here, but which we discuss in the paper, concerns applications such as uncertainty propagation, where we need the posterior covariance. In such applications, the posterior covariance can actually be computed using the same fast method, given by this equation here, using the same informative prior as well as the uniform occupancy assumption, with a similar computational complexity. Also, we can actually use this informative prior, given by the inverse of T transpose T, in the E-step of the EM estimation of the T matrix. Of course, we only use it in the E-step, in the sense that we keep the covariance associated with the prior fixed, which is what allows the equations to take this form.
Experiments. The experiments were conducted on the NIST SRE'10 extended core task, common conditions one to nine. We use a gender-dependent UBM with fifty-seven dimensional MFCC features; the UBM is trained on Switchboard and SRE'04, '05 and '06, and we use the same data to train the T matrix, which has a rank of four hundred. We use PLDA for scoring, so before passing the i-vectors to the PLDA we reduce their dimension to two hundred using LDA, followed by length normalization. For the PLDA, the speaker factors are used together with a full covariance matrix to model the session variability.
Okay, so this table shows the results for the baseline, the proposed exact method, and the proposed fast method. The first row shows the EER and the second row the minDCF. Now, if we compare these results with these, we can see that there is not really much difference, so we can say that using the informative prior we propose does not seem to degrade the performance.
Okay, then if we look at common condition five, which is a telephone condition, for the proposed fast method the degradation is actually about ten percent in EER and four point five percent in minDCF. If we look across all the nine common conditions, the relative degradation in EER ranges from ten to sixteen percent, whereas for minDCF it ranges from about six or seven percent up to twenty point four percent.
Okay, so this is the comparison system that we use: we take the whitened and centralized first-order statistics, normalized by the occupancy counts, and use these as supervectors; then we perform PCA, and we project all the training and test utterances into the low-dimensional subspace, which is then used for the PLDA scoring.
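A minimal sketch of this comparison system as I understand it from the description; the normalization details and the projected dimension are assumptions:

    import numpy as np

    def train_pca_projector(stats, k=400):
        """stats: (num_utts, C*D) supervectors of whitened, centralized first-order
        statistics, each divided by its occupancy counts; k is the assumed target dimension."""
        mean = stats.mean(axis=0)
        # Principal directions of the training supervectors
        _, _, Vt = np.linalg.svd(stats - mean, full_matrices=False)
        return mean, Vt[:k].T                 # (C*D, k) projection matrix

    def project(mean, W, stats):
        # Low-dimensional vectors used in place of i-vectors for PLDA scoring
        return (stats - mean) @ W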
So why do we make this comparison? Because if you look at our formula, this can be seen as a transformation matrix and this as the input vector, so the i-vector is the projection of this input vector into a low-dimensional vector.
Comparing these results with those of our fast method, the results show that using the T matrix trained with EM, in combination with our formula, gives better performance.
Now, this result shows the comparison where the T matrix is trained not with the informative prior but with the standard Gaussian prior, while the extraction uses the informative prior. Comparing these two, we can see that the proposed exact method actually gives a slightly better result.
Okay, so in conclusion, we introduced two new concepts for the rapid computation of i-vectors. The first one is what we call the subspace orthonormalizing prior; the use of this prior avoids the need to compute the posterior covariance before computing the posterior mean. The second is the uniform occupancy assumption, which we use to reduce the computational complexity. With the combined use of these two, the informative prior and this assumption, we speed up the i-vector extraction process, but trade it for a slight degradation in terms of accuracy. That is all I have.
We have time for a few questions.
So, a question: you mentioned that the performance of the exact method is essentially the same as the baseline; is the exact method the one without the approximation?
Exactly, it is without the use of the uniform occupancy assumption, just using the subspace orthonormalizing prior. We wanted to see, by introducing the two parts separately (we first introduce the subspace orthonormalizing prior and then the uniform occupancy assumption), what the individual effect is, in particular whether by introducing just the subspace orthonormalizing prior we get a better performance or a slightly worse performance.