This is a probabilistic PCA-based system for dictionary learning and encoding for speaker recognition.

As the title points out, it is a hybrid factor analysis system, and it uses PPCA basically to simplify the computation-intensive parts of the factor analysis system.

As we have seen from the previous talks, the factor analysis system is quite computation-intensive, and this work is essentially about how to simplify some parts of the system so that we gain some computational advantages while, at the same time, not giving up state-of-the-art performance.

So, the outline: I will first explain why such a simplification is possible, and which perspective on this factor analysis system enables us to simplify those parts. In particular I will talk about the hyperparameter estimation technique, which is basically the estimation of the T matrix in the subspace model, the so-called total variability space. At the end we will look at how the performance of the resulting system compares.

To recap the representation used here: you have supervectors, which are fixed-dimensional representations of speech utterances, and these are converted to low-dimensional i-vector representations. That i-vector, usually denoted w, is basically the representation used in this paper.

Just for the sake of completeness, since most of the previous talks have already covered what goes on inside such a system, I will only briefly explain what is happening, and then come to the perspective that is important for the rest of the talk.

From a speech utterance, consisting of feature vectors, we use the GMM-UBM parameters to form the supervector. Once we have the supervectors from the development data, we use them to train the subspace model, that is, the T matrix. Then we extract i-vectors from the test data, and get a low-dimensional representation of each speech utterance.

In the testing phase, we try to find an acoustic distance between the target speaker and the test utterance. This is the general framework of a speaker recognition system.
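As a concrete illustration of that scoring step, here is a minimal sketch of cosine scoring between two i-vectors. The talk does not state which scoring back-end is used, so cosine similarity is an assumption here, chosen because it is a common choice for i-vector systems; the function name and dimensions are illustrative only.

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors; a higher score
    suggests the two utterances come from the same speaker."""
    return float(np.dot(w_target, w_test) /
                 (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

# toy example with 400-dimensional i-vectors
rng = np.random.default_rng(0)
w_target = rng.standard_normal(400)
w_same = w_target + 0.1 * rng.standard_normal(400)   # near-duplicate utterance
w_diff = rng.standard_normal(400)                    # unrelated utterance

assert cosine_score(w_target, w_same) > cosine_score(w_target, w_diff)
```

In practice the i-vectors would first be channel-compensated (for example with LDA and WCCN, as mentioned later in the talk) before scoring.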

Such a system can actually be viewed as consisting of two stages: dictionary learning and encoding. Once you have the development data, you estimate a subspace in which all the supervectors are assumed to live; this is the total variability space, and estimating it can be seen as learning a dictionary. In the setting of this paper it is an overcomplete dictionary. Once the subspace, the T matrix, is learned, we try to encode the data, that is, each supervector that has been observed. So this is the dictionary-learning-and-encoding view of the framework used in this paper, and we will see how decoupling these two stages of the system can be exploited.

The basic motivation behind this is the relative independence of the encoding and dictionary learning stages of the entire system. It has been observed in the sparse coding literature that the two need not use matched algorithms: for example, if you take the orthogonal matching pursuit algorithm to obtain sparse vectors, and you train a dictionary using that algorithm, you do not also have to use it as your encoding algorithm at test time. Some encoding algorithms work better than others, and the best one does not necessarily have to be the orthogonal matching pursuit algorithm used during training. For example, it has been observed that a soft thresholding scheme can work better.
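To make the decoupling concrete, here is a minimal sketch of encoding against a fixed dictionary with soft thresholding. This is an illustration of the general idea only, not the paper's setup: the dictionary here is random, whereas in the cited observation it would have been trained with a different algorithm such as OMP-based K-SVD.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def encode_soft(D, y, lam):
    """Encode signal y against dictionary D by correlating with every
    atom and soft-thresholding. This cheap encoder can be paired with a
    dictionary that was trained by a completely different algorithm."""
    return soft_threshold(D.T @ y, lam)

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
y = 2.0 * D[:, 3] + 0.01 * rng.standard_normal(64)  # signal mostly atom 3
code = encode_soft(D, y, lam=0.5)

# the code is sparse: most atoms are killed by the threshold
assert np.count_nonzero(code) < 128
```

The point mirrored from the talk is that nothing in `encode_soft` depends on how `D` was learned; the learning and encoding stages are interchangeable parts.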

So there is an opportunity here to see if we can replace a particular phase that is very computationally intensive, using the observations made in this work, and to see whether there is room for any improvement in terms of computational efficiency.

Let us look at how the T matrix is conventionally trained with the EM algorithm. In the E-step we estimate the posterior of the latent factors, that is, the i-vectors of the development data; in the M-step we update the columns of T accordingly, and we keep alternating these estimates until convergence. This is the conventional training procedure, and you can see that it is computationally demanding; I will try to formalise that in terms of its computational complexity.

Once this is done, we look at an alternative interpretation of the total variability space model, which is probabilistic PCA (PPCA). Viewed from this probabilistic perspective, the estimation breaks into parts, and one of the important observations is that PPCA is just a special case of factor analysis, the one with isotropic covariance matrices.

One of the main aspects here is that the computation of the covariance statistics in the probabilistic PCA model is much less intensive in terms of computational complexity, especially when it comes to very high-dimensional data samples like the supervectors. But, as has been observed before, a pure PPCA system is not as good as the conventional i-vector technique based on factor analysis. We will see how to combine these observations into a single system.

Here are the E-step and M-step for PPCA. They are similar to the factor analysis case, except that, as Kenny mentions, PPCA does not necessarily assume that the supervector comes from a GMM, so the per-mixture statistics drop out of the computation. Comparing the two sets of update equations, we can see that there is a huge difference: the PPCA updates are far less intensive than those of the conventional technique.

If you plug in typical values for the number of UBM components, the dimensionality of the features, the rank of the subspace, and the number of development data sequences, the difference in complexity between the two E-steps becomes very large, and that is why the PPCA-based estimation is said to be so much cheaper.

So what can be done? The question is this: is it possible to estimate the T matrix using the PPCA updates, while encoding using the conventional technique, and thereby combine the advantages of these two sets of observations?

In the proposed approach, the T matrix is estimated using PPCA, but the encoding is done with the conventional technique, which makes the assumption that the supervector comes from a GMM. So what happens is that the i-vectors are encoded using the GMM posterior statistics, computed on top of the PPCA estimate of T.

What is interesting here is that, compared to i-vectors estimated entirely with the PPCA technique, the i-vectors estimated using the proposed approach take in extra information: the per-mixture occupancies and covariance statistics. In the encoding expression this acts as a kind of normalization, which seems to be really useful, as we will be seeing in the experiments.
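Putting the two halves together, the hybrid idea can be sketched end-to-end: train T with the cheap PPCA-style updates, then encode each utterance with the factor-analysis posterior that uses the per-utterance occupancy statistics. As before, this is a single-Gaussian simplification with illustrative names, not the paper's implementation.

```python
import numpy as np

def train_T_ppca(Y, R, iters=20, seed=0):
    """Estimate the total-variability matrix T with PPCA-style EM.
    The posterior precision is shared across samples, which is what
    makes this cheaper than the conventional factor-analysis update."""
    rng = np.random.default_rng(seed)
    D, N = Y.shape
    T, s2 = rng.standard_normal((D, R)) * 0.1, 1.0
    for _ in range(iters):
        M = np.linalg.inv(T.T @ T + s2 * np.eye(R))
        EX = M @ T.T @ Y
        SumXX = N * s2 * M + EX @ EX.T
        T = (Y @ EX.T) @ np.linalg.inv(SumXX)
        # residual-variance update, simplified using the M-step optimum
        s2 = max((np.sum(Y * Y) - np.sum(EX * (T.T @ Y))) / (N * D), 1e-6)
    return T

def encode_fa(T, N_stats, F_stats, Sigma_inv):
    """Conventional FA-style encoding: the per-utterance occupancies
    N_stats enter the posterior, unlike in pure PPCA encoding."""
    R = T.shape[1]
    TtSi = T.T * Sigma_inv
    L = np.eye(R) + (TtSi * N_stats) @ T
    return np.linalg.solve(L, TtSi @ F_stats)

rng = np.random.default_rng(4)
Y = rng.standard_normal((60, 100))            # toy "supervectors"
T = train_T_ppca(Y - Y.mean(axis=1, keepdims=True), R=8)
w = encode_fa(T, np.full(60, 0.5), Y[:, 0], np.ones(60))
assert w.shape == (8,)
```

The training loop never touches per-utterance statistics, while the encoder does; that asymmetry is exactly the hybrid split the talk proposes.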

Coming to the experiments: the development dataset used is quite modest compared to those in the large benchmark evaluations. These are the databases used for development and test. MFCC features were extracted, basically cepstral coefficients with their derivatives appended, and these are the parameters used in the feature extraction. The rank of the T matrix is 500 for both techniques, and a standard back-end with LDA followed by WCCN is applied to the i-vectors before scoring. Just to give support to the claim: the PPCA estimation of T is much faster than the conventional technique.

Both systems were implemented in a matched fashion, using the same components of the front end, so the comparison is fair. If you look at the compute time of the conventional technique against that of the proposed one, we see, for different configurations, a large difference per EM iteration.

We wanted to take advantage of exactly this. From some preliminary tests, we see that the i-vectors encoded in the way that we have proposed are good enough to be used in speaker recognition.

This shows that the interspeaker and intraspeaker separation is preserved, and the performance confirms that there is only a slight degradation with respect to the conventional factor analysis system.

One aspect of this work that is interesting is to find out the relationship between the two kinds of i-vectors that are extracted in the two different ways. To check whether the relationship is linear, we can use canonical correlation analysis (CCA). Basically, CCA is related to mutual information estimation; it captures linear dependence between two sets of variables. If you need to determine whether the relationship is nonlinear, that is, linear only in some high-dimensional feature space, you can try to use kernel CCA. What we can see is that the conventional subspace and the PPCA subspace are not linearly related, which suggests that the relationship, if any, is nonlinear.
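The linear check mentioned above can be sketched with classical CCA computed from the singular values of the whitened cross-covariance; this is a generic CCA sketch (not the analysis code used in the paper), and the toy data below is an assumption for illustration.

```python
import numpy as np

def cca_correlations(X, Y):
    """Canonical correlations between sample matrices X (N, p) and Y (N, q).
    Values near 1 in every direction indicate a strong linear relationship;
    small values indicate the two representations are not linearly related."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    def orthobasis(A):
        # orthonormal basis of the centered column space
        U, _, Vt = np.linalg.svd(A, full_matrices=False)
        return U @ Vt

    # singular values of the product of orthonormal bases = canonical correlations
    corr = np.linalg.svd(orthobasis(X).T @ orthobasis(Y), compute_uv=False)
    return np.clip(corr, 0.0, 1.0)

rng = np.random.default_rng(5)
A = rng.standard_normal((500, 5))                       # "i-vectors", method 1
B = A @ rng.standard_normal((5, 5)) \
    + 1e-3 * rng.standard_normal((500, 5))              # linear transform of A
corr = cca_correlations(A, B)
assert np.all(corr > 0.9)   # near-perfect linear relation is detected
```

When the relationship is nonlinear, as the talk reports for the two i-vector sets, these correlations come out low, and one moves to kernel CCA to test for linearity in a lifted feature space.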

If you look at the i-vectors extracted from the conventional approach and the i-vectors extracted with the proposed approach, the extent of the correlation between them tells us how much of the total variability space generated by one is captured by the other. And this is perhaps the most interesting aspect: what it gives us is an opportunity to look at different combination procedures, so that the performance of both systems together can be much better.

In this table, the equal error rate of the baseline system is given, along with that of the pure PPCA system and of the proposed hybrid. If you compare the PPCA technique with the proposed one, there is a clear improvement in terms of the error rates.

So, in summary: the PPCA model is used to estimate the total variability space matrix, and in doing so we speed up the hyperparameter estimation considerably, while the performance stays close to that of the baseline; the degradation with respect to the conventional system was marginal. Relative to the pure PPCA system, the improvement from the proposed encoding is clear. One important conclusion is that the i-vectors extracted using the proposed approach are non-linearly related to those of the baseline, which opens the door to combining the two systems.

[Question] In the context of sparse coding, you said that a dictionary trained with one algorithm can be encoded with a different one. Do you know why that is the case?

[Answer] I do not know the reason why decoupling the dictionary learning and encoding parts works; it is only an empirical observation, reported in the literature, and our results are consistent with it. Thank you.