Hi. I will be presenting our paper, which is a probabilistic PCA based system for dictionary learning and encoding for speaker recognition.
It is a hybrid factor analysis system, and it uses PPCA basically to simplify the computation-intensive parts of the factor analysis system. As we have seen from the previous talks, such a system is quite computation intensive, and this work is essentially about simplifying some parts of the system so that we gain some computational advantages while at the same time not giving up much of the state-of-the-art performance.
So first I will explain why such a simplification is possible at all, and what perspective of this factor analysis system enables us to simplify those parts. I will talk especially about the hyperparameter estimation technique, which is basically the estimation of the T matrix in the subspace model, that is, the total variability space. At the end we will look at how the system performs.
The i-vector is a very modest, low-dimensional representation of the entire supervector space. So basically you have supervectors, which are fixed-dimensional representations of speech utterances, and these are converted to low-dimensional i-vector representations.
In this subspace model the supervector M is written as M = m + Tw, and the i-vector w is basically the representation used in this paper.
Just for the sake of completeness, since most of us here know what such a system looks like, I will quickly go over the block diagram of the system, just to explain what is happening, and then point out the perspective that is very important for this work.
From a speech utterance, consisting of feature vectors, we basically use the GMM-UBM parameters to form the supervector. Once we have the supervectors from the development data, we use them to train the subspace, that is, the T matrix. Then we extract the i-vectors of the enrollment and test data and get a low-dimensional representation of each speech utterance. In the testing phase, we try to find a cosine distance between the target speaker and the test patterns. This is the general framework of a speaker recognition system.
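The scoring step just described can be sketched in a few lines. This is only a minimal illustration with made-up toy vectors, not the paper's actual setup:

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine similarity between a target-speaker i-vector and a test i-vector."""
    return float(np.dot(w_target, w_test) /
                 (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

# toy 4-dimensional "i-vectors"; real systems use a few hundred dimensions
w_target = np.array([1.0, 0.5, -0.2, 0.0])
w_same   = np.array([0.9, 0.6, -0.1, 0.1])   # utterance resembling the target
w_other  = np.array([-0.5, 0.1, 1.0, -0.8])  # utterance from another speaker
assert cosine_score(w_target, w_same) > cosine_score(w_target, w_other)
```

A trial is accepted when this score exceeds a threshold tuned on development data.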
Such a system can actually be viewed as consisting of two stages: dictionary learning and encoding. From the development data you estimate a subspace in which all these supervectors are assumed to lie, the total variability space, and this can be seen as dictionary learning; one of the perspectives in this paper is that the T matrix is an overcomplete dictionary. Once the subspace, the T matrix, is learned, we try to encode the data, that is, each supervector that has been observed. So this is the overall framework used in this paper, and we will see how decoupling these two stages of the system is what we exploit.
The basic motivation behind this is an observation about the relative importance of encoding versus dictionary learning when you view the entire system in these two phases. Basically, the dictionary learning need not be done with the same sparse encoding procedure that is used later. For example, if you take the orthogonal matching pursuit algorithm to obtain sparse vectors, and you train a dictionary using that algorithm, you do not have to use the same algorithm for encoding. It has been observed that some encoding algorithms work better than others, and the best one does not necessarily have to be the orthogonal matching pursuit algorithm used for training; for example, it has been observed that a simple soft thresholding scheme works better.
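The soft thresholding encoder mentioned here is very cheap: correlate the signal with each dictionary atom and shrink small coefficients to zero. A minimal sketch, with a random dictionary purely for illustration:

```python
import numpy as np

def soft_threshold_encode(D, x, lam):
    """Encode x over dictionary D (atoms as columns): correlate with the
    atoms, then shrink every coefficient toward zero by lam."""
    c = D.T @ x
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))       # overcomplete: 16 atoms in 8 dimensions
D /= np.linalg.norm(D, axis=0)         # unit-norm atoms
x = 2.0 * D[:, 3]                      # signal aligned with atom 3
code = soft_threshold_encode(D, x, lam=1.0)
assert np.argmax(np.abs(code)) == 3    # the dominant coefficient survives
```

Unlike matching pursuit, there is no iterative atom selection here, which is what makes the encoding stage inexpensive.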
So there is an opportunity here to see if we can replace a particular phase that is very computationally intensive, and the observations made in this work exploit exactly that, to see if there is scope for any improvement in terms of computational efficiency.
Let us look at the EM steps of the conventional hyperparameter estimation, which are taken from Kenny's formulation. In the E-step we compute the posterior of the i-vector given the current estimate of T, and then in the M-step we accordingly update the columns of T. We obtain the i-vectors of the development data and keep re-estimating T in this way until convergence. This estimation is computationally intensive, and I will try to formalize that in terms of big-O notation.
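As a rough sketch of why this is expensive: in the conventional E-step, every utterance requires building and solving an R x R posterior precision from its occupancy statistics. The following is a minimal NumPy illustration of that per-utterance computation; the toy sizes and variable names are mine, not the paper's:

```python
import numpy as np

def ivector_posterior(T, Sigma_inv, N, F):
    """Kenny-style E-step for one utterance.
    T         : (C*F_dim, R) total variability matrix
    Sigma_inv : (C*F_dim,)   inverse diagonal of the UBM covariances
    N         : (C,)         zeroth-order (occupancy) statistics
    F         : (C*F_dim,)   centred first-order statistics
    Returns the posterior precision L (R x R) and the mean E[w]."""
    CF, R = T.shape
    F_dim = CF // N.shape[0]
    N_big = np.repeat(N, F_dim)                    # expand occupancies
    # L = I + T' diag(N) Sigma^-1 T : an R x R system per utterance,
    # which is what makes the conventional estimation expensive
    L = np.eye(R) + T.T @ ((N_big * Sigma_inv)[:, None] * T)
    w_mean = np.linalg.solve(L, T.T @ (Sigma_inv * F))
    return L, w_mean

rng = np.random.default_rng(1)
C, F_dim, R = 4, 3, 5                   # toy sizes; real systems are far larger
T = rng.standard_normal((C * F_dim, R))
L, w = ivector_posterior(T, np.ones(C * F_dim),
                         rng.uniform(1.0, 5.0, C),
                         rng.standard_normal(C * F_dim))
# sanity check: with no observed frames the posterior is just the prior
L0, w0 = ivector_posterior(T, np.ones(C * F_dim),
                           np.zeros(C), np.zeros(C * F_dim))
assert np.allclose(L0, np.eye(R)) and np.allclose(w0, 0.0)
```

Because L depends on the occupancies N, it has to be recomputed for every utterance at every EM iteration.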
Once this is done, we look at an alternative view of the total variability space model, which is probabilistic PCA. The PPCA formulation gives the EM steps for this estimation, and one of the important points is that PPCA is just a special case of factor analysis, the case where the noise covariance matrix is isotropic. One of the main aspects is that the computation of the posterior covariance matrix in probabilistic PCA is much less intensive in terms of computational complexity when it comes to very high-dimensional data samples like these supervectors. For all these reasons PPCA is attractive, but it has been observed that the performance of a purely PPCA-based system is not as good as that of the i-vector technique, the conventional factor analysis technique, and we will see how to combine the advantages of both into a single system.
Here are the EM steps for the PPCA case. They are similar to the factor analysis case, except that, as Kenny mentions, PPCA does not necessarily assume that the supervectors come from a GMM, so the computations are simpler. Comparing the two sets of steps, we can see there is a huge difference: the PPCA update is much less intensive than that of the conventional technique.
The complexity here depends on the number of UBM mixture components, the dimensionality of the features used, the i-vector dimension, and the number of development data utterances, and with typical values of these the difference between the two techniques is substantial.
So what can be done? Given these two sets of observations, is it possible to do the dictionary learning, that is, the T-matrix estimation, using PPCA, and the encoding using the conventional technique, so that we get the advantages of both?
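For reference, one EM iteration of probabilistic PCA in the Tipping and Bishop form can be sketched as follows. Note that only q x q matrices are inverted, never d x d, which is the source of the computational saving; the toy data and variable names are mine:

```python
import numpy as np

def ppca_em_step(X, W, sigma2):
    """One EM iteration of probabilistic PCA (Tipping & Bishop form).
    X: (d, n) centred data (here, supervectors), W: (d, q) loadings.
    Only q x q matrices are inverted, never d x d."""
    d, n = X.shape
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)         # q x q
    Minv = np.linalg.inv(M)
    SW = X @ (X.T @ W) / n                   # S @ W without forming S (d x d)
    W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + Minv @ (W.T @ SW))
    sigma2_new = (np.sum(X * X) / n - np.trace(W_new.T @ SW @ Minv)) / d
    return W_new, sigma2_new

# toy data from a 2-dimensional latent subspace plus small noise
rng = np.random.default_rng(0)
d, q, n = 10, 2, 500
W_true = rng.standard_normal((d, q))
X = W_true @ rng.standard_normal((q, n)) + 0.1 * rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)

W, sigma2 = rng.standard_normal((d, q)), 10.0
for _ in range(200):
    W, sigma2 = ppca_em_step(X, W, sigma2)
assert 0.0 < sigma2 < 1.0   # noise estimate shrinks toward the true level
```

In the total variability setting, d would be the supervector dimension and q the i-vector dimension, so avoiding d x d operations matters a great deal.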
So in the proposed approach, the T matrix is estimated using PPCA, while the i-vectors are encoded using the conventional technique, the one which makes the assumption that the supervector comes from a GMM. What was interesting here is that if both T and the i-vectors are obtained purely with the PPCA technique, the performance suffers; but when the i-vectors are extracted using the proposed approach, the encoding step brings in extra information, such as the per-mixture occupancy and covariance statistics and the normalization in the posterior expression, which seems to be really useful, as we will be seeing in the results.
Coming to the experiments: the development dataset used is rather minimalistic compared to what is typically used, and the databases it is drawn from are listed here. MFCC features are extracted, basically the cepstral coefficients with the derivatives in addition. The rank of the T matrix is five hundred, as is typical with i-vector techniques, and the standard backend is used, with WCCN applied to the i-vectors. The training times show that the proposed T-matrix estimation is much faster than the conventional technique. The whole system was implemented in MATLAB.
If you look at the computational comparison between the conventional technique and the proposed one, we see, for different configurations, the difference in the time taken per EM iteration, and this is exactly what we wanted to take advantage of.
From some preliminary tests, we see that the i-vectors encoded in the way that we have proposed are good enough for being used in speaker recognition. This shows the interspeaker and intraspeaker distances, and the performance shows only a slight degradation compared to the conventional factor analysis system.
One aspect of this work that is interesting is to find out the relationship between the two kinds of i-vectors that are extracted in the two different ways. To check whether the relationship is linear, we use canonical correlation analysis. Basically, CCA is closely related to the estimation of mutual information when the data is Gaussian. But if the linear hypothesis fails, that is, if the relationship is nonlinear and resides in a high-dimensional space, we try to use kernel CCA. What we can see is that the conventional subspace and the PPCA subspace are not linearly related; what kernel CCA suggests is that the i-vectors extracted from the conventional approach and the i-vectors extracted from the proposed approach are correlated to a large extent, basically fully, in the space generated by the kernel.
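Linear CCA itself can be sketched compactly: whiten each view and take the singular values of their cross products; kernel CCA follows the same pattern with Gram-matrix features. The data here is synthetic, just to illustrate the mechanics:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two paired views (rows = samples):
    whiten each view via SVD, then take singular values of the cross product."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Ux = np.linalg.svd(Xc, full_matrices=False)[0]   # orthonormal basis, view 1
    Uy = np.linalg.svd(Yc, full_matrices=False)[0]   # orthonormal basis, view 2
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)

rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal((n, 1))                         # shared latent component
X = np.hstack([z, rng.standard_normal((n, 2))])         # view 1 contains z
Y = np.hstack([-3.0 * z, rng.standard_normal((n, 2))])  # view 2 is linear in z
rho = canonical_correlations(X, Y)
assert rho[0] > 0.95          # the shared linear component is detected
```

When all leading canonical correlations are small, as between the two i-vector sets here, a linear relationship can be ruled out and a kernelized variant is the natural next test.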
And this is the most interesting aspect: what it gives us is an opportunity to look at different fusion procedures, so that the performance of the combined systems can be much better. In this table the error rates of the baseline system are given, along with those of the PPCA system and the proposed system. If you compare the PPCA technique and the proposed one, there is a clear improvement in terms of the error rate.
So in summary, PPCA is used to estimate the total variability space matrix, and in doing so we speed up the hyperparameter estimation considerably, with the performance staying close to that of the baseline: the degradation with respect to the baseline was only marginal, while compared with the purely PPCA-based system there is a clear improvement. One important conclusion is that the i-vectors extracted using the proposed approach are non-linearly related to those of the baseline system.
As for the question about the dictionary-based sparse structure: honestly, I do not know the reason why the decoupling of the dictionary learning and encoding parts behaves this way; it is only an empirical observation reported in that prior work, and our results are consistent with it.

Let us thank the speaker again.