Thank you for the introduction. The work I am going to present is joint work with my student.
So, to put it into the right context, the work we are going to present is centered on the use of i-vectors with PLDA.
What we intend to do in this paper is to reduce the computation involved in i-vector extraction, so we call it rapid computation of i-vectors.
"'kay" for going to detail is let me as bank of a slight
to we send the background and so as the motivations of the work
So, the i-vector extraction process can be seen as a compression process, where we compress across time and across the supervector space. The output, which is a low and fixed dimensional vector that we call the i-vector, captures not only the speaker information but also the characteristics of the recording devices and the microphones used, the transmission channel characteristics, including the encoding schemes used to transmit the speech signals, as well as the acoustic environments.
So, to put it into a mathematical form, this is the i-vector model. The i-vector is the MAP point estimate of the latent variable. If you see here, we have a single latent variable which is tied across Gaussians and tied across frames, and this tying is what gives us the compression, compressing across time and across the supervector space.
So, we assume that we know the alignment of frames to Gaussians. In the actual implementation, the frame alignment to the Gaussians could be evaluated with the GMM posteriors, or, most of the time, only a single posterior is used per frame.
So now, if we look at this latent variable, there is the assumption that the prior of this latent variable is a standard Gaussian distribution with zero mean and unit variance.
So, given the observation sequence, we can estimate the posterior, which is another Gaussian, with mean phi and covariance L inverse. This phi, which gives us the i-vector, is the posterior mean of the latent variable x. One can see that the i-vector is determined by the posterior covariance, the total variability matrix T, Sigma, which is the covariance matrix of the UBM, and F, which is the centralized first-order statistics.
And L inverse, which is the posterior covariance, is in turn determined by the zeroth-order statistics. So one point to note is that, in order to compute or extract the i-vector, we have to compute the posterior covariance, because it is part of the equation.
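For reference, the equations being described take the standard form (my transcription of the usual i-vector formulas, with N_c denoting the zeroth-order statistics of component c):

    L^{-1} = \Big( I + \sum_{c=1}^{C} N_c \, T_c^\top \Sigma_c^{-1} T_c \Big)^{-1} ,
    \qquad
    \phi = L^{-1} \, T^\top \Sigma^{-1} F .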
Okay. Note that in this paper we use what we call the whitened statistics, where what we want to do is to absorb the Sigma inverse into T and F. That gives the simplified equations here, without Sigma appearing explicitly.
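As a minimal sketch of the baseline extraction just described, assuming the statistics have already been whitened by the UBM covariances (the shapes and names below are mine):

    import numpy as np

    def extract_ivector(T, N, F):
        """Baseline i-vector extraction with whitened statistics.

        T : (C, D, M) total variability matrix, one D-by-M block per Gaussian
        N : (C,)      zeroth-order (occupancy) statistics
        F : (C, D)    centralized, whitened first-order statistics
        """
        C, D, M = T.shape
        # Posterior precision L = I + sum_c N_c * T_c^T T_c
        L = np.eye(M)
        for c in range(C):
            L += N[c] * T[c].T @ T[c]
        # The i-vector is the posterior mean L^{-1} T^T F
        TtF = sum(T[c].T @ F[c] for c in range(C))
        return np.linalg.solve(L, TtF)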
Okay, so now, we have only one objective in this paper, which is to reduce the computational complexity of i-vector extraction while keeping the memory requirement low, and hopefully with little or no degradation in performance. Okay, so why is this important? It is important because implementations of very fast extraction of i-vectors could be deployed on handheld devices, or in large-scale cloud-based applications where a single server may have to serve requests from hundreds or thousands of clients at the same time.
Okay, and also, recently we have seen an increasing number of Gaussians being used in these systems; for example, in a paper that is going to be presented in the coming sessions, the number of Gaussians is on the order of one thousand or even close to ten thousand, so direct computation would be demanding in those scenarios.
Okay, and another observation is that the emphasis here is on the rapid computation of i-vectors, rather than on fast estimation of the T matrix, because the T matrix is estimated once and usually offline, where we can use a huge amount of computational resources if needed.
Okay, so here is the problem statement: the computational bottleneck of i-vector extraction lies in the fact that estimating the posterior mean requires us to first estimate the posterior covariance.
There are a couple of existing solutions to this problem, including the eigen-decomposition method, approaches where the posterior covariance is approximated with a factored subspace, and the use of sparse coding to simplify the posterior covariance estimation.
So in this paper, what we propose is to compute the posterior mean directly while avoiding the posterior covariance. We do this with two ingredients: the first one is what we call an informative prior, which I am going to show later, and the second is the uniform occupancy assumption. With the combination of these two, we can do a fast extraction of i-vectors without the need to estimate the posterior covariance.
Okay, so in the conventional formulation of i-vector extraction, we assume a standard Gaussian prior. Now, if we consider the general form with a Gaussian prior with mean mu_p and covariance Sigma_p, then the i-vector extraction is given by this equation, where we have two additional terms here, determined by the covariance of the prior and by this mean mu_p. So now, if we consider the case where this mean is zero and this covariance is the identity matrix, then this term will disappear and this one reduces to the identity matrix, so we get back the standard form.
So in this paper we propose to use this form for the informative prior, where the mean is zero but the covariance is given by this: T is the total variability matrix, and we take the inverse of the inner product of the total variability matrix with itself as the prior covariance.
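In equation form, here is my reconstruction from the description, again with whitened statistics so that Sigma is absorbed into T and F. With a Gaussian prior N(\mu_p, \Sigma_p) on the latent variable, the posterior mean becomes

    \phi = \Big( \Sigma_p^{-1} + \sum_{c} N_c \, T_c^\top T_c \Big)^{-1} \Big( T^\top F + \Sigma_p^{-1} \mu_p \Big) ,

which reduces to the standard form when \mu_p = 0 and \Sigma_p = I. The proposed informative prior keeps \mu_p = 0 but sets \Sigma_p = (T^\top T)^{-1}, so the extraction becomes

    \phi = \big( T^\top (I + N) \, T \big)^{-1} T^\top F ,

where N is the block-diagonal matrix of the occupancy counts.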
Okay, so now, in the i-vector extraction formula we have the additional term contributed by the prior, right? If we plug the proposed prior into the i-vector extraction formula, we get this. We can always ensure that T transpose T here has an inverse, because T is always full rank given the amount of training data. Then we can take this T transpose T out in front, invert it again, and we get this.
And then we use this matrix inversion identity, which I copied from the Matrix Cookbook. Okay, so the idea is, if you have matrices P and Q, we can rewrite the product by putting this term in front, right? So if you look at this formula, it is the same as this one, right? So we can say this is the P and this is the Q, then we can bring this forward and simplify these. Now, if you work out this formula, from linear algebra this is a projection matrix, right? A projection matrix can be factorized in this form, U1 times U1 transpose, where U1 is an orthonormal matrix, meaning that each column of this U1 is orthogonal to every other column and has unit norm, and U1 spans the same subspace as the T matrix. Okay, and this orthonormalizing property is actually introduced through the prior, and that is why we call the prior we use the subspace orthonormalizing prior.
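A small check of the property being described, with a random matrix standing in for the real T: the matrix T (T^T T)^{-1} T^T that appears once the prior is plugged in is the orthogonal projection onto the span of T, which equals U1 U1^T for any orthonormal basis U1 of that subspace.

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((600, 40))        # random tall, full-rank stand-in for the T matrix

    # Projection induced by the subspace orthonormalizing prior
    P = T @ np.linalg.solve(T.T @ T, T.T)

    # Any orthonormal basis of span(T), here from a thin QR decomposition
    U1, _ = np.linalg.qr(T)

    print(np.allclose(P, U1 @ U1.T))          # True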
Okay, so with this we are avoiding the estimation of the posterior covariance; we can directly estimate the posterior mean. But the thing is that if you use this formula directly, it is going to incur more computation, because we are dealing with T times T transpose, which is a very big matrix. So that is the reason why we have to introduce another assumption, which we call the uniform occupancy assumption, to speed up the computation.
Okay, so to do so, we first perform a singular value decomposition of T into U, S and V, where S is the matrix of singular values and U is a square orthogonal matrix of this size.
Okay, so one thing to note is that U1, which is the U1 in the previous slide, spans the same subspace as T, and U2 is orthogonal to U1. We then use this property to simplify the formula. So we can express this term into this form, because this is equal to this, right, and then this can be expressed into this form because of that property. Then we can multiply N into this, so we have I plus N here.
Next, let us say I plus N is equal to A, and then apply the matrix inversion lemma; in this form, this is what we get. And we apply again the matrix inversion identity that we used before: here we have the P and Q, and now we can put this P in front, and then we have Q and P. The point is that we want to express this term on the left in terms of A inverse and in terms of U2, which is orthogonal to U1 and therefore orthogonal to T.
So here comes the uniform occupancy assumption. I plus N, where N itself is a diagonal matrix, so if you look into the individual elements of this matrix, what we get here is N_c divided by one plus N_c. The uniform occupancy assumption says that, for all the Gaussian components, the occupancy count divided by one plus the occupancy count is the same. Here we do not need to know what the appropriate value of this ratio is; what we assume is that the same value applies to all the components.
by doing so we have this
into this fall
and if you multiply this if you this is the i-vector extractor on this so
if you multiply this t
in two
we did you to then this to move we can sell
so we end up with this formula for i-vector extraction this is very fast because
a week and pre-computed systems
and this is thus
this is a diagonal matrix right so taking the inverse is
is very simple
right
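Putting the pieces together, in my reading of the derivation the fast extractor amounts to one precomputed projection applied to diagonally rescaled first-order statistics (again under the whitened-statistics convention; the shapes and names below are mine):

    import numpy as np

    def precompute_fast_extractor(T):
        # Offline, once: the (M x C*D) matrix (T^T T)^{-1} T^T, the pseudo-inverse of T.
        return np.linalg.solve(T.T @ T, T.T)

    def extract_ivector_fast(T_pinv, N, F, D):
        """Fast extraction, roughly w = (T^T T)^{-1} T^T (I + N)^{-1} F.

        T_pinv : (M, C*D) precomputed projection
        N      : (C,)     zeroth-order (occupancy) statistics
        F      : (C*D,)   whitened, centralized first-order statistics, stacked
        D      : feature dimension, used to expand N to the supervector size
        """
        a = np.repeat(1.0 + N, D)        # diagonal of (I + N); inverting it is trivial
        return T_pinv @ (F / a)          # one diagonal scaling plus one matrix-vector product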
Okay, now let us look at the computational complexity. We have a comparison of four different algorithms. First we have the baseline i-vector extraction, which is the standard form; there we have to compute the inner products of the total variability matrix, T_c transpose T_c, for all the C components, so this is of the order of C F M squared, and the M cubed term is due to the matrix inversion. In terms of memory cost, we have to store the T matrix, which is C F M.
Okay, so for the fast baseline, we can actually precompute T_c transpose T_c and store it; the storage cost for this is C M squared, but we reduce the per-utterance computational cost from this to this. Okay, and for our proposed method using the informative prior without the uniform occupancy assumption, the computational complexity and memory cost are about the same as for the fast baseline, because we can precompute these terms and store them.
As for the proposed fast method, the computational complexity reduces to this term, and we can precompute these terms and store them in memory. So in terms of computational complexity, the proposed fast method is about twelve times faster than the fast baseline,
and on the order of a hundred times faster than the standard baseline.
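As a rough sanity check of the quoted speed-up, here is my own back-of-the-envelope count, using the rank-400 T matrix and 57-dimensional features from the experiments below and assuming a 512-component UBM; the cost terms are my reading of the orders mentioned above:

    C, F, M = 512, 57, 400                         # UBM size assumed; feature dim and T rank from the experiments

    fast_baseline = C * M**2 + M**3 + C * F * M    # accumulate the precision matrix, invert it, project T^T F
    proposed_fast = C * F * M                      # diagonal scaling plus a single precomputed projection

    print(fast_baseline / proposed_fast)           # ~13.5, in the same ballpark as the quoted twelve times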
Okay, so one thing there is no time to present here, but which we discuss in the paper, concerns applications such as uncertainty propagation, where we need the posterior covariance. In such applications, the posterior covariance can actually be computed using the same fast method, given by this equation here, using the same informative prior as well as the uniform occupancy assumption, with a similar computational complexity. Also, we can actually use this informative prior, given by the inverse of T transpose T, in the E-step of the EM estimation of the T matrix. Of course, we only use it in the E-step, in the sense that we keep the covariance associated with the prior fixed, which is what allows the equations to take this form.
Experiments. The experiments were conducted on the NIST SRE'10 extended core task, common conditions one to nine. We use a gender-dependent UBM with fifty-seven dimensional MFCC features; the UBM is trained on Switchboard and SRE'04, '05 and '06, and we use the same data to train the T matrix, which has a rank of four hundred. We use PLDA for scoring, so before passing the i-vectors to the PLDA we reduce their dimension to two hundred using LDA, followed by length normalization. For the PLDA, the speaker factors are used together with a full covariance matrix to model the session variability.
Okay, so this table shows the results for the baseline, the proposed exact method, and the proposed fast method. The first row shows the EER and the second row the minDCF. Now, if we compare these results with these, we can see that there is not really much difference, so we can say that using the informative prior we propose does not seem to degrade the performance.
Okay, then if we look at common condition five, which is a telephone condition, for the proposed fast method the degradation is actually about ten percent in EER and four point five percent in minDCF. If we look across all the nine common conditions, the relative degradation in EER ranges from ten to sixteen percent, whereas for minDCF it ranges from about six or seven percent up to twenty point four percent.
Okay, so this is the comparison system that we use: we take the whitened and centralized first-order statistics, normalized by the occupancy counts, and use these as supervectors; then we perform PCA, and we project all the training and test utterances into the low-dimensional subspace, which is then used for the PLDA scoring.
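A minimal sketch of this comparison system as I understand it from the description; the normalization details and the projected dimension are assumptions:

    import numpy as np

    def train_pca_projector(stats, k=400):
        """stats: (num_utts, C*D) supervectors of whitened, centralized first-order
        statistics, each divided by its occupancy counts; k is the assumed target dimension."""
        mean = stats.mean(axis=0)
        # Principal directions of the training supervectors
        _, _, Vt = np.linalg.svd(stats - mean, full_matrices=False)
        return mean, Vt[:k].T                 # (C*D, k) projection matrix

    def project(mean, W, stats):
        # Low-dimensional vectors used in place of i-vectors for PLDA scoring
        return (stats - mean) @ W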
So why do we make this comparison? Because if you look at our formula, this can be seen as a transformation matrix and this as the input vector, so the i-vector is the projection of this input vector into a low-dimensional vector.
Comparing these results with those of our fast method, the results show that using the T matrix trained with EM, in combination with our formula, gives better performance.
Now, this result shows the comparison where the T matrix is trained not with the informative prior but with the standard Gaussian prior, while the extraction uses the informative prior. Comparing these two, we can see that the proposed exact method actually gives a slightly better result.
Okay, so in conclusion, we introduced two new concepts for the rapid computation of i-vectors. The first one is what we call the subspace orthonormalizing prior; the use of this prior avoids the need to compute the posterior covariance before computing the posterior mean. The second is the uniform occupancy assumption, which we use to reduce the computational complexity. With the combined use of these two, the informative prior and this assumption, we speed up the i-vector extraction process, but trade it for a slight degradation in terms of accuracy. That is all I have.
We have time for a few questions.
So, a question: you mentioned that the performance of the exact method is essentially the same as the baseline; is the exact method the one without the approximation?
Exactly, it is without the use of the uniform occupancy assumption, just using the subspace orthonormalizing prior. We wanted to see, by introducing the two parts separately (we first introduce the subspace orthonormalizing prior and then the uniform occupancy assumption), what the individual effect is, in particular whether by introducing just the subspace orthonormalizing prior we get a better performance or a slightly worse performance.