In this session we will have five papers. The first one is "Variance-Spectra based Normalization for I-vector Standard and Probabilistic Linear Discriminant Analysis".
So, let me present the paper. As was just mentioned, this is actually a collaborative work. The work was started some time ago, so I want to begin with some analysis of what we did before and then try to improve on that previous work. Everything here is based on i-vectors.
I will start with a brief description of our system, which is based on a classical i-vector setup. I will talk mostly about the post-processing of the i-vectors, between the i-vector extraction and the PLDA, which is the part of the system where we try to improve the discriminancy, usually by using LDA-like approaches, and also to compensate for the session variability. One way to do that is to use the length normalization; there are plenty of ways to do this, but I will focus on these two. And as the discriminancy is related to the variance of the data, we will look at the between-class and within-class variability.
So, we start with the description of the system, which is a classical UBM/i-vector system. Everything is gender dependent, from the beginning to the end. We extract 60-dimensional MFCCs, and the voice activity detection is based on a speech recognizer. The UBM training is very classical, using a large amount of data from NIST SRE 2004, 2005 and 2006. The i-vector extractor is also gender dependent, and we use only telephone data from SRE 2004, 2005 and 2006 plus Switchboard; I think it is quite state of the art. Here is just a rough idea of the number of sessions. For the normalization and classification training, which includes the GPLDA training, the LDA training and everything you will see in the following, we used gender-dependent subsets of the same sets of data, SRE 2004, 2005, 2006 and Switchboard, selected because of the number of sessions. And we restrained the development set to segments for which the nominal duration is higher than one hundred eighty seconds.
So now let us look at some tools that can be useful when we talk about variability. First, I would just like to recall the link between discriminancy and covariances. We commonly use three covariance matrices: the total covariance, the between-class covariance and the within-class covariance. But it is also very common in speaker verification to use, instead of the between- and within-class covariance matrices, the scatter matrices. The definitions are roughly similar, and both are used in the literature for several applications. The main difference is that the scatter matrices do not take into account the number of sessions per speaker, so each speaker gets the same weight regardless of its number of sessions. Since both are commonly used, we just ran a few experiments to see which one is more efficient in our system.
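For reference, here is a minimal sketch of the two families of statistics, assuming the i-vectors are in a NumPy array with a parallel array of speaker labels; the function and variable names are illustrative, and the exact normalizations differ between papers.

```python
import numpy as np

def class_statistics(ivectors, speakers):
    """Between/within-class covariances (session-weighted) versus scatter-style
    matrices where every speaker counts once.
    ivectors: (n_sessions, dim) array, speakers: (n_sessions,) label array."""
    mu = ivectors.mean(axis=0)
    dim = ivectors.shape[1]
    B_cov = np.zeros((dim, dim)); W_cov = np.zeros((dim, dim))
    B_sct = np.zeros((dim, dim)); W_sct = np.zeros((dim, dim))
    labels = np.unique(speakers)
    for spk in labels:
        x = ivectors[speakers == spk]      # all sessions of this speaker
        m = x.mean(axis=0)
        d = (m - mu)[:, None]
        B_cov += len(x) * d @ d.T          # weighted by the session count
        B_sct += d @ d.T                   # one vote per speaker
        c = x - m
        W_cov += c.T @ c
        W_sct += c.T @ c / len(x)
    B_cov /= len(ivectors); W_cov /= len(ivectors)
    B_sct /= len(labels);   W_sct /= len(labels)
    return B_cov, W_cov, B_sct, W_sct
```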
When talking about classification, what we are interested in is to maximize the between-speaker variability and to reduce the within-speaker variability, and one way to look at this is through the covariances. The tool we use for this is the variance spectrum, which is a very common one.
On this graph you can see three plots, which correspond to the total variance, the between-class variance and the within-class variance, that is, the speaker and session variability. We compute the between-class covariance matrix B, then we rotate all the data of the development set into its eigenvector basis, we compute the covariance matrices in this basis, and then we just plot the diagonal of each matrix. You can see that the variability in the first dimensions is higher, for the speaker and also for the session.
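A rough sketch of how such a spectrum could be produced, reusing the B and W matrices from the previous sketch; again the names are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def variance_spectra(ivectors, B, W):
    """Plot the diagonals of the total, between- and within-class covariance
    matrices after rotating the data into the eigenvector basis of B."""
    _, V = np.linalg.eigh(B)
    V = V[:, ::-1]                                 # largest eigenvalue first
    rotated = (ivectors - ivectors.mean(axis=0)) @ V
    total = np.diag(np.cov(rotated, rowvar=False))
    between = np.diag(V.T @ B @ V)
    within = np.diag(V.T @ W @ V)
    for spec, name in [(total, "total"), (between, "between-class"), (within, "within-class")]:
        plt.plot(spec, label=name)
    plt.xlabel("dimension"); plt.ylabel("variance"); plt.legend(); plt.show()
```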
Now, one way to maximize this ratio is to use the very common LDA, which is just maximizing the Rayleigh coefficient. This Rayleigh coefficient can be defined using the within- and between-class covariance matrices, or using the scatter matrices. In this work the LDA is used to reduce the i-vector dimension from six hundred, and this reduced dimension is kept constant for all the experiments.
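In case it helps, here is a minimal sketch of that LDA step written as a generalized eigenvalue problem; the function name and the target dimension are placeholders, not part of the original system.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(B, W, n_dims):
    """Maximize the Rayleigh coefficient (v^T B v) / (v^T W v) by solving the
    generalized eigenvalue problem B v = lambda W v and keeping the leading
    eigenvectors as the projection matrix."""
    eigvals, eigvecs = eigh(B, W)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # sort descending
    return eigvecs[:, order[:n_dims]]      # (600, n_dims) projection matrix

# hypothetical usage: projected = ivectors @ lda_projection(B, W, n_dims=...)
```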
Coming back to the system description, we tried two scoring methods. The first one is based on the two-covariance model that was presented a couple of years ago. The second one is based on the PLDA with the Gaussian assumption. The PLDA we used is based on a full-rank eigenchannel matrix combined with a diagonal residual covariance. The number of speaker factors in the PLDA is set to be consistent with the LDA, and the eigenchannel matrix is full rank, because it is a way to compensate for the diagonal residual covariance.
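As an aside, the two-covariance likelihood ratio can be written down compactly; this is a hedged sketch under the usual formulation of that model, assuming centered i-vectors and the B and W matrices defined earlier (names are mine).

```python
import numpy as np
from scipy.stats import multivariate_normal

def two_cov_score(w1, w2, B, W):
    """Two-covariance model log-likelihood ratio for a pair of (already
    centered) i-vectors: same-speaker versus different-speaker hypothesis."""
    d = len(w1)
    x = np.concatenate([w1, w2])
    T = B + W                                   # marginal covariance of one i-vector
    same = np.block([[T, B], [B, T]])           # speaker offset shared by both
    diff = np.block([[T, np.zeros((d, d))],
                     [np.zeros((d, d)), T]])    # independent speakers
    zero = np.zeros(2 * d)
    return (multivariate_normal.logpdf(x, mean=zero, cov=same)
            - multivariate_normal.logpdf(x, mean=zero, cov=diff))
```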
The problem with all of this, including the two scoring models, is that everything is based on the Gaussian assumption. And for those working in this field, it is now well accepted in the community that i-vectors do not follow a Gaussian distribution but something a bit more heavy-tailed, like a Student's t distribution.
So what we do is to try to transform these i-vectors to make their distribution more Gaussian. One way to do this was proposed initially by Garcia-Romero and Espy-Wilson, and the idea is to normalize the magnitude of the i-vectors: using this formula, we first center the i-vectors and then we just normalize their length. Using this method the distribution indeed becomes a bit more Gaussian, and we can see that the effect is very efficient.
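A minimal sketch of that plain length normalization, assuming the mean is estimated on the development set; the function name is illustrative.

```python
import numpy as np

def length_normalize(ivectors, mu):
    """Center the i-vectors on the development-set mean, then scale each one
    to unit Euclidean norm (plain length normalization, no whitening)."""
    centered = ivectors - mu
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms
```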
Just using this with the two-covariance model, we can see a gain in both equal error rate and minimum DCF on the NIST data, including the 2010 extended condition.
Everything until now is very common. So, going back to the tool I introduced previously, we would like to show the effect of the length normalization on the variance spectra. As you can clearly see, the spectra are exactly the same except for the range of the values: because we are only normalizing the magnitude, the values on the right side are smaller, but the shape does not change much.
Actually, in the initial papers the normalization was introduced together with whitening: it has to be done after a whitening of the data. So there are several steps in this algorithm: the whitening just uses the total covariance matrix of the i-vectors, and then we apply the length normalization. At the same time, we introduced the Eigen Factor Radial method, which is just a whitening plus length normalization, but done iteratively. The interest of this method is that it converges very fast, and it brings some properties that we can use further.
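A minimal sketch of that iterative whitening plus length normalization, assuming the development set is a NumPy array with one i-vector per row; in practice the same mean and whitening transform estimated on the development set would be applied to the evaluation i-vectors at each iteration (the function name is mine).

```python
import numpy as np

def eigen_factor_radial(dev, n_iter=2):
    """Iteratively whiten with the total covariance of the development set,
    then length-normalize the whitened i-vectors."""
    for _ in range(n_iter):
        mu = dev.mean(axis=0)
        T = np.cov(dev - mu, rowvar=False)                 # total covariance
        eigvals, eigvecs = np.linalg.eigh(T)
        whiten = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
        dev = (dev - mu) @ whiten                          # whitening step
        dev /= np.linalg.norm(dev, axis=1, keepdims=True)  # length normalization
    return dev
```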
So the properties are the following: the mean of the development set converges to zero very fast; the total covariance matrix becomes the identity; and, going further, all the eigenvectors of the between-class covariance matrix become also eigenvectors of the within-class covariance matrix. Using all these properties together, it happens that the eigenvectors of the between- and within-class covariance matrices are now solutions of the LDA optimization. That means that, after all this, the LDA does not bring any further improvement. That was one of the conclusions of our first paper.
Now we can see the effect of this normalization on the variance spectra. This is before any treatment of the i-vectors, and this is after one iteration, which is exactly what Garcia-Romero proposed: we can see that the total covariance spectrum becomes flat. After two iterations it is even better, and after three it is almost perfect, at least for the human eye. So you can see the big advantage of this treatment: the first dimensions of the data no longer contain the major portion of the variability, nor the major portion of the session variability. So actually, after this treatment, the i-vectors are already optimal with respect to the Rayleigh coefficient, which means the LDA optimization should not bring anything more.
To illustrate this, here are some results using the LDA and then the two-covariance model for scoring. The baseline is just the length normalization; when I say length normalization, without any whitening, it is just the magnitude normalization. You can see that using the Eigen Factor Radial does not improve the performance after one iteration if we use the scatter matrices to compute the LDA. But in the case where we compute the LDA using the between- and within-class covariance matrices, we can see that, for the female part at least, it improves the performance. And after two iterations the conclusion is the same: the scatter matrices seem not optimal, so it is better to use the between- and within-class covariances in their initial definition.
After this result, we tried to apply the same treatment before the PLDA, which is maybe more robust than the two-covariance model. This is the baseline using only length normalization, and when we apply two iterations of Eigen Factor Radial, which was optimal in the previous case, we see that this treatment is not adapted to the PLDA: the performance stays the same, or even gets worse.
So we extended this work, still looking at the covariances, but considering that after the length normalization everything lies on a sphere. That means we have a spherical surface, not a flat space, and it becomes very difficult to estimate the covariance matrices, because when you look at each speaker from one side of this sphere or another, the within-speaker variability can look very different. If we just take the average of these to estimate the development-set within-class covariance matrix, it does not make sense anymore, because the matrix is representative for some speakers but obviously not for the others. So what we propose in this paper is to keep the i-vectors on the surface, because it is now commonly admitted that the length normalization is really useful for session compensation, but we want the principal directions of the nuisance to align with the decision boundaries. That means we want the within-class covariance matrix to become diagonal, and even better, just the identity times a constant.
So we decided to apply exactly the same algorithm as previously, an iterative process using the same steps, except that we replace the total covariance matrix by the within-class covariance matrix, as sketched below.
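A hedged sketch of this spherical nuisance normalization, mirroring the earlier eigen_factor_radial sketch but whitening with the within-class covariance; names and the exact covariance estimation are illustrative assumptions.

```python
import numpy as np

def spherical_nuisance_normalization(dev, speakers, n_iter=2):
    """Same iterative scheme as before, but the whitening uses the within-class
    covariance, so the session variability is spread evenly over the dimensions
    while the i-vectors stay on the unit sphere after each length normalization."""
    for _ in range(n_iter):
        mu = dev.mean(axis=0)
        W = np.zeros((dev.shape[1], dev.shape[1]))
        for spk in np.unique(speakers):
            x = dev[speakers == spk]
            c = x - x.mean(axis=0)
            W += c.T @ c
        W /= len(dev)                                      # within-class covariance
        eigvals, eigvecs = np.linalg.eigh(W)
        whiten = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
        dev = (dev - mu) @ whiten
        dev /= np.linalg.norm(dev, axis=1, keepdims=True)
    return dev
```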
By doing this, we can see on the spectra of the same development set that one iteration makes the within-class variance spectrum flatten very fast: this is the session variability, and we can see that it is almost evenly spread over the dimensions. After two iterations, from the human point of view it still looks exactly the same, but it slightly improves the performance, and after three iterations we can see that it is completely flat. What is the effect of this when we use it with the PLDA? That is what I am going to show in a few minutes.
But before that, I just want to point out that this process can also be used to initialize the PLDA matrices. Actually, most of us are using a PCA in order to initialize the PLDA matrices, because it provides the first principal directions of the space, which the PLDA can then refine, so it is a very good starting point. What we propose here is to use this process instead: we rotate all the i-vectors into the eigenvector basis of B, the between-class covariance matrix, and then we initialize the eigenvoice matrix, that is the speaker factor matrix, by using the first eigenvectors. Then, for the eigenchannel matrix, we use the Cholesky decomposition of the within-class covariance matrix. And actually, if you are using a full-rank eigenchannel matrix, you can just initialize it directly with the same process.
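A rough sketch of that initialization, reusing B and W from above and assuming W is positive definite; the names are illustrative and the exact PLDA parameterization depends on the toolkit.

```python
import numpy as np

def init_plda(B, W, n_speaker_factors):
    """Initialize the PLDA matrices from development-set statistics: the
    eigenvoice (speaker factor) matrix from the leading eigenvectors of B,
    and the eigenchannel matrix from the Cholesky factor of W."""
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    top = order[:n_speaker_factors]
    eigenvoice = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    eigenchannel = np.linalg.cholesky(W)      # full-rank channel matrix
    return eigenvoice, eigenchannel
```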
So here are some results using the PLDA; as before, it is just the i-vectors plus the normalization process. I just want to mention that, with a random initialization of the PLDA, the performance can vary depending on the initialization point, so we performed the experiments with several different initializations and then averaged the results. You can see the baseline that I previously presented, and also the Eigen Factor Radial method, which is not efficient in this case; and you can see that using the spherical normalization, or as we call it, the spherical nuisance normalization, the performance improves in every case.
Now, with the initialization process that I just described, we can see that the performance is comparable. But I just want to stress that the point is not that this performance is the best; the point is that, in this case, the performance obtained with this initialization is just the lower bound of what we obtained by using random initializations. That means it is maybe not always better, but it guarantees a certain level of performance.
To conclude this presentation, I just want to come back to a few points. First, the variance spectra: this tool is very well known, so it is not new in itself, but the way we use it in this presentation may be a bit new. We used it to analyze the performance of the system, and it can also be used, I think, right after extracting the i-vectors: it is a very good indicator of the quality of the i-vector extractor, because just by looking at the spectra you can have a rough idea of the performance you will get at the end. Some colleagues are doing experiments on this at the moment and will present it, I think, in a thesis very soon. So this tool proves to be useful for analysis purposes.
Coming back to our previous paper, we showed that iterating the normalization process, the whitening plus length normalization, improves the performance slightly; it is not a huge improvement, but why not do it twice or even three times. Also, the covariance matrices, at least in our case, perform better than the scatter matrices. Then, to end this talk, just remember that the spherical nuisance normalization improves the performance in the case of PLDA scoring. And also, as I mentioned before, when you use this type of process to initialize the PLDA matrices, you do not need to perform so many EM iterations. For the case I presented, we obtained the best performance using one hundred EM iterations; with the proposed initialization, we just need about ten iterations. So even if the PLDA training is not that demanding, it is a way to reduce the training time.
So now, if you have any questions.
[inaudible question from the audience]
Yeah, actually, I really do not like the length normalization, because it is a non-linear process that is applied somewhat abruptly at the end, and I think we need to find a way to address this issue by finding something more consistent. Thank you.